kat markava


Deploying weekly and... remotely

First, some facts:

1. My team is used to deploying to production every week. 

2. My team is not used to being remote. 

3. My team now deploys to PROD remotely.

------------

We have always paid close attention to our release cycle, constantly chipping away at the steps to slim it down and de-risk it. So we believe we have a pretty reliable and efficient flow by now.

Yet, when release day arrived during the first COVID-19 lockdown, all of us involved -- engineering, QA, and product -- were sweating. I mean, without having everyone within arm's reach at the office, how do you make sure that:

a. no one misses any migration steps, if they are needed?

b. all environment variables are in place?

c. you did not reach some mysterious limits on Jenkins?

(Right, we are using Jenkins for deploys.)
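
For question (b), a small pre-flight script can catch missing variables before anyone presses a button. This is only a rough sketch, not what we actually run: the function name `missing_env_vars` and the variable names in the example are made up for illustration.

```python
import os


def missing_env_vars(required, environ=None):
    """Return the names in `required` that are unset or empty in `environ`.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to try out without touching real settings.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Hypothetical example: DATABASE_URL is set, API_KEY is not.
example_env = {"DATABASE_URL": "postgres://example"}
print(missing_env_vars(["DATABASE_URL", "API_KEY"], example_env))
# prints ['API_KEY']
```

Run as an early step of the deploy job, a non-empty result would be a reason to stop before the release starts rather than halfway through it.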

------------

These are just some of the things that can go wrong. There can be many more, of course, because our product is very complex with a ton of interdependencies and jobs that have to run in production.

That's why I wanted to share with y'all, and also selfishly reflect on, how we managed to not screw it all up. :sweat_smile:

What seems to have worked:

1. A clear checklist that the lead engineer and I agreed upon and documented;

2. A separate MS Teams channel dedicated to the release communication;

3. Locking in the release window -- just as with space launches -- if we do not make it by the agreed time, we abort;

4. And, my personal favorite, Blue-Green Deployment driven through Jenkins' Blue Ocean interface, which makes the new PROD build a simple click of a button (well, a short series of buttons).

To summarize, the combination of _solid team communication, operational preparation, and the work DevOps had put into visualizing the deploy pipeline_ made it possible for us to keep delivering continuously even after this sudden shift in work style.
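
The "locked release window" rule from item 3 is simple enough to write down as code. Here is a minimal sketch, assuming a same-day window; the 10:00 to 12:00 times and the function names are hypothetical, not our real schedule or tooling.

```python
from datetime import datetime, time


def within_release_window(now, start, end):
    """True if `now` falls inside the half-open window [start, end)."""
    return start <= now.time() < end


def go_or_abort(now, start=time(10, 0), end=time(12, 0)):
    """Return "go" inside the agreed window, "abort" otherwise."""
    return "go" if within_release_window(now, start, end) else "abort"


print(go_or_abort(datetime(2020, 4, 1, 11, 30)))  # prints go
print(go_or_abort(datetime(2020, 4, 1, 12, 0)))   # prints abort
```

Treating the end of the window as exclusive means that hitting the deadline exactly counts as a miss, which matches the "if we do not make it by the agreed time, we abort" rule.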

What still needs improving:

1. Communication -- yeah, there can never be enough of it. Now and then, threads get lost in translation or are simply overlooked when someone's less-than-strategically-placed comment pushes critical content out of view;

2. Testing Process -- there is always more testing that can be done, but how to do it more efficiently and reliably under remote conditions is a different challenge.

------------

Ok, so to be fair this stuff is not easy.

I see it as a constant work in progress.

------------

What have your experiences been like?

What advice do you have for remote CI/CD during a pandemic?