What are Zero Downtime Deployments?

A primer on zero downtime deployments, atomic deployments, blue-green deployments, and how to minimize downtime during database migrations.

Zero down­time deploy­ment is a deploy­ment method where your web­site or appli­ca­tion is nev­er down or in an unsta­ble state dur­ing the deploy­ment process. To achieve this the web serv­er doesn’t start serv­ing the changed code until the entire deploy­ment process is complete.

In this article I’m going to assume you’re using a single web server and don’t need a load balanced configuration. Zero downtime deployments are even better in load balanced setups, but the vast majority of websites run off a single web server instance.

Zero Down­time with Atom­ic Deployments

Zero downtime is a type of deployment, and there are countless ways to make it happen: off-the-shelf tools, homespun scripts, or complicated pipelines via something like GitLab.

One of the most pop­u­lar zero down­time deploy­ment meth­ods is the atom­ic deploy­ment.

What are Atom­ic Deployments?

Atomic deployments are a style of code deployment that uses a symlink to point the web server at the most recent version of the code.

The directory structure of an atomically deployed site looks something like this:

current
deploy-cache/
releases/
	20190504161640/
	20190504164421/
	20190504170431/
	20190504172417/

The current direc­to­ry is actu­al­ly a sym­link to the most recent release direc­to­ry inside of releases. In this set­up, we would point our web serv­er to current/web and then it would always be point­ed to the lat­est ver­sion of the code.

Inside of the time­stamped direc­to­ries is a com­plete ver­sion of the web­site or web appli­ca­tion code.
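
To make the layout concrete, here is a minimal sketch of how a deployment script might build a release and then switch the current symlink in one step. The function names (create_release, activate) and the current.tmp link name are hypothetical, and this is not how any particular deployment tool does it; it only shows the general technique of creating a new symlink under a temporary name and renaming it over the old one, which is a single atomic operation on POSIX filesystems.

```python
from __future__ import annotations

import os
import time
from pathlib import Path


def create_release(base: Path, files: dict[str, str], name: str | None = None) -> Path:
    """Write a complete copy of the code into its own release directory."""
    # Default to a timestamp like the directory names shown above.
    release = base / "releases" / (name or time.strftime("%Y%m%d%H%M%S"))
    release.mkdir(parents=True)  # fails if the release already exists
    for relative_path, contents in files.items():
        target = release / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(contents)
    return release


def activate(base: Path, release: Path) -> None:
    """Point `current` at `release` with a single rename."""
    temporary_link = base / "current.tmp"  # hypothetical scratch name
    if temporary_link.is_symlink():
        temporary_link.unlink()  # left over from an interrupted deploy
    # Use a relative target so the whole tree can be moved without breaking the link.
    os.symlink(os.path.relpath(release, base), temporary_link)
    # os.replace renames the new link over the old one; on POSIX this is atomic,
    # so the web server sees either the old release or the new one, never a mix.
    os.replace(temporary_link, base / "current")
```

Nothing under current changes while create_release is still writing files, because the new code lives in a directory the web server is not pointed at yet.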

The set­up has two impor­tant implications. 

First, none of the new code is available to be executed until all of the code is completely deployed. The entire code base is first saved in the timestamped directory inside of releases (and with some tools first saved to a deploy-cache directory).

Second, if there is a problem with the deployed code, it’s very fast and simple to “roll back” the deployment by re-linking the current symlink to the previous release (by date in the releases directory). It doesn’t require a re-deploy or anything time consuming. The code is already there, waiting.
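
A rollback along those lines could be sketched like this. The rollback function and the current.tmp link name are hypothetical, and the sketch assumes release directories are named with sortable timestamps as in the listing above, so that sorting by name is the same as sorting by date.

```python
import os
from pathlib import Path


def rollback(base: Path) -> Path:
    """Re-link `current` to the release just before the active one."""
    # Timestamped names sort chronologically when sorted as plain strings.
    releases = sorted(p for p in (base / "releases").iterdir() if p.is_dir())
    active = (base / "current").resolve()
    index = [p.resolve() for p in releases].index(active)
    if index == 0:
        raise RuntimeError("no earlier release to roll back to")
    previous = releases[index - 1]

    temporary_link = base / "current.tmp"  # hypothetical scratch name
    if temporary_link.is_symlink():
        temporary_link.unlink()
    os.symlink(os.path.relpath(previous, base), temporary_link)
    # Same rename trick as a deploy: the switch happens in one step.
    os.replace(temporary_link, base / "current")
    return previous
```

No files are copied or downloaded here; the only thing that changes on disk is the symlink.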

Why are atom­ic deploy­ments help­ful in Craft CMS?

Craft CMS relies com­plete­ly on Com­pos­er and the vendor direc­to­ry that Com­pos­er builds to hold the appli­ca­tion code and all of its depen­den­cies (includ­ing all plu­g­ins or mod­ules you have installed). 

If you’re doing a soft­ware update to Craft or plu­g­ins, it can take sev­er­al sec­onds to down­load the updates into the vendor direc­to­ry. Dur­ing this time the site code is unsta­ble and like­ly incom­plete. If vis­i­tors were allowed to vis­it the site dur­ing the update process it would like­ly cause errors.

Therefore, zero downtime deployments via atomic deployments allow Composer to download and save the updated dependencies completely before the new version of the software is served by the web server.

How do we han­dle data­base migra­tions and zero down­time deploy­ments in Craft?

Some­times your deploy­ments will just be code updates, like revised tem­plates, but fre­quent­ly there will be a data­base migra­tion or some sort of data­base update that hap­pens along with the code update. Even the Craft project con­fig file will trig­ger a data­base update when Craft sees a change in the file.

To keep the database in step with the code, we want to run the project config sync or update migration immediately after all of the code is deployed and right after the current symlink is pointed at the new release.

To do this we’d script our deployment process (deploying Craft with Envoyer makes this very easy) to run craft project-config/sync and craft migrate/all to run all Craft and plugin migrations.
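
As a rough sketch, the post-link step of such a script might look like the following. The two Craft commands are the ones named above; calling them through php craft from the current directory, the function name run_post_link_steps, and the injectable run parameter are all assumptions made for illustration. Depending on your Craft version the commands may ask for confirmation, so an unattended run might need a non-interactive option.

```python
import subprocess
from pathlib import Path
from typing import Callable

# Commands named in the article; invoking them via `php craft` is an assumption.
POST_LINK_COMMANDS = (
    ("php", "craft", "project-config/sync"),
    ("php", "craft", "migrate/all"),
)


def run_post_link_steps(current: Path, run: Callable[..., object] = subprocess.run) -> None:
    """Run the database-related commands right after `current` is re-linked."""
    for command in POST_LINK_COMMANDS:
        # check=True stops the deploy on the first failing command.
        run(list(command), cwd=current, check=True)
```

Passing a different run callable makes it possible to dry-run the step without touching a real Craft install.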

(You could also vis­it the con­trol pan­el URL for the web­site to kick off the migration.)

But you can probably see the gap in our zero downtime deployment. There’s going to be a very small window of time between the code deployment going live and the database migration completing. Craft CMS upgrade migrations are usually pretty quick, but there is still a chance someone could access your site while the migration is in progress.

How can we work around this?

After-Hours Deploy­ments

One work-around is an old-fashioned after-hours, scheduled deployment. Just the other day I had to do a Craft and plugin update on a site. I tested it locally, on a dev server, on the staging server, and everything went swimmingly. However, I still scheduled a deployment to production during off hours (around 8pm) in order to minimize the impact on visitors.

Blue-Green Deploy­ments

There’s anoth­er style of deploy­ment called Blue-Green. It is intend­ed to remove any down­time, even dur­ing updates that require data­base migration. 

In this deploy­ment set­up there are two pro­duc­tion envi­ron­ments. If you have two load bal­anced web servers and a data­base serv­er then you need that times two.

How does it work? Martin Fowler puts it succinctly:

As you pre­pare a new release of your soft­ware you do your final stage of test­ing in the green envi­ron­ment. Once the soft­ware is work­ing in the green envi­ron­ment, you switch the router so that all incom­ing requests go to the green envi­ron­ment — the blue one is now idle.

Blue-green deploy­ment also gives you a rapid way to roll­back — if any­thing goes wrong you switch the router back to your blue environment. 

But what about data­base changes from user data (like here on CraftQuest where new rows of data are being cre­at­ed every few sec­onds) that hap­pened in blue while green was being pre­pared to go live?

There’s still the issue of deal­ing with missed trans­ac­tions while the green envi­ron­ment was live, but depend­ing on your design you may be able to feed trans­ac­tions to both envi­ron­ments in such a way as to keep the blue envi­ron­ment as a back­up when the green is live. Or you may be able to put the appli­ca­tion in read-only mode before cut-over, run it for a while in read-only mode, and then switch it to read-write mode. That may be enough to flush out many out­stand­ing issues.

There will always be an issue to work around but blue-green might be a solid solution if you need truly zero downtime for your website.
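
The router switch at the heart of blue-green can be pictured with a toy model like this one. The BlueGreenRouter class and the addresses in the usage example are hypothetical; a real setup would do this in a load balancer or DNS, not in application code.

```python
from __future__ import annotations


class BlueGreenRouter:
    """Toy router: sends every request to whichever environment is live."""

    def __init__(self, upstreams: dict[str, str], live: str = "blue") -> None:
        if len(upstreams) != 2 or live not in upstreams:
            raise ValueError("expected exactly two environments, one of them live")
        self.upstreams = upstreams
        self.live = live

    @property
    def idle(self) -> str:
        # The environment that is not currently receiving traffic.
        return next(name for name in self.upstreams if name != self.live)

    def route(self) -> str:
        """Return the address that incoming requests should go to."""
        return self.upstreams[self.live]

    def switch(self) -> str:
        """Cut over to the idle environment; calling it again rolls back."""
        self.live = self.idle
        return self.live
```

With router = BlueGreenRouter({"blue": "10.0.0.1", "green": "10.0.0.2"}), calling router.switch() sends all traffic to green, and a second call sends it back to blue.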

Deploy­ment Tools Sup­port­ing Zero Downtime/​Atomic Deployments

My first brush with atomic deployment was years ago using Capistrano to deploy Ruby on Rails apps (and then later PHP apps). But now a bevy of tools offer support for atomic deployments.

Here are a few (but a quick web search would like­ly turn up more):

  • Envoy­er — Built for the Lar­avel com­mu­ni­ty, this is the per­fect zero down­time deploy­ment tool for Craft CMS.
  • Bud­dy — A robust deployment/​pipeline tool that can do the sim­plest deploy­ments or those requir­ing max­i­mum cus­tomiza­tion. Bud­dy also sup­ports atom­ic deploy­ments.
  • Deploy­HQ — One of the old­er host­ed deploy­ment tools but they also sup­port atom­ic deployments.

Zero downtime deployment shouldn’t be a deployment type; it should be the deployment type. Unless we’re doing a major overhaul that requires an extended downtime period, all of our deployments should be of the zero-downtime variety.

But we might think we’re doing zero down­time deploy­ments when we real­ly aren’t. Let’s think back to the old days when we would just upload a bunch of updat­ed files via SFTP. Sure, it might only take a minute or two but there was that in-between time when myImportantFunctions.php was updat­ed while the oth­er file (prob­a­bly logInUser.php) wasn’t yet updat­ed but still called func­tions in myImportantFunctions.php.

You can see the prob­lem here. 

But it’s not just something as archaic as an SFTP upload. This can also be the case with those clever deployment setups that just use Git and run git pull on the master branch on the server as a way to deploy changed code. While Git is very fast, it still isn’t updating all files at once, and there could easily be code issues if someone is using your web application or website at the same time.

That’s where the con­cept of zero down­time deploy­ments comes in. The web­site is always avail­able and all code changes are made avail­able at the same time.