Andrew Welch · Insights · #backups #devops #craftcms




Mitigating Disaster via Website Backups

A sol­id back­up strat­e­gy is an insur­ance pol­i­cy for your clients, and can make you the hero if dis­as­ter strikes

Tidal Wave Disaster

Build­ing a web­site takes a ton of work; nobody knows this bet­ter than the web devel­op­ers and design­ers who build the web­site. Clients at least under­stand the amount of work involved when they pay their invoice, and yet more often than not their invest­ment isn’t pro­tect­ed with a prop­er back­up strategy.

This would be like building a house, but not bothering to get homeowners insurance.

While many hosting facilities and VPS services offer “snapshot” backups, what they essentially do is make a backup disk image of your server.

Linode “snapshot” backups

While this is use­ful if you want to restore the entire serv­er to an arbi­trary point in time, it’s pret­ty heavy-hand­ed if all you need to do is restore a file that the client delet­ed (and you’ll lose any­thing else they’ve changed in the interim).

I view backups like this as good to have in an emergency, and useful to take before a major server upgrade in case something goes awry. But they are not what I’m looking for most of the time; indeed, snapshot backups are not recommended as a way to reliably back up dynamically changing data such as a MySQL database.

So let’s see if we can’t come up with more practical backup strategies

Imple­ment­ing a prop­er back­up strat­e­gy is a ser­vice that you can pro­vide to your clients, because it has real val­ue. So let’s have a look at how we can do it.

The first thing we need to do is logis­ti­cal­ly sep­a­rate the con­tent that we cre­ate from the con­tent that the client cre­ates. Con­tent that we cre­ate, such as the design, HTML, CSS, JavaScript, tem­plates, and graph­ics should all be checked into a git repository.

This allows you to collaborate with other developers, gives you an intrinsically versioned backup of the website structure, and allows you to easily deploy the website. Whether you use GitHub.com private repos, Beanstalkapp.com, BitBucket.org, GitLab.com, or your own private Git server for your git repos doesn’t really matter. Just start using them.

This all goes into one box, and we store that box in a git repo, so it’s already backed up. If you’re not doing it already, the time is now to get on board with using git repos. It’s get­ting to the point where it’s a stan­dard part of web development.

Backup Boxes

Then the content that the client creates, such as the data they enter in the backend and the images they upload, goes into another box. The Database & Asset Syncing Between Environments in Craft CMS article talks about this separation, and we’re going to leverage it here as well, again with the help of Craft-Scripts.

This box of client uploaded con­tent is the part that we have to devel­op a back­up strat­e­gy for.

Enter Craft-Scripts

Before we get into the nit­ty grit­ty of back­ups, let’s talk a lit­tle bit about the tools we’re going to use to make it happen.

Craft-Scripts are shell scripts to manage database backups, asset backups, file permissions, asset syncing, cache clearing, and database syncing between Craft CMS environments. In reality, they will work with just about any CMS out there, but we’ll focus on their use with Craft CMS here.

You may already be famil­iar with Craft-Scripts, if you use them for Hard­en­ing Craft CMS Per­mis­sions or Data­base & Asset Sync­ing Between Envi­ron­ments in Craft CMS. They also have handy scripts for doing backups.

In a nutshell, the way Craft-Scripts works is you copy the scripts folder into each Craft CMS project’s git repo, and then set up a .env.sh (which is kept out of git via .gitignore) on each environment where the project lives, such as live production, staging, and local dev. For more on multiple environments, check out the Multi-Environment Config for Craft CMS article.

Then you can use the same scripts in each environment, and they will know things like how to access the database, where the assets are, etc. based on the settings in the local .env.sh.
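To make the idea concrete, here is a minimal sketch of how a script could pick up its per-environment settings. The `load_env` helper is a hypothetical name of mine, not something taken from Craft-Scripts, and the real scripts may well load their settings differently:

```bash
#!/bin/bash
# Hypothetical helper (not from Craft-Scripts): load a settings file if it exists.
load_env() {
    local env_file="$1"
    if [ ! -f "${env_file}" ]; then
        echo "Missing settings file: ${env_file}" >&2
        return 1
    fi
    # Sourcing the file makes its variables available to the calling script
    . "${env_file}"
}

# Usage, assuming .env.sh sits next to the running script:
# SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
# load_env "${SCRIPT_DIR}/.env.sh" || exit 1
# echo "Backups go to ${LOCAL_BACKUPS_PATH}"
```

Because the settings file is plain shell, the same script can run unchanged on live production, staging, and local dev; only the `.env.sh` differs.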

The Craft-Scripts documentation covers setting up the .env.sh in detail, so we won’t go into that here. However, I think real-world examples can be helpful, so here’s the full .env.sh that I use on my local dev environment for this very website:

# Craft Scripts Environment
#
# Local environmental config for nystudio107 Craft scripts
#
# @author    nystudio107
# @copyright Copyright (c) 2017 nystudio107
# @link      https://nystudio107.com/
# @package   craft-scripts
# @since     1.1.0
# @license   MIT
#
# This file should be renamed to '.env.sh' and it should reside in the
# `scripts` directory.  Add '.env.sh' to your .gitignore.

# -- GLOBAL settings --

# What to prefix the database table names with
GLOBAL_DB_TABLE_PREFIX="craft_"

# The path of the `craft` folder, relative to the root path; paths should always have a trailing /
GLOBAL_CRAFT_PATH="craft/"

# The maximum age of backups in days; backups older than this will be automatically removed
GLOBAL_DB_BACKUPS_MAX_AGE=90

# -- LOCAL settings --

# Local path constants; paths should always have a trailing /
LOCAL_ROOT_PATH="/home/vagrant/sites/nystudio107/"
LOCAL_ASSETS_PATH=${LOCAL_ROOT_PATH}"public/img/"

# Local user & group that should own the Craft CMS install
LOCAL_CHOWN_USER="vagrant"
LOCAL_CHOWN_GROUP="vagrant"

# Local directories relative to LOCAL_ROOT_PATH that should be writeable by the $CHOWN_GROUP
LOCAL_WRITEABLE_DIRS=(
                "${GLOBAL_CRAFT_PATH}storage"
                "public/img"
                )

# Local asset directories relative to LOCAL_ASSETS_PATH that should be synched with remote assets
LOCAL_ASSETS_DIRS=(
                "blog"
                "clients"
                "users"
                )

# Craft-specific file directories relative to LOCAL_CRAFT_FILES_PATH that should be synched with remote files
LOCAL_CRAFT_FILE_DIRS=(
                "rebrand"
                "userphotos"
                )

# Absolute paths to directories to back up, in addition to `LOCAL_ASSETS_DIRS` and `LOCAL_CRAFT_FILE_DIRS`
LOCAL_DIRS_TO_BACKUP=(
                "/home/forge/wiki.nystudio107.com"
                )

# Local database constants
LOCAL_DB_NAME="nystudio"
LOCAL_DB_PASSWORD="secret"
LOCAL_DB_USER="homestead"
LOCAL_DB_HOST="localhost"
LOCAL_DB_PORT="3306"

# If you are using mysql 5.6.10 or later and you have `login-path` setup as per:
# https://opensourcedbms.com/dbms/passwordless-authentication-using-mysql_config_editor-with-mysql-5-6/
# you can use it instead of the above LOCAL_DB_* constants; otherwise leave this blank
LOCAL_DB_LOGIN_PATH="localdev"

# The `mysql` and `mysqldump` commands to run locally
LOCAL_MYSQL_CMD="mysql"
LOCAL_MYSQLDUMP_CMD="mysqldump"

# Local backups path; paths should always have a trailing /
LOCAL_BACKUPS_PATH="/home/vagrant/backups/"

# -- REMOTE settings --

# Remote ssh credentials, user@domain.com and Remote SSH Port
REMOTE_SSH_LOGIN="forge@nystudio107.com"
REMOTE_SSH_PORT="22"

# Remote path constants; paths should always have a trailing /
REMOTE_ROOT_PATH="/home/forge/nystudio107.com/"
REMOTE_ASSETS_PATH=${REMOTE_ROOT_PATH}"public/img/"

# Remote database constants
REMOTE_DB_NAME="nystudio"
REMOTE_DB_PASSWORD="XXX"
REMOTE_DB_USER="nystudio"
REMOTE_DB_HOST="localhost"
REMOTE_DB_PORT="3306"

# If you are using mysql 5.6.10 or later and you have `login-path` setup as per:
# https://opensourcedbms.com/dbms/passwordless-authentication-using-mysql_config_editor-with-mysql-5-6/
# you can use it instead of the above REMOTE_DB_* constants; otherwise leave this blank
REMOTE_DB_LOGIN_PATH=""

# The `mysql` and `mysqldump` commands to run remotely
REMOTE_MYSQL_CMD="mysql"
REMOTE_MYSQLDUMP_CMD="mysqldump"

# Remote backups path; paths should always have a trailing /
REMOTE_BACKUPS_PATH="/home/forge/backups/"

# Remote Amazon S3 bucket name
REMOTE_S3_BUCKET="backups.nystudio107"

The only thing I’ve changed is that I’ve XXXd out my REMOTE_DB_PASSWORD; everything else is exactly how I use it. Don’t worry about understanding all of the settings now; I’m presenting it here just to give you a feel for what it looks like fully configured.

Now that the intro to Craft-Scripts is out of the way, let’s deal with some disasters!

Backups for Disasters Big and Small

When we talk about disaster recovery, we have to realize that disasters come in different shapes and sizes, and prepare for likely scenarios. By far the most common “disaster” is that the client has somehow lost data by deleting the wrong entry, or deleting an asset by mistake.

Disaster Recovery

In cases like this, what we really want are local backups that are easy to access on the server, and thus easy to restore. We want to ensure that the content that the client creates, in the form of database entries and uploaded assets, is tucked away safely, awaiting the inevitable human error.

Local Database Backups

So our first step is mak­ing sure that we keep dai­ly back­ups of the data­base, for the times when client error caus­es data loss. For this, we’ll use the backup_db.sh script.

When this script is executed, it will make a local copy of the database, excluding cache tables we don’t want, neatly compressed and time-stamped, and save it in the directory you specify in LOCAL_BACKUPS_PATH.

It will also rotate the backups, in that it will delete any backups that are more than GLOBAL_DB_BACKUPS_MAX_AGE days old. This way, you’ll never have to worry about running out of disk space due to backups gone wild.

I’ve found that in gen­er­al, prob­lems are usu­al­ly noticed with­in 30 days or so of them hap­pen­ing, but I’m para­noid, so I keep these local data­base back­ups around for 90 days. What you should set it to depends on your use-case, and how often you do the backups.
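If you’re curious what rotation like this can look like, here’s a rough sketch using `find`. The `prune_old_backups` function is a hypothetical helper for illustration; I’m not claiming this is how backup_db.sh does it internally:

```bash
#!/bin/bash
# Hypothetical helper: remove compressed database dumps older than N days.
prune_old_backups() {
    local backup_dir="$1"     # e.g. /home/vagrant/backups/nystudio/db
    local max_age_days="$2"   # e.g. the value of GLOBAL_DB_BACKUPS_MAX_AGE
    # -mtime +N matches files last modified more than N full days ago;
    # -print lists each match before it is removed
    find "${backup_dir}" -type f -name '*.sql.gz' \
        -mtime "+${max_age_days}" -print -exec rm -f {} +
}

# Usage:
# prune_old_backups "/home/vagrant/backups/nystudio/db" 90
```

Matching on the `*.sql.gz` suffix keeps the cleanup from touching anything else that happens to live in the same directory.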

Here’s an exam­ple out­put after run­ning backup_db.sh:

vagrant@homestead ~/sites/nystudio107/scripts (develop) $ ./backup_db.sh
*** Backed up local database to /home/vagrant/backups/nystudio/db/nystudio-db-backup-20170320-022335.sql.gz
*** 2 old database backups removed; details logged to /tmp/nystudio-db-backups.log

The numbers at the end of the backup archive name are a timestamp in the format YYYYMMDD-HHMMSS.
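A file name in that shape is easy to produce with `date`. This is only a sketch: `backup_file_name` is a hypothetical helper, and the commented `mysqldump` line assumes a `localdev` login path like the one in the settings shown earlier:

```bash
#!/bin/bash
# Hypothetical helper: build a dump file name with a YYYYMMDD-HHMMSS timestamp.
backup_file_name() {
    local db_name="$1"
    echo "${db_name}-db-backup-$(date '+%Y%m%d-%H%M%S').sql.gz"
}

# Usage (assumes mysqldump can authenticate via the `localdev` login path):
# mysqldump --login-path=localdev nystudio \
#     | gzip > "/home/vagrant/backups/nystudio/db/$(backup_file_name nystudio)"
```

Putting the date before the time means an alphabetical directory listing is also a chronological one.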

Local Asset Backups

So great, we have the clien­t’s data­base local­ly backed up. Next we need to back up their assets, the files that they upload into the CMS. To do this, we’ll use the backup_assets.sh script.

This script uses rsync to effi­cient­ly back up all of the asset direc­to­ries spec­i­fied in LOCAL_ASSETS_DIRS to the direc­to­ry spec­i­fied in LOCAL_BACKUPS_PATH. A sub-direc­to­ry LOCAL_DB_NAME/assets inside the LOCAL_BACKUPS_PATH direc­to­ry is used for the asset backups.

backup_assets.sh will also back up the Craft userphotos and rebrand directories from craft/storage by default. The directories it will back up are specified in LOCAL_CRAFT_FILE_DIRS.

Because rsync is used, the files are effec­tive­ly mir­rored into a sep­a­rate local direc­to­ry, so only files that have actu­al­ly changed are backed up. This makes the back­ups very quick, and because the files are stored uncom­pressed, you have quick and easy access to restore that won­der­ful image of a fluffy poo­dle that the client deleted.

If a file is deleted from one of the LOCAL_ASSETS_DIRS, it doesn’t get deleted from the LOCAL_BACKUPS_PATH, so you can easily find the file to rescue it.
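The behavior described above falls out of how rsync is invoked. Here’s a minimal sketch, assuming a hypothetical `mirror_assets` helper (the actual backup_assets.sh may pass different flags):

```bash
#!/bin/bash
# Hypothetical helper: mirror one asset directory into the backups directory.
mirror_assets() {
    local src_dir="$1"    # e.g. /home/vagrant/sites/nystudio107/public/img/blog
    local dest_dir="$2"   # e.g. /home/vagrant/backups/nystudio/assets
    mkdir -p "${dest_dir}"
    # -a preserves permissions and timestamps and recurses into sub-directories.
    # --delete is deliberately left out, so a file removed from the source
    # is still sitting in the backup afterwards.
    rsync -a "${src_dir}" "${dest_dir}/"
}

# Usage:
# mirror_assets "/home/vagrant/sites/nystudio107/public/img/blog" \
#               "/home/vagrant/backups/nystudio/assets"
```

Note that the source path has no trailing slash, so rsync copies the `blog` directory itself into the destination rather than just its contents.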

Here’s exam­ple out­put from backup_assets.sh:

vagrant@homestead ~/sites/nystudio107/scripts (develop) $ ./backup_assets.sh
sending incremental file list
blog/
blog/backup-boxes.png
         21,175 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=144/152)
blog/_desktop/
blog/_desktop/backups-are-not-sexy.jpg
        294,064 100%   25.49MB/s    0:00:00 (xfr#2, to-chk=29/152)
blog/_desktop/tidal-wave-disaster.jpg
        320,383 100%   12.73MB/s    0:00:00 (xfr#3, to-chk=6/152)
*** Backed up assets from /home/vagrant/sites/nystudio107/public/img/blog
sending incremental file list
*** Backed up assets from /home/vagrant/sites/nystudio107/public/img/clients
sending incremental file list
*** Backed up assets from /home/vagrant/sites/nystudio107/public/img/users
sending incremental file list
*** Backed up assets from /home/vagrant/sites/nystudio107/craft/storage/rebrand
sending incremental file list
*** Backed up assets from /home/vagrant/sites/nystudio107/craft/storage/userphotos

Because rsync is used for these backups, you can put a .rsync-filter in any directory to define files/folders to ignore.

For exam­ple, if you don’t want any Craft image trans­forms backed up, your .rsync-filter file in each assets direc­to­ry might look like this:

# This file allows you to add filter rules to rsync, one per line, preceded by either
# `-` or `exclude` and then a pattern to exclude, or `+` or `include` and then a pattern
# to include. More info: http://askubuntu.com/questions/291322/how-to-exclude-files-in-rsync
- _*
- _*/**
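Before trusting a filter file, it can be worth previewing what rsync would actually copy. The sketch below assumes rsync’s `-F` option, which tells it to honor per-directory `.rsync-filter` files, and uses a hypothetical `preview_backup` helper; whether Craft-Scripts passes exactly these flags is an assumption on my part:

```bash
#!/bin/bash
# Hypothetical helper: list what a backup would copy, without copying anything.
preview_backup() {
    local src_dir="$1"
    local dest_dir="$2"
    # -F  honor .rsync-filter files found in the directories being transferred
    # -n  dry run: report what would be transferred, but change nothing
    # -v  print the name of each file that would be transferred
    rsync -a -F -n -v "${src_dir}" "${dest_dir}/"
}

# Usage:
# preview_backup "/home/vagrant/sites/nystudio107/public/img/blog" \
#                "/home/vagrant/backups/nystudio/assets"
```

If the transform directories such as `_desktop` still show up in the listing, the filter file is not being picked up.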

If you have arbi­trary direc­to­ries that you want backed up that exist out­side of your project direc­to­ry, you can use the backup_dirs.sh script.

This script uses rsync to efficiently back up all of the directories specified in LOCAL_DIRS_TO_BACKUP to the directory specified in LOCAL_BACKUPS_PATH. A sub-directory LOCAL_DB_NAME/files inside the LOCAL_BACKUPS_PATH directory is used for the directory backups.

Because rsync is used for these backups, you can put a .rsync-filter in any directory to define files/folders to ignore.

For exam­ple, if you have a wiki with data/cache and data/tmp direc­to­ries that you don’t want backed up, your .rsync-filter file in the wiki direc­to­ry might look like this:

# This file allows you to add filter rules to rsync, one per line, preceded by either
# `-` or `exclude` and then a pattern to exclude, or `+` or `include` and then a pattern
# to include. More info: http://askubuntu.com/questions/291322/how-to-exclude-files-in-rsync
- public/data/cache
- public/data/tmp

Backups of Backups Offsite

Fan­tas­tic, we’ve got all of the web­site struc­ture we cre­at­ed backed up in git, and we have local data­base back­ups and local asset back­ups. We’re cov­ered for the most com­mon sce­nar­ios where data has been lost in one way or another.

But what about when some­thing goes tru­ly wrong, and our serv­er isn’t accessible?

What we need is some inception: backups of backups
Inception

While it’s great to have local back­ups — and they are by far the most use­ful in prac­tice — we also want to have off­site back­ups that can be used if the prover­bial sh*t hits the fan.

For this, we’ll use the pull_backups.sh script which pulls down all of the back­ups from the REMOTE_BACKUPS_PATH on a remote serv­er to the LOCAL_BACKUPS_PATH on the com­put­er it’s run from.

This pulls down all of the data­base & assets we’ve backed up on our remote serv­er via the backup_db.sh and backup_assets.sh scripts, and it does so via rsync so it’s very effi­cient in pulling down only the files that have changed.

This effec­tive­ly gives us an off­site mir­ror of all of our local back­ups that we can eas­i­ly access should the need arise. This off­site back­up can be to a local com­put­er, or it can be to anoth­er VPS that you spin up, as described in the How Agen­cies & Free­lancers Should Do Web Host­ing article.
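Pulling backups down like this boils down to rsync over ssh. Here’s a minimal sketch with a hypothetical `pull_remote_backups` helper; the parameters mirror the REMOTE_SSH_LOGIN, REMOTE_SSH_PORT, REMOTE_BACKUPS_PATH, and LOCAL_BACKUPS_PATH settings, but the exact flags pull_backups.sh uses are an assumption:

```bash
#!/bin/bash
# Hypothetical helper: mirror the remote backups directory to this machine.
pull_remote_backups() {
    local ssh_login="$1"     # e.g. forge@nystudio107.com
    local ssh_port="$2"      # e.g. 22
    local remote_path="$3"   # e.g. /home/forge/backups/
    local local_path="$4"    # e.g. /home/vagrant/backups/
    # -a preserves file attributes, -z compresses data in transit,
    # -e tells rsync to connect via ssh on the given port.
    # The trailing slash on the remote path copies its contents,
    # not the directory itself.
    rsync -a -z -e "ssh -p ${ssh_port}" \
        "${ssh_login}:${remote_path}" "${local_path}"
}

# Usage:
# pull_remote_backups "forge@nystudio107.com" 22 \
#     "/home/forge/backups/" "/home/vagrant/backups/"
```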

Assum­ing you have set up ssh keys, you won’t even have to enter your pass­word for the remote serv­er. Here’s what the out­put of pull_backups.sh looks like:

vagrant@homestead /htdocs/nystudio107/scripts (develop) $ ./pull_backups.sh
receiving incremental file list
nystudio/db/
nystudio/db/nystudio-db-backup-20170317-000432.sql.gz
        435,059 100%    2.46MB/s    0:00:00 (xfr#154, to-chk=5/180)
nystudio/db/nystudio-db-backup-20170317-133213.sql.gz
        436,133 100%    1.65MB/s    0:00:00 (xfr#155, to-chk=4/180)
nystudio/db/nystudio-db-backup-20170318-183601.sql.gz
        436,381 100%    1.25MB/s    0:00:00 (xfr#156, to-chk=3/180)
nystudio/db/nystudio-db-backup-20170319-000001.sql.gz
        436,533 100%    1.01MB/s    0:00:00 (xfr#157, to-chk=2/180)
nystudio/db/nystudio-db-backup-20170319-002746.sql.gz
        436,821 100%  863.53kB/s    0:00:00 (xfr#158, to-chk=1/180)
nystudio/db/nystudio-db-backup-20170319-132355.sql.gz
        436,839 100%  743.21kB/s    0:00:00 (xfr#159, to-chk=0/180)
*** Synced backups from /home/forge/backups/nystudio
vagrant@homestead /htdocs/nystudio107/scripts (develop) $

If you’d like to sync your backups to an Amazon S3 bucket, Craft-Scripts has you covered there, too.

The sync_backups_to_s3.sh script syncs the back­ups from LOCAL_BACKUPS_PATH to the Ama­zon S3 buck­et spec­i­fied in REMOTE_S3_BUCKET.

This script assumes that you have already installed awscli and have configured it with your credentials. Here’s what the output of sync_backups_to_s3.sh looks like:

forge@nys-production /htdocs/nystudio107.com/scripts (master) $ ./sync_backups_to_s3.sh
upload: ../../backups/nystudio/db/nystudio-db-backup-20170322-000001.sql.gz to s3://backups.nystudio107/nystudio/db/nystudio-db-backup-20170322-000001.sql.gz
*** Synced backups to backups.nystudio107

It’s rec­om­mend­ed that you set up a sep­a­rate user with access to only S3, and set up a pri­vate S3 buck­et for your backups.
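For reference, the S3 sync itself can be as small as a single `aws s3 sync` call. This is a sketch with a hypothetical `sync_backups_to_bucket` helper, assuming awscli is installed and configured as mentioned above; it isn’t necessarily the exact command the script runs:

```bash
#!/bin/bash
# Hypothetical helper: upload new or changed backup files to an S3 bucket.
sync_backups_to_bucket() {
    local local_path="$1"   # e.g. the value of LOCAL_BACKUPS_PATH
    local bucket="$2"       # e.g. the value of REMOTE_S3_BUCKET
    # `aws s3 sync` only uploads files that are missing or different in the bucket
    aws s3 sync "${local_path}" "s3://${bucket}"
}

# Usage:
# sync_backups_to_bucket "/home/forge/backups/" "backups.nystudio107"
```

If you want to see what would be uploaded before committing to it, `aws s3 sync` also accepts a `--dryrun` flag.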

Automatic Script Execution

If you want to run any of these scripts auto­mat­i­cal­ly at a set sched­ule, here’s how to do it. We’ll use the backup_db.sh script as an exam­ple, but the same applies to any of the scripts.

If you’re using Forge you can set the backup_db.sh script to run night­ly (or what­ev­er inter­val you want) via the Scheduler.

Forge scheduled backups

If you’re using ServerPilot.io or are managing the server yourself, just set the backup_db.sh script to run via cron at whatever interval you desire.

Craft-Scripts includes a crontab-helper.txt that you can add to your crontab to make configuring it easier. Remember to use full, absolute paths to the scripts when running them via cron, as cron runs with a minimal environment that does not include your shell’s paths, e.g.:

    /home/forge/nystudio107.com/scripts/backup_db.sh
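As an illustration, here’s how a nightly crontab entry for that script could be put together. The midnight schedule and the output redirection are my own assumptions for the example, not the contents of crontab-helper.txt:

```bash
#!/bin/bash
# Hypothetical example: a crontab line that runs the database backup nightly.
# Crontab fields: minute hour day-of-month month day-of-week command
SCRIPT="/home/forge/nystudio107.com/scripts/backup_db.sh"
CRON_LINE="0 0 * * * ${SCRIPT} >/dev/null 2>&1"
echo "${CRON_LINE}"

# To install it, append the line to the current user's crontab:
# ( crontab -l 2>/dev/null; echo "${CRON_LINE}" ) | crontab -
```

Redirecting output to /dev/null keeps cron from emailing you every night; drop the redirection if you’d rather get those messages.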

There we go, set and for­get auto­mat­ed backups.

Becoming a Digital Nomad

The other fantastic benefit of implementing a backup system like this is that you effectively become a digital nomad. If you’ve set up your website via a provisioning service like Laravel Forge or ServerPilot.io as described in the How Agencies & Freelancers Should Do Web Hosting article, you’re no longer tethered to any particular hosting arrangement.

Digital Nomad

You can quick­ly spin up a new serv­er, deploy your web­site to it by link­ing it to your git repo, pull your assets down to it, pull your data­base down to it, and away you go!

​This kind of freedom is a wonderful thing

It makes what used to be a scary, fraught process of moving to a new server a piece of cake! Gone are the days when you dread a server migration, or don’t update or enhance your server out of fear that you’ll break something.

Disaster Recovery Drills

The final thing that I strongly recommend is running disaster recovery drills. Use this newfound freedom as a digital nomad to actually put your backups to the test.

Spin up a new VPS, and try restor­ing a web­site from scratch.

Practice Drill

There’s no bet­ter way to gain con­fi­dence in your dis­as­ter recov­ery plan than to prac­tice doing it. It sure beats sac­ri­fic­ing chick­ens and pray­ing when you’re under the gun and fac­ing an actu­al disaster.

To help you with this, Craft-Scripts comes with the restore_db.sh script. You pass in a path to the database dump, and it will restore it to the local database (after backing up the local database first). You can pass in a path to either a .sql database dump or a .gz compressed database dump; either works.
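Handling both kinds of dump is mostly a matter of deciding whether to decompress on the way in. Here’s a minimal sketch with a hypothetical `read_dump` helper; the commented `mysql` line assumes the `localdev` login path and `nystudio` database from the settings shown earlier, and restore_db.sh itself may do this differently:

```bash
#!/bin/bash
# Hypothetical helper: stream a database dump to stdout,
# decompressing it first only if the file name ends in .gz.
read_dump() {
    local dump_file="$1"
    case "${dump_file}" in
        *.gz) gunzip -c "${dump_file}" ;;
        *)    cat "${dump_file}" ;;
    esac
}

# Usage (assumes the `localdev` login path and the `nystudio` database):
# read_dump "/home/vagrant/backups/nystudio/db/nystudio-db-backup-20170320-022335.sql.gz" \
#     | mysql --login-path=localdev nystudio
```

Streaming the dump through a pipe means the uncompressed SQL never has to be written to disk.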

Here’s the exam­ple out­put of the restore_db.sh script:

vagrant@homestead /htdocs/nystudio107/scripts (develop) $ ./restore_db.sh /home/vagrant/backups/nystudio/db/nystudio-db-backup-20170320-022335.sql.gz
*** Backed up local database to /tmp/nystudio-db-backup-20170321.sql.gz
*** Restored local database from /home/vagrant/backups/nystudio/db/nystudio-db-backup-20170320-022335.sql.gz

If all this seems like a lot of work, just con­sid­er it prac­tice. Craft-Scripts does a lot of the heavy lift­ing for you. The first time you do it, it’ll take a bit of time to get famil­iar with how it all works, but after that you’ll gain the con­fi­dence that comes with experience.

And you’ll also gain a very use­ful — and bill­able — skill set in your repertoire.