How To Make Craft CMS 3 Highly Available

*** This post was written in combination with a talk at a Craft CMS Manchester meetup. Slides here.***

There are a lot of good CMS solutions to choose from these days.

Many of them have learnt from the mistakes made by their predecessors and offer a robust solution for structuring and accessing content without being prescriptive of the method used to display it.

One of my favourites, @CraftCMS, is slowly gaining market share, displacing established competition such as WordPress at both the entry and enterprise levels.

builtwith.com

That's the main reason why I've chosen to run with it as the poster child for Servd, but this decision didn't come without its pain points. It turns out that Craft isn't 100% friendly with some modern high availability architectures and that posed a problem...

Why Do We Need High Availability?

There's no point in solving a problem just for the sake of it - so why did I even feel as though HA was worth pursuing?

In the context of a hosting product there are a few things that need to be avoided at all costs:

  • Downtime
  • Slow response times
  • Data loss

Any of these could cause customers to lose users, and lost users mean lost revenue. HA addresses each of them by providing redundancy of both the application and its data.

If a server goes down, another will be created automatically and traffic will be redirected to the remaining healthy ones in the meantime.

If a server becomes overloaded traffic can be routed away from it and the misbehaving node restarted.

If a hard disk becomes corrupt the data can be rebuilt using copies from other storage locations.

Each of these scenarios is uncommon, but when working at scale and over long periods of time they become an inevitability and need to be protected against at the medium and enterprise levels. You can read more about HA in this informative post.

So HA is a good thing, but it also presents some challenges.

The Complication of State

By replicating an application and its data across multiple servers we introduce a synchronisation issue. If a request from a user results in a file being updated, it will only be updated on the server which processes the request, not on the others. The different components of the system then hold different representations of its state. This is a very bad thing.
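To make the problem concrete, here is a toy sketch in plain PHP (no Craft involved) in which two replicas each keep their own private copy of a file. The server names and the file name are made up purely for illustration.

```php
<?php
// Toy model: each replica has its own private "filesystem" (an array here).
// Server and file names are hypothetical.
$servers = [
    'server-a' => ['logo.txt' => 'v1'],
    'server-b' => ['logo.txt' => 'v1'],
];

// A write request is processed by exactly one replica.
$servers['server-a']['logo.txt'] = 'v2';

// Later reads may land on either replica, so the answer depends on routing.
foreach ($servers as $name => $files) {
    echo $name . ' sees ' . $files['logo.txt'] . PHP_EOL;
}
// server-a sees v2
// server-b sees v1
```

The same request now gives a different answer depending on which replica the load balancer happens to pick.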

The primary method of solving this problem is to make the replicated components of the system stateless by moving anything which persists state out to another service. These external services should also be HA but will provide their own, usually complicated, mechanisms for ensuring consistency across nodes.

The first step in actually applying this practically in the context of Craft CMS is to identify what state it stores and where it stores it. Here's a rundown from my own analysis:

  • Content Structure Configuration - The configuration of fields, sections, structures etc.
  • Content - Your actual content, objects created within the framework defined by the Content Structure Configuration.
  • Administrative Data - User accounts, settings etc. Data Craft needs to store that isn't related to your content.
  • Sessions - Data related to individual users' sessions. Traditionally defers to PHP's session handling.
  • Cache - Cached templates, and lots of other things that both Craft and plugins store in order to speed things up.
  • Assets - Uploaded files and images.
  • CPResources - Bundles of static assets (CSS, JS, images etc) which can be required by Craft or its plugins. These are 'published' by Craft when they are first required.

The following diagram represents a non-HA server hosting a Craft website along with its ancillary services, and shows where each of the above pieces of state resides.

Not very HA, right?

We need to find a way to pull all this state out of this server so that we can start replicating it without inconsistencies.

Let's start with an easy win: externalising the database. This particular optimisation is pretty standard and Craft allows us to do it super easily by just providing an external host in the database config. Nothing new or exciting here:

return [
    'driver' => 'mysql',
    'server' => 'external.db.host',
    'port' => 3306,
    ...
];

This allows us to update our server diagram to something that looks like this:

A good start.

Next up we'll move the Session and Cache data. There are a few ways to accomplish this with varying degrees of performance, complexity and scalability. The most popular options are using the database to store session info or using an in-memory storage solution such as Redis. The latter is my preference due to its scalability so that's what we'll do for our example.

To make this work we need to tell Craft to use Yii's Redis session and cache drivers (provided by the yiisoft/yii2-redis package) and give it the appropriate details to connect to an external Redis server. In Craft 3 these component overrides live in config/app.php:

return [
    'components' => [
        'redis' => [
            'class' => yii\redis\Connection::class,
            'hostname' => 'external.redis.host',
            'port' => 6379,
            'database' => 0,
        ],
        'session' => [
            'class' => yii\redis\Session::class,
            'as session' => [
                'class' => craft\behaviors\SessionBehavior::class,
            ],
        ],
        'cache' => [
            'class' => yii\redis\Cache::class,
            'defaultDuration' => 86400,
        ],
    ]
];

For more on what's going on here you need to understand some of the more complex bits of Yii 2 and read this: https://www.yiiframework.com/d....

Now our server diagram is getting a bit more interesting.

Next up we have Assets, which are placed on the local filesystem by default. Craft allows us to use additional storage mechanisms by installing plugins which provide an interface to them. The method for allowing this is also well defined and documented, so creating your own asset plugins is relatively painless.

For my application I'll be using Google Cloud Storage which has an official plugin available. Once installed and configured there's not a lot else to do 👌

Let's add an ingress load balancer to our diagram to make things a little more #cloud:

So far we've just been using Craft and Yii's built-in functionality in order to offload our state. Our final piece of state, cpresources, is a little more difficult, however.

The contents of this folder are generated on the fly as they are required by Craft itself. This means that in a multi-server environment the contents will be created on the server that receives the request which requires it. Craft then responds with a reference to the newly created content as JS, CSS and image links.

Once these links have returned to the requestor's browser they will be downloaded using an additional HTTP request. If this follow-up request is directed to a server which hasn't yet generated those resources we start to have problems.

When I first started investigating this issue the above scenario would result in a 404 response due to the requested files not being available on the local filesystem. Subsequent updates have changed this behaviour so that if a cpresource file request results in a 404, Craft will regenerate the resource bundle and stream the newly created file contents through the PHP interpreter back to the requestor (streaming arbitrary files through PHP is generally considered a bad thing).

So this problem seems to be largely solved and, if we want to, we can stop here and pat ourselves on the back. But that would be an anticlimactic end to this endeavour so consider the following...

In a well-designed hosting environment we need to optimise server resource usage. One way to do this is to scale our individual application components separately. This includes our webserver(s) and PHP interpreter(s). So let's pull our webserver out into a separate service which can be scaled independently, optimising memory and allocated CPU time within our hosting environment.

By doing this we've made our platform a little nicer to scale and maintain, but we've also introduced a new state issue. The webserver (nginx) no longer has direct access to the filesystem which the PHP servers are writing their cpresources files to.

A typical server setup like this will ensure that any requests which result in a file not found error on the webserver will be forwarded to a PHP server so that it can have a try instead. As the files will never exist on the webserver's filesystem this will occur for all cpresource requests.

Craft tends to generate a lot of cpresource links for each page request, especially when multiple plugins have been activated. It isn't unusual for multiple MB of CSS, JS and images to be requested for a single CMS page load. We don't want to be piping all of these through the PHP interpreter.

In order to solve this I spent some time looking around Craft's internals to see how it generates the cpresource bundles. I hoped to be able to find a way to offload all of the cpresource files to an external filesystem in the same way that user uploaded assets can be. I discovered that all of this functionality is handled in a single Class called 'AssetManager'.

Although customisation of the behaviour of 'AssetManager' isn't provided out of the box, one of the great things about Craft (and Yii) is the flexibility that it allows as standard. Craft allows us to replace internal Classes with our own implementations like so:

Craft::$app->set('assetManager', function () {
    $generalConfig = Craft::$app->getConfig()->getGeneral();
    $config        = [
        'class'           => CustomSubclassOfAssetManager::class,
        'basePath'        => $generalConfig->resourceBasePath,
        'baseUrl'         => $generalConfig->resourceBaseUrl,
        'fileMode'        => $generalConfig->defaultFileMode,
        'dirMode'         => $generalConfig->defaultDirMode,
        'appendTimestamp' => false,
    ];
    return Craft::createObject($config);
});

All we're doing here is telling Craft to use our subclass of AssetManager whenever it attempts to resolve the component that it refers to as 'assetManager'. By adding this code to a Craft plugin we are now able to replace specific methods within Craft's internals without having to touch the source code itself!
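As a rough sketch of where that call could live, the override can be registered from a plugin's init() method. The namespace and plugin class name below are hypothetical, and the `: void` return type assumes PHP 7.1 or newer; this is an illustration rather than the code of any particular plugin.

```php
<?php

namespace example\cpresources; // hypothetical plugin namespace

use Craft;
use craft\base\Plugin;

// Hypothetical plugin class; only init() matters for this sketch.
class ExamplePlugin extends Plugin
{
    public function init(): void
    {
        parent::init();

        // Swap the 'assetManager' component definition for our subclass.
        // The closure is only run when the component is first resolved.
        Craft::$app->set('assetManager', function () {
            $generalConfig = Craft::$app->getConfig()->getGeneral();

            return Craft::createObject([
                'class'    => CustomSubclassOfAssetManager::class,
                'basePath' => $generalConfig->resourceBasePath,
                'baseUrl'  => $generalConfig->resourceBaseUrl,
            ]);
        });
    }
}
```

Because plugins are loaded early in each request, the replacement is in place before any control panel page asks for its resource bundles.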

All that's left is to override the methods which are currently set up to write cpresources to the file system and replace them with our own implementation. This is largely just file wrangling using the appropriate AWS/GCloud SDKs mixed with a bit of optimisation. You can see the (currently rough) plugin which I put together to get this working here. It allows cpresources to be shifted to AWS S3 or Google Cloud Storage.
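The general shape of such an override might look like the sketch below. It is not the plugin's actual code: the RemoteStore interface, the $store property and the namespace are all hypothetical stand-ins for whatever wraps the AWS or Google Cloud SDK. It also only handles directory bundles and skips the "already uploaded" check that a real implementation would want.

```php
<?php

namespace example\cpresources; // hypothetical plugin namespace

use craft\web\AssetManager;

/**
 * Hypothetical abstraction over the remote bucket. A real plugin would
 * implement this with the AWS or Google Cloud SDK. (Shown in the same
 * file for brevity; normally it would live in its own file.)
 */
interface RemoteStore
{
    /** Copies a local directory to the remote store under $prefix. */
    public function uploadDirectory(string $localDir, string $prefix);

    /** Returns the public base URL for files stored under $prefix. */
    public function urlFor(string $prefix): string;
}

class CustomSubclassOfAssetManager extends AssetManager
{
    /** @var RemoteStore|null Hypothetical; set via the component config. */
    public $store;

    protected function publishDirectory($src, $options): array
    {
        // Let the parent build the bundle on the local filesystem first.
        list($dir, $url) = parent::publishDirectory($src, $options);

        if ($this->store === null) {
            // No remote store configured: keep the default behaviour.
            return [$dir, $url];
        }

        // The published directory name is the bundle's hash, so reuse it
        // as the remote prefix. A real implementation would skip the
        // upload when the bundle already exists remotely.
        $prefix = basename($dir);
        $this->store->uploadDirectory($dir, $prefix);

        // Return the remote URL so generated links point at the bucket.
        return [$dir, $this->store->urlFor($prefix)];
    }
}
```

With something like this in place, a 'store' entry would be added to the component config alongside 'basePath' and 'baseUrl'. Single-file bundles go through publishFile() in Yii's AssetManager and would need the same treatment.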

With this plugin Craft will publish resource bundles to the remote file storage and then return a link to them in that location. In doing so it ensures files are never streamed through the PHP interpreter and we completely avoid the local filesystem state issues we've previously discussed.

The only downside is that the very first login to the CMS can take a few seconds to process as the bundles are all generated and uploaded to S3 or GCS. However, I'm hoping to fix this by performing a batch publish of all resource bundles using a CLI command.

Our final server setup looks like this:

No state in any of our elastically scaling components!

Once complete, Servd will offer all of this functionality as standard with zero input from you. If you're interested in giving it a try you can follow on twitter, DM me or sign up to my mailing list below.

