Intelligent Load Balancing

Posted: 10th Jul 2020

All of Servd's plans, with the exception of our smallest, run multiple copies of your application within our platform and distribute the incoming traffic between them. This process is known as Load Balancing and is performed transparently by the Servd infrastructure, without your project having to worry about how it's done.

Most load balancing is performed using the round-robin algorithm. This is a simple solution which sends a single request to each upstream instance in turn before looping back to the first. This ensures that requests (and theoretically the overall load) are evenly distributed over all of the upstream instances.
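Round-robin is simple enough to sketch in a few lines. The instance names below are hypothetical; a real balancer would hold upstream addresses rather than labels:

```python
from itertools import cycle

# Hypothetical instance names standing in for real upstream addresses.
instances = ["app-1", "app-2", "app-3"]
next_instance = cycle(instances).__next__

# Eight incoming requests are spread evenly, looping back after the third:
# app-1, app-2, app-3, app-1, app-2, app-3, app-1, app-2
assignments = [next_instance() for _ in range(8)]
```

Each instance receives the same number of requests, which is exactly the property round-robin is designed for.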

Servd has been operating using the round-robin algorithm since it first went live and it has performed well for the majority of sites, just as you'd expect.

We've recently been looking into a strange phenomenon that we've seen repeated over several projects, but primarily those using Craft CMS as a GraphQL server. Under heavy load a project with multiple instances could often be seen with unevenly distributed CPU usage.

In the following graph you can see an application under load with three active instances. In theory the CPU usage should be approximately equal across all three, as they're all handling the same number of incoming requests. However, it's clear that the CPU usage is not evenly distributed:

We began investigating this by trying to determine why certain instances might be acting as a bottleneck and consuming more CPU per request than others. The instances are identical so there shouldn't be any noticeable performance difference between them.

We tested this by simply killing the CPU-bound instance. This disrupts the load imbalance for a short time, but it re-emerges once the killed instance comes back online, and not necessarily with the same instance showing the higher CPU usage.

Having determined that the problem likely wasn't with the instances themselves, we began looking into how traffic was being distributed amongst them. Because round-robin was being used for load balancing, we knew that each instance would be receiving the same number of requests, but each of those requests would incur a different workload on the instance that processed it. Perhaps one instance was processing more of the high-load requests than the others? But what would be attracting those specific requests to a single instance, rather than them being randomly distributed?

The answer lies in GraphQL and static site generation. Generators like Gatsby use a single page template to create multiple pages. On large websites this can mean thousands of pages.

The GraphQL queries within a template differ in complexity, but every time a template is processed it fires off the same set of queries in the same order.

If a template runs 3 queries and we have 3 application instances, our round-robin load-balancing algorithm will always send the same query to the same instance for every page being generated using the template. Therefore, the most complex query in the template will always be processed by the same instance.

This will also occur whenever the number of queries in a template is a multiple of the number of instances.
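The effect is easy to reproduce with the round-robin loop from earlier. The template below is a made-up example with two cheap queries and one expensive one, but the query-to-instance mapping it demonstrates is exactly the pattern described above:

```python
from itertools import cycle

instances = ["app-1", "app-2", "app-3"]
next_instance = cycle(instances).__next__

# Each page generated from the template fires the same queries in the
# same order. The query names here are purely illustrative.
template_queries = ["light_query", "light_query", "heavy_query"]

# Generate many pages and record which instance handled the heavy query.
heavy_handlers = set()
for _page in range(100):
    for query in template_queries:
        instance = next_instance()
        if query == "heavy_query":
            heavy_handlers.add(instance)

# With 3 queries per template and 3 instances, the heavy query lands on
# the same instance for every single page.
print(heavy_handlers)
```

Because the queries-per-page count divides evenly into the instance count, the round-robin cursor is back at the same position at the start of every page, so `heavy_handlers` only ever contains one instance.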

The simplest way to prevent this problem is to change the way that traffic is distributed over the instances.

A naive approach would be to distribute requests randomly. This would break the pattern, but random selection makes no attempt to balance load, so over any short timescale some instances can end up overloaded while others sit under-used.

A better alternative is to use an intelligent load balancing algorithm. The solution we have opted for is 'peak exponentially weighted moving average', or Peak EWMA for short.

This algorithm tracks a moving average of the response times for each of the instances and uses that value as an inverse weighting when deciding which instance to send traffic to. An instance which is taking longer to respond receives less traffic.

This not only ensures that busy instances are sent proportionally less traffic, but it also breaks the pattern we saw above when requests are following a specific sequence. This results in a more even distribution of load over our instances and a higher maximum throughput.

On our Gatsby project we saw an immediate 15% decrease in full-site build times:

This change has been rolled out over all projects in Servd's platform and should provide significant performance gains for projects which often see repeated patterns of requests such as static site generators.

Proactively managing infrastructure to maximise performance can take a lot of time and tweaking. We're constantly looking for ways to improve the speed of all of our hosted projects so that you don't have to.