The sideways test pyramid

This blog post is written in response to the question asked by @JuliaTorrejon: “what tests should go in which steps of the pipeline during continuous testing?” Thank you, Julia, for inspiring this article and for your passion for quality!

The test pyramid is such an established concept that I presume everyone here knows about it. In case you don’t: it is a way of visualizing the distribution of the different types of tests that may exist for an application.

The concept is simple: the earlier you can find issues, the less expensive it is to debug and fix them. The test pyramid emphasizes this by giving more tests to the types that can be executed earlier than to those that need the whole application to be put together before they can run.

In the basic example, you can think of it as an equilateral triangle pointing upwards: unit tests are at the bottom, integration tests in the middle and end-to-end tests at the top.

These tests are absolutely mandatory to guarantee a minimum quality for an application. However, there are many more tests that we might need and that could be of help.

For a more detailed example, with more types of tests, we could turn this into an actual pyramid, secret passages included. To avoid repeating myself: if you are interested in knowing more about it, there is a certain book that talks about it in detail.

Test pyramid on CI/CD

Continuous integration and delivery (CI/CD) is becoming an established practice to properly develop applications, including continuous testing and automating steps that could otherwise lead to human error.

For example, if an application is not continuously delivered, different versions of features need to be manually cherry-picked and put together for the deployment. This means we could accidentally select the wrong ones and produce unpredictable behaviors and issues in the application for the user.

CI/CD applications are usually distributed across what is called a “pipeline”, in reference to... well, pipelines, where things come in at one end and get out at the other.

When we are thinking about a CI/CD application, we tend to distribute things left to right throughout this pipeline, left being things that happen earlier and right being closer to the final user. This pipeline is divided into steps, each related to a specific environment in which the code gets deployed.

In this context, it would make more sense to have the pyramid turned sideways.

The base of the pyramid would be turned to the left (where things get in), the top of the test pyramid would be turned to the right (where things get out). Unit tests could happen before the pipeline, so it might be the case that the base is outside the pipeline.

In which step should they run?

Since each application and team is different, and each pipeline is built with a specific purpose, it is difficult to say which specific step should run which specific tests.

Generally we can see different environments represented in separate steps, but the total number of environments and the number of steps per environment vary a lot.

In general, end-to-end and front-end tests need to happen once everything is built and connected, so we are not likely to see them until some pre-production environment. From then onwards, we are in the top side of the pyramid.

Unit tests, on the other hand, need to happen at the very beginning, usually before the code review has even happened, which means they might land outside of the pipeline or in its first step.

The sideways pyramid can give us an idea of the number of tests we should have across our pipeline: most tests should run up to the first step, then a good number of them across the middle steps and, finally, the fewest at the very end. It also gives us an idea of where these tests should execute. However, these proportions refer to “unique tests”, and across a pipeline it is common to run the same set of tests repeatedly to validate the different environments.

Example of disposition of test pyramid vs pipeline

Test repetition on a pipeline

We want to validate environments as much as possible but, at the same time, we don’t want to spend longer than necessary on testing, to avoid slowing down the deployments.

Running tests in parallel helps, but it’s still important to keep a balance.

Our objective should be to avoid running any unnecessary tests and to avoid testing things twice. That means: try not to do unit tests during integration and focus, well, on the integrations. I have also seen many times UI tests that cover in detail things that have already been well covered by integration tests, rather than focusing, well, on the UI.

Now, this part could be controversial, because I called those tests at the top UI and not end-to-end. Yes, you would also need tests that cover end-to-end scenarios including user layouts, which would likely happen at the same time as the UI tests. And yes, the more paths you test like this the better. But again, you should think carefully about what you are intending to test and whether it has already been covered in a previous step.

That said, even if you are trying your best to avoid testing things that have already been covered, some test repetition is needed as we deploy to new environments and want to ensure nothing broke during each deployment.

I recommend keeping some build verification tests that can be re-triggered at different steps, while running the bulk of the integration tests in the earliest step where you are able to do so.
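As a rough illustration of that recommendation, the sketch below builds a schedule in plain Python. The stage names, the suite names and the `first_stage` and `repeat` fields are assumptions of mine, not part of any CI tool: each suite runs at the earliest step where it can, and only the suites flagged as repeatable (the build verification tests here) are triggered again at every later step.

```python
# Hypothetical pipeline steps, ordered left to right.
STAGES = ["build", "staging", "pre-production", "production"]

# Hypothetical suites: "first_stage" is the earliest step where the suite can run,
# "repeat" marks suites that are re-triggered after every later deployment.
SUITES = [
    {"name": "integration", "first_stage": "build", "repeat": False},
    {"name": "build-verification", "first_stage": "build", "repeat": True},
    {"name": "ui", "first_stage": "staging", "repeat": False},
]


def plan(stages, suites):
    """Map each stage to the list of suite names that should run there."""
    schedule = {stage: [] for stage in stages}
    for suite in suites:
        start = stages.index(suite["first_stage"])
        # Repeatable suites run from their first stage onwards; the rest run once.
        targets = stages[start:] if suite["repeat"] else [stages[start]]
        for stage in targets:
            schedule[stage].append(suite["name"])
    return schedule


schedule = plan(STAGES, SUITES)
for stage in STAGES:
    print(stage, schedule[stage])
```

Keeping the repeated set small is what keeps the later steps fast: in this sketch only one suite reaches the last two steps.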

Keep in mind that some pipelines might be distributed by region, as we can have different servers or even builds related to each one. That adds yet another repetition, which is unavoidable and has to be taken into account.

What about other tests?

As I mentioned at the beginning of this post, the test pyramid is quite basic, and other tests should also be considered. The same principle stated before still applies here: think about the minimum you’d need for those tests to run, and that will give you an idea of the area where those tests should execute.

For instance, performance tests should go early in the pipeline, but we might need to push them to the right, as the first environments tend to be less “powerful” and may not be able to hold this sort of test. Accessibility tests, however, need the application to be connected and the front end to be ready, so they would likely go further right, closer to the end-to-end ones.
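The “minimum you’d need” idea can be sketched as a small lookup: place each type of test at the earliest step that provides everything it requires. The stages, the capability labels and the requirements below are hypothetical and only there to show the reasoning; a real pipeline would have its own.

```python
# Hypothetical capabilities each stage provides, ordered left to right.
STAGE_CAPABILITIES = [
    ("pre-commit", {"source"}),
    ("build", {"source", "services"}),
    ("staging", {"source", "services", "front-end"}),
    ("pre-production", {"source", "services", "front-end", "production-like hardware"}),
]

# Hypothetical minimum requirements for each type of test.
TEST_REQUIREMENTS = {
    "unit": {"source"},
    "integration": {"services"},
    "accessibility": {"services", "front-end"},
    "performance": {"services", "production-like hardware"},
}


def earliest_stage(requirements, stage_capabilities):
    """Return the first stage whose capabilities cover the requirements, or None."""
    for name, capabilities in stage_capabilities:
        if requirements <= capabilities:  # subset check
            return name
    return None


placement = {
    test_type: earliest_stage(requirements, STAGE_CAPABILITIES)
    for test_type, requirements in TEST_REQUIREMENTS.items()
}
for test_type, stage in placement.items():
    print(test_type, "->", stage)
```

With these made-up requirements, performance tests get pushed to the right because only the last environment is assumed to have production-like hardware, which is the trade-off described above.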

We could also have some constant tests in production to make sure the servers are working as expected and no big issues have leaked through the pipeline. These tests should not bother the users (for instance, by adding load that results in longer waiting times for them), nor should they mess up the metrics on production behavior.
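One possible way to respect both constraints, sketched in Python under my own assumptions: throttle the checks so they add very little load, and tag them so they can be filtered out of the metrics. The header name `X-Synthetic-Check`, the 60-second interval and the shape of the request log are all hypothetical.

```python
SYNTHETIC_HEADER = "X-Synthetic-Check"  # hypothetical header name marking test traffic
MIN_INTERVAL_SECONDS = 60  # hypothetical minimum gap between two checks


def build_probe_headers():
    """Headers a production check would send so it can be recognized later."""
    return {SYNTHETIC_HEADER: "1"}


def should_probe(last_probe_at, now, min_interval=MIN_INTERVAL_SECONDS):
    """Only allow a new check when enough time has passed since the last one."""
    return last_probe_at is None or now - last_probe_at >= min_interval


def user_requests_only(request_log):
    """Drop tagged test traffic so it does not distort production metrics."""
    return [r for r in request_log if SYNTHETIC_HEADER not in r["headers"]]


log = [
    {"path": "/home", "headers": {}},
    {"path": "/health", "headers": build_probe_headers()},
]
print(should_probe(None, 0))
print(should_probe(100, 130))
print(len(user_requests_only(log)))
```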

Discovery tests could go at almost any point, but most likely towards the top of the pyramid or the right side of the pipeline. They could also go in production, depending on what we want to “discover”.

Security should really be taken into account from the very planning of the application, but penetration tests would require everything to be connected, which would happen at the end of the pipeline, and so on.

As we have seen in this article, there are many different checks that we could be doing across our pipeline and throughout the development of our application. The next thing we should be looking into is who should be writing those tests, but that’s, well... another story.

Update: if you want a real-world example, check out @practicaltester’s talk. Thank you for reminding me of this!!
