Fail Fast, Succeed Faster…

John Gimber
Deloitte UK Engineering Blog
5 min read · Oct 30, 2023


Automated testing is a great way to speed up testing of a product. Automated tests can quickly grow in number, increasing test coverage and letting us trap and report on issues (and successes) far faster than if we were testing manually. Granted, there are setup costs in standing up the test framework and writing the tests, but once they are up and running… that’s where the unsung beauty of a testing strategy comes in.

We all want the tests to pass, right? Overall, yes; but if they’re going to fail, I want them to fail fast.

Let’s assume that we have a mature product, with 400 tests running via a DevOps Continuous Integration / Continuous Delivery (CI/CD) pipeline.

“Great!” everyone says.

“Not great!” I say, “Not unless you test smarter, not just faster…”.

What does that mean?

Think of it this way:

Imagine those 400 tests take 4 hours to run. That means… between starting a test run and finding out the health of the full suite of tests, there’s a lot of waiting time. And time waiting is time wasted. The developers have the code on their machines, they’re in the zone — the code is fresh in their heads. If they move away to new work, it’ll take time to context-switch back again, both mentally and physically.

Oh sure, we can speed the tests up. There are all sorts of techniques we can use, including using APIs to set up test data, running tests in parallel, and even (dare I say it) helping to design the user interface (UI) upfront so that it’s easier to test in the first place. Depending on the test framework being used, we could even set up test dependencies (i.e., “only run tests 2.1, 2.2, and 2.3 IF test 2.0 passes…”). This is all good stuff.
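As a quick illustration of that last idea, here is a minimal sketch of test dependencies using pytest with the pytest-dependency plugin. The test names and assertions are placeholders standing in for the hypothetical 2.x tests above:

```python
import pytest

# Requires: pip install pytest pytest-dependency
# If the parent test fails, the dependent tests are reported as skipped
# rather than being run (and slowly failing) themselves.


@pytest.mark.dependency(name="test_2_0")
def test_2_0_login_page_loads():
    # Placeholder check standing in for "test 2.0".
    assert True


@pytest.mark.dependency(depends=["test_2_0"])
def test_2_1_login_with_valid_credentials():
    # Only runs if test_2_0 passed.
    assert True


@pytest.mark.dependency(depends=["test_2_0"])
def test_2_2_login_with_invalid_credentials():
    # Likewise skipped if test_2_0 failed.
    assert True
```

Tweaks like this help, but we need to be smarter than individual test ordering. So, let’s look at a classic CI/CD pipeline.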

  1. Somebody checks code into the code repository.
  2. The CI/CD server detects the code change, and
  3. … Checks out the new code.
  4. … Builds the product/environment for it.
  5. … Runs all tests against the product.
  6. … Reports on the quality of the product.

But… this takes too long. If there’s a problem, we need to know as fast as possible. Not four hours later. The easiest way of doing that is to test smart and split out the testing part of the pipeline into multiple steps (or jobs).

Health-Check Tests

But how? Initially, by creating a series of basic tests that exercise the product infrastructure. These could include any of the following things:

  • Submit/retrieve data via API calls.
  • Store/retrieve values from a database.
  • Exercise the product authentication system (logging in with various privileges).
  • Ensure that the front-end UI is being served.
  • Ping third-party systems or sites that the product relies on.

This small subset of tests will do one thing, and one thing only — it will tell you whether the system is ready for testing or not. And it’ll do it quickly (under a minute is not unrealistic), and it’ll do it cheaply. These Health-Check tests are what we need.
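As a sketch of what such a layer might look like, here are a few Health-Check tests written with pytest and the requests library. The URLs, endpoints, and credentials are invented placeholders; your product will have its own equivalents:

```python
import pytest
import requests

# Hypothetical base URL for the product under test.
BASE_URL = "https://product.example.com"


@pytest.mark.healthcheck
def test_api_is_reachable():
    # A lightweight API call proves the back end is up and responding.
    response = requests.get(f"{BASE_URL}/api/health", timeout=10)
    assert response.status_code == 200


@pytest.mark.healthcheck
def test_front_end_ui_is_being_served():
    # The front end should return the login page, not an error.
    response = requests.get(f"{BASE_URL}/login", timeout=10)
    assert response.status_code == 200


@pytest.mark.healthcheck
def test_authentication_accepts_a_known_user():
    # Placeholder credentials for a dedicated health-check account.
    response = requests.post(
        f"{BASE_URL}/api/login",
        json={"username": "healthcheck-user", "password": "example-only"},
        timeout=10,
    )
    assert response.status_code == 200
```

Registering the healthcheck marker in pytest.ini stops pytest warning about an unknown mark, and it lets the pipeline run just this subset with pytest -m healthcheck.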

So now our CI/CD pipeline looks like this:

  1. Somebody checks code into the code repository.
  2. The CI/CD server detects the code change, and
  3. … Checks out the new code.
  4. … Builds the product/environment for it.
  5. … Runs the Health-Check tests against the product.
    IF the Health-Checks fail, STOP.
    IF the Health-Checks pass, CONTINUE.
  6. … Runs the full suite of tests, knowing that the infrastructure is suitable for full testing.
  7. … Reports on the quality of the product.

This approach has multiple benefits. If the environment isn’t fit for testing, we find out within a minute or so, not hours later. If the Health-Checks pass, we still run the full set of tests and get a comprehensive readout on the quality of the product.
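How the gating is wired up depends on your CI/CD tool, but the logic is simple enough to sketch as a plain script. The runner below is hypothetical (and reuses the healthcheck marker from the earlier sketch); in practice the equivalent would usually be separate jobs or stages in the CI server itself:

```python
import subprocess
import sys

# Each stage is a label plus a pytest marker expression.
# The run stops at the first stage that fails.
STAGES = [
    ("Health-Check", "healthcheck"),
    ("Full suite", "not healthcheck"),
]


def run_stage(label: str, marker_expression: str) -> bool:
    print(f"--- Running {label} tests ---")
    result = subprocess.run(["pytest", "-m", marker_expression])
    return result.returncode == 0


def main() -> None:
    for label, marker_expression in STAGES:
        if not run_stage(label, marker_expression):
            print(f"{label} tests failed - stopping here.")
            sys.exit(1)
    print("All stages passed.")


if __name__ == "__main__":
    main()
```

Letting the CI server do this natively (later jobs depending on earlier ones) brings the reporting, artifacts, and notifications along for free.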

Critical-Risk Tests

We can split this pipeline out further by taking a risk-based approach to the automated tests. For example, if we had a critical module around which the product was based (maybe an API Gateway, or a specific calculation module), we could focus on those too.

So now our CI/CD pipeline looks like this:

  1. Somebody checks code into the code repository.
  2. The CI/CD server detects the code change, and
  3. … Checks out the new code.
  4. … Builds the product/environment for it.
  5. … Runs the Health-Check tests.
    IF the Health-Checks fail, STOP.
    IF the Health-Checks pass, CONTINUE.
  6. … Runs the subset of Critical Risk tests.
    IF the Critical tests fail, STOP.
    IF the Critical tests pass, CONTINUE.
  7. … Runs the remaining tests, knowing that the infrastructure and key functionality are suitable for full testing.
  8. … Reports on the quality of the product.

Again, we’re focusing on critical failures and halting work immediately, rather than waiting hours for tests that we already know are going to fail. Plus, if you think about it… waiting 4 hours for a failure to pop out has other knock-on effects. It may block other test runs from starting, lengthening the feedback loop even further, and 4 hours of tests that aren’t needed consume processing time, and therefore electricity. Hence, failing fast is not only better for the developers, testers, and team, but it’s also better for the wallet and reduces the carbon footprint of testing activities.
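Sticking with the hypothetical sketches above, the Critical-Risk subset can simply be another marker, and the stage order decides what runs first. The calculation test here is a placeholder for whatever your product’s critical module actually does:

```python
import pytest
from decimal import Decimal


@pytest.mark.critical
def test_core_calculation_module_produces_expected_total():
    # Placeholder stand-in for the real calculation under test.
    line_items = [Decimal("19.99"), Decimal("5.01"), Decimal("75.00")]
    assert sum(line_items) == Decimal("100.00")


# The staged runner from earlier grows one entry, and the marker
# expressions make sure no test is run twice:
STAGES = [
    ("Health-Check", "healthcheck"),
    ("Critical-Risk", "critical"),
    ("Remaining tests", "not healthcheck and not critical"),
]
```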

The pipeline can be modified further to factor in unit testing of basic functions, static code analysis, security checking of software libraries or dependencies… you name it. But as your test coverage increases, remember to be smart about it, think about how you want your tests to run, and decide what your priorities are. Remember — we’re not duplicating test effort here, we’re choosing a smart order for the tests to run in.
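Those extra gates slot into the same pattern: the stages simply become arbitrary commands rather than pytest marker expressions, ordered so that the cheapest and most decisive checks run first. The tool names below are only examples of the kind of thing each stage might run:

```python
# Hypothetical pipeline order: cheapest, most decisive checks first.
PIPELINE_STAGES = [
    ("Unit tests", ["pytest", "tests/unit"]),
    ("Static analysis", ["flake8", "src"]),
    ("Dependency audit", ["pip-audit"]),
    ("Health-Check", ["pytest", "-m", "healthcheck"]),
    ("Critical-Risk", ["pytest", "-m", "critical"]),
    ("Remaining tests", ["pytest", "-m", "not healthcheck and not critical"]),
]
```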

Conclusion

So, to come back to the original point: automated testing is good, but it can be great if there’s a strategy to back it up. Using this approach, the same tests as before can provide results faster and more cheaply, keeping your teams productively focused on the task at hand and avoiding context switching and hold-ups.

Bottom line: don’t be afraid to fail fast — welcome it.

References

For more information on the topics discussed here, please see:

GitLab (no date) What is CI/CD? (Accessed: 27th October 2023).
https://about.gitlab.com/topics/ci-cd/

Jenkins (no date) Jenkins: Build great things at any scale (Accessed: 27th October 2023).
https://www.jenkins.io/

MacGregor (2023) Top Test Automation Frameworks in 2023, Sauce Labs.
https://saucelabs.com/resources/blog/top-test-automation-frameworks-in-2023

Note: This article speaks only to my personal views/experiences and is not published on behalf of Deloitte LLP and associated firms, and does not constitute professional or legal advice.

All product names, logos, and brands are the property of their respective owners. All company, product, and service names used in this website are for identification purposes only. Use of these names, logos, and brands does not imply endorsement.

John Gimber

I am an ISTQB & ISEB-accredited test manager at Deloitte, drawing on over 20 years of experience as an IT test specialist, QA consultant and test team leader.