Testing in Production: Why It’s Part of Any Modern QA Strategy


By Testim, January 20, 2023

Testing in production used to have a terrible reputation. And some (or most?) of it was probably deserved. But everything changes and the software industry is probably the fastest-changing “thing” ever.

Nowadays, testing in production is not only tolerated but actively encouraged in many situations. However, a bad reputation isn’t an easy thing to shake off. Many people are still—understandably—skeptical about the whole thing. Are you one of those? Then today’s post is for you.


Here’s a summary of what you’ll be learning today:

We’ll start by defining testing in production
Then, we’ll explain the bad reputation of the practice
After that, we’ll cover what testing in production looks like when it’s done badly
Finally, we’ll show how to test in production in the correct way

Hopefully, by the end of the post, you’ll understand the reasons why testing in production is an essential component of a modern quality strategy. Let’s go.

What Is Testing in Production?

Before you go any further in the article, it’s essential we’re on the same page in regard to what “testing in production” means. That’s why we’re going to start by defining the term. So, what is testing in production?

Well, let’s start by clearing up a common misconception. Testing in production does not mean deploying untested features in order to secure time-to-market, hoping that everything will turn out ok when the customer tries to use them. As they say, hope isn’t a strategy.

A proper QA strategy still has to make sure testing is moved to as early in the process as possible and take full advantage of unit testing and other automated testing techniques. That is to say, testing in production isn’t supposed to replace proper testing done before production but complement it.

Testing in production, rather, refers to the continuous testing of the application in the production environment, after a deployment.

Poor Testing in Production: Here Comes Our Recipe

Let’s take a brief detour to cover the reasons why testing in production has carried such a stigma.

Why Not Test in Production?

Basically, it stems from differences in vocabulary. Many people, upon hearing “testing in production,” don’t think of anything remotely close to what we’ve described in our definition.

Instead, they think of the unprofessional process of deploying untested (or poorly tested) code and crossing their fingers, hoping for the best. They associate testing in production with a lack of proper software engineering practices and the complete absence of automated testing of any kind.

And to be honest, not all of that bad reputation is undeserved. If you do it haphazardly, testing in production can get you into a lot of trouble. Loss of data, financial loss, and a tainted reputation are just some of the consequences you might bring on yourself.

And of course, in the post-GDPR era we live in, you risk serious legal consequences as well. Additional risks might involve:

High error rates setting off alerts and waking up people on call.
Incorrect revenue recognition caused by test-generated revenue events (e.g., canceled orders).
Unintended consequences on other production systems.
Noise in logs due to script and bot activity.

If you’re to avoid bad testing in production, you’d better learn about it. So, what are the ingredients of poorly done testing in production? Here is a small list:

Lack of testing in preproduction environments (i.e., you shouldn’t skip other testing methods simply because you plan to test in production).
No easy way to roll back faulty deployments.
Lack of a proper backup strategy (which includes practicing backup restoration).
Performing production tests at inappropriate times.

Testing in Production Done Right: The Why and How

We’ve said that testing in production, when done right, is an essential part of a modern quality strategy. Now it’s time to talk about the benefits it brings. Why should you do production testing?

Is Testing in Production a Good Idea?

Why test in production? Sometimes you don’t have a choice. You might be in a scenario where building a staging environment is impractical or unaffordable. Also, you often need to gather real usage data, so you have no other route than turning to the real thing.

Some forms of testing yield better results when performed in the production environment. If you want to verify the scalability of your app, then load testing in production is what you need.

At the end of the day, it all comes down to how complex a thing software is. No matter how good your QA strategy is, how well you employ the best practices and state-of-the-art tools, some bugs will inevitably end up in production. Testing/monitoring your application in production is your last line of defense against such bugs.


Testing In Production: Here Are the Main Benefits

Here are some additional benefits of employing this form of testing:

Since you’re testing with production data, you might be able to detect problems in scenarios hard to replicate in test environments.
It can help you create a disaster recovery process making your application more resilient against expected or unexpected failures.
It allows you to design beta programs enabling users to provide early, valuable feedback.
It naturally tests the way users use your application.
Testing in the production environment daily, combined with real-time monitoring of your application, reduces the risk of deployments.

How to Do It?

How do you perform testing in production the “right way”? We’ll now answer this question by covering some of the main techniques you can use to leverage the power of testing in production.

A/B Testing

A/B testing is a type of statistical experiment. You split your user base into two groups, A and B. Group A gets the current version of your app, called the control. Group B gets a modified version of the app, which we call the treatment or variation.

You can then compare how users in both groups behave. Analyzing the data you gather, you can conclude whether the changes in the treatment are worth keeping or not, and make an informed decision on what to do next.
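As a rough illustration of the split-and-compare idea above, here is a minimal sketch. It assumes hash-based bucketing (so a given user always lands in the same group) and a two-proportion z-test on conversion counts; the function names and the "new-checkout" experiment name are hypothetical, and a real experiment platform would likely handle both steps for you.

```python
import hashlib
import math


def assign_group(user_id: str, experiment: str = "new-checkout") -> str:
    """Deterministically place a user in group A (control) or B (treatment)."""
    # Hashing experiment + user keeps the assignment stable across requests
    # and independent between experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """z statistic for the difference between two conversion rates (B minus A)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No variation at all (e.g., zero conversions in both groups).
        return 0.0
    return (p_b - p_a) / std_err


if __name__ == "__main__":
    print(assign_group("user-42"))
    # Hypothetical counts: 100/1000 conversions in A, 130/1000 in B.
    print(round(two_proportion_z(100, 1000, 130, 1000), 2))
```

A z value beyond roughly ±1.96 is commonly read as significant at the 5% level, but the right threshold and sample size depend on your experiment design.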

Canary Releases

Canary releases, at first sight, might look a lot like A/B tests. Here’s how Danilo Sato defines them:

Canary release is a technique to reduce the risk of introducing a new software version in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure and making it available to everybody.

So, basically, you release the new version to a subset of your users and closely monitor it, rolling it back if things go south. How does this differ from A/B Testing?

They differ in intent. A/B testing is supposed to gauge the interest of users in a potential new feature. Canary releases, on the other hand, are a risk mitigation tool.
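To make the “monitor closely and roll back if things go south” step concrete, here is a minimal sketch of one possible decision rule. It assumes you can count requests and errors separately for the stable version and the canary; the Stats class, the thresholds, and the three outcomes are illustrative assumptions, not part of any particular deployment tool.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    requests: int
    errors: int

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_decision(baseline: Stats, canary: Stats,
                    max_ratio: float = 1.5, min_requests: int = 500) -> str:
    """Return "wait", "rollback", or "promote" for the canary version."""
    # Too little traffic to judge: keep the canary at its current share.
    if canary.requests < min_requests:
        return "wait"
    # Roll back when the canary errors noticeably more often than the baseline.
    # Note: with a zero baseline error rate, any canary error triggers a rollback.
    if canary.error_rate > baseline.error_rate * max_ratio:
        return "rollback"
    return "promote"


if __name__ == "__main__":
    stable = Stats(requests=10_000, errors=50)
    print(canary_decision(stable, Stats(requests=1_000, errors=20)))
```

In practice you would probably watch more than one signal (latency, business metrics) and widen the rollout in several steps rather than promoting in one jump.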

Load Testing

Just for the sake of completeness, we’ve included load testing. We won’t cover it here, but it’s important to understand how your application will react to high volumes of traffic or transactions.

Feature Flagging

When organizations want to use techniques such as A/B testing, how do they switch a given feature on and off? The answer is feature flagging.

Feature flags (also called feature toggles) are conditional toggles you can use to determine whether a given feature should be exposed to the user. In its most basic form, a feature flag is nothing more than an if statement. A proper feature flag management strategy would have a way of controlling the value of the toggles from outside the application.
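Here is a minimal sketch of that idea: the flag check is a plain if statement, and the flag values come from outside the code. It assumes the flags arrive as a JSON document (from a file, an environment variable, or a flag service); the FeatureFlags class and the "new_checkout" flag name are hypothetical.

```python
import json


class FeatureFlags:
    """Holds flag values loaded from configuration outside the application."""

    def __init__(self, raw_json: str):
        self._flags = json.loads(raw_json)

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so a missing entry never exposes a feature.
        return bool(self._flags.get(name, False))


def checkout_page(flags: FeatureFlags) -> str:
    # At its core, the feature flag is just this if statement.
    if flags.is_enabled("new_checkout"):
        return "new checkout flow"
    return "old checkout flow"


if __name__ == "__main__":
    print(checkout_page(FeatureFlags('{"new_checkout": true}')))
    print(checkout_page(FeatureFlags("{}")))
```

Because the value lives in configuration, switching the feature on or off does not require a new deployment.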

Application Monitoring

Application monitoring is yet another category of activities that might be considered testing in production, even if it doesn’t sound like that at first. It can be categorized into two main groups: real user monitoring (RUM) and synthetic monitoring—aka synthetic testing.

Real user monitoring, as the name suggests, is the process of monitoring, in real-time, actual humans interacting with the application. With RUM you can see how the application handles real requests as they come in.

Synthetic monitoring/testing, on the other hand, refers to monitoring how the application reacts to requests coming from simulated—or synthetic, hence the name—users. Real user monitoring and synthetic monitoring are both valuable. Each approach has its share of strengths and weaknesses, and each is better suited for different scenarios; both of them are considered testing in production.
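A synthetic check can be as small as a scripted request that runs on a schedule. The sketch below is one hedged way to do it using only the Python standard library; the "synthetic-monitor" User-Agent, the health URL, and the two-second latency budget are assumptions made for illustration. Tagging the request with a recognizable User-Agent also makes it easier to filter synthetic traffic out of logs and analytics.

```python
import time
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Fetch the URL as a clearly labeled synthetic user and return the status code."""
    request = urllib.request.Request(url, headers={"User-Agent": "synthetic-monitor"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run_check(url: str, fetch: Callable[[str], int] = http_status,
              max_seconds: float = 2.0) -> dict:
    """Run one synthetic check and report status, latency, and health."""
    started = time.monotonic()
    try:
        status, error = fetch(url), None
    except Exception as exc:  # network failures and HTTP errors both count as unhealthy
        status, error = None, str(exc)
    elapsed = time.monotonic() - started
    return {
        "url": url,
        "status": status,
        "seconds": elapsed,
        "error": error,
        "healthy": status == 200 and elapsed <= max_seconds,
    }


if __name__ == "__main__":
    # Hypothetical endpoint; replace with a real health URL for your application.
    print(run_check("https://example.com/"))
```

The fetch parameter exists so the check can be exercised without touching the network, for example by passing a stub that returns a fixed status code.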

Production User Acceptance Testing

Many organizations adopt, as part of their SDLC, a heavyweight round of user acceptance testing that has to happen before any major feature is signed off for release. In doing so, they delay the arrival of their code in production, where it can be used by real end users.

Thanks to the increased adoption of techniques such as feature flagging, organizations today can use a different tactic: they forgo the comprehensive and lengthy acceptance-testing process that happens before production, replacing it with verifications that happen in production. Hiding a feature behind a flag makes it trivial to restrict access to authorized personnel only. After the testers give their thumbs-up, the flag can be switched on for everyone, allowing all users to access the functionality.
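The gating described above can be sketched as a flag with an allowlist plus a “released to everyone” switch. This is a minimal illustration under assumed names (GatedFlag, "qa-alice", "customer-7"); a real flag management tool would typically store this state outside the application.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class GatedFlag:
    """A feature visible only to allowlisted users until it is released to all."""
    allowed_users: Set[str] = field(default_factory=set)
    released_to_all: bool = False

    def is_visible_to(self, user_id: str) -> bool:
        return self.released_to_all or user_id in self.allowed_users


if __name__ == "__main__":
    flag = GatedFlag(allowed_users={"qa-alice"})
    print(flag.is_visible_to("qa-alice"))    # tester sees the feature in production
    print(flag.is_visible_to("customer-7"))  # regular users do not, yet
    flag.released_to_all = True              # testers gave their thumbs-up
    print(flag.is_visible_to("customer-7"))
```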

More Types of Testing In Production

Depending on your definition of testing, there’s a long list of activities and techniques related to software quality that can be considered “testing in production.” What follows is a non-exhaustive list of such activities:

Chaos Engineering
Automatic broken link checking
Visual regression testing
Disaster recovery testing
Accessibility testing

You can learn more about testing in production tools at this blog by our friends at Lightrun.

Testing in Production: Aye or Nay?

Everything changes amazingly fast when it comes to the software industry. Sometimes what seemed unthinkable a few years ago becomes commonplace and that’s exactly what happened with testing in production. Due to the widespread use of software engineering and QA best practices, we can now afford to test in production in safe ways, which enables us to reap benefits we wouldn’t be able to get otherwise.

As we’ve said again and again, our industry is an ever-changing one. Recent developments in the testing landscape include the use of AI-powered tools that help teams overcome the toughest quality challenges they face.

So, testing in production: yes or no? We believe it’s a good supplement to your testing strategy and it can play a bigger role in large B2C environments where you can run a valid test without impacting a significant portion of your enterprise customers.
