Testing Dilemma: Using Production Traffic for Testing

Yifeng Liu
4 min read · Feb 20, 2024
NBA referees review plays using the replay system. Image Source: esportes

One of the most challenging tasks in any technology team is to ensure functionality after new changes have been deployed to production. Even a 0.1% data error can significantly undermine customer trust, especially at a large scale.

Most of the time, the team becomes aware of bugs in production only after they have already caused significant damage. If we can get feedback from production traffic, which includes diverse requests across different flows, during the staging phase, the team can identify and fix bugs before the production deployment.

In this post, I will delve into the following topics:

  1. Why it is impossible to avoid bugs in production
  2. What a replay traffic framework is and how to build one
  3. Three lessons from my experience

Why it is impossible to avoid bugs in production

There are three main reasons why it is difficult to avoid bugs in production:

  1. System Complexity: Obscurity within the legacy code and incremental changes over years are challenging to manage. No one can be fully aware of all the domain knowledge, edge cases, hacks, and unknown unknowns.
  2. Limitations of Unit Testing and Integration Testing: Each test case only focuses on one specific flow, making it difficult to fully predict the behavior of new changes across different flows.
  3. Limitations of the Stage Environment: The stage environment does not match the scale of production and lacks data at the production level. It cannot simulate all production usage patterns and scenarios.

What a replay traffic framework is and how to build one

The replay traffic testing framework has two essential features.

  1. A/B Testing — Its primary responsibility is to compare the responses from the version 1 service and the version 2 service.
  2. A/B Testing Report — The team should be able to receive comprehensive feedback from A/B tests to identify potential bugs. The report should include the number of successes, the number of failures, the failure percentage, and details of the failed test cases.
Replay traffic testing framework
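To make the two features concrete, here is a minimal sketch of the comparison and reporting logic. All names (`ReplayReport`, `compare_responses`, the `ignored_fields` parameter) are illustrative assumptions, not part of any real framework; the idea is simply to diff the two responses and accumulate the success/failure counts the report needs.

```python
from dataclasses import dataclass, field

@dataclass
class ReplayReport:
    """Aggregated A/B results: successes, failures, and failed-case details."""
    successes: int = 0
    failures: int = 0
    failed_cases: list = field(default_factory=list)

    @property
    def failure_rate(self) -> float:
        total = self.successes + self.failures
        return self.failures / total if total else 0.0

def compare_responses(request_id, v1_response: dict, v2_response: dict,
                      report: ReplayReport,
                      ignored_fields=("timestamp",)) -> None:
    """Diff two responses, ignoring fields expected to differ on every run."""
    v1 = {k: v for k, v in v1_response.items() if k not in ignored_fields}
    v2 = {k: v for k, v in v2_response.items() if k not in ignored_fields}
    if v1 == v2:
        report.successes += 1
    else:
        # Record only the fields that actually diverged, for the report.
        diff = {k: (v1.get(k), v2.get(k))
                for k in set(v1) | set(v2) if v1.get(k) != v2.get(k)}
        report.failures += 1
        report.failed_cases.append({"request_id": request_id, "diff": diff})
```

Note the `ignored_fields` escape hatch: fields such as timestamps or request IDs legitimately differ between the two services, and without a way to exclude them every comparison would be a false failure.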

How can we ensure the staging database is synchronized with the production data during replay? We need to seed the data before running the production request.

Data seeding relies on an audit service that provides three crucial pieces of information: the production request, its headers, and a snapshot of the data taken before the request. During the data seeding process, the replay service maps that snapshot into the stage database, ensuring the data used in the replay test is synchronized with production. The replay service then uses the captured request and headers to simulate the production request against both services. Data seeding has its own technical limitation: if the information in the audit service is inaccurate, the A/B test results will be inaccurate as well.

Data seeding based replay testing
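The seed-then-replay loop described above can be sketched as follows. Everything here is a hypothetical shape, assuming the audit service hands back records with `snapshot`, `request`, and `headers` keys; `stage_db`, `call_v1`, `call_v2`, and `compare` are injected callables, not a real API.

```python
def replay_with_seeding(audit_records, stage_db, call_v1, call_v2, compare):
    """Seed the stage DB from each audit record, then replay the request."""
    for record in audit_records:
        # 1. Seed: load the pre-request data snapshot into the stage
        #    database so both service versions see production-equivalent state.
        stage_db.load(record["snapshot"])
        # 2. Replay: send the captured production request, with its original
        #    headers, to both service versions.
        resp_v1 = call_v1(record["request"], record["headers"])
        resp_v2 = call_v2(record["request"], record["headers"])
        # 3. Compare the two responses and record the result.
        compare(record["request"], resp_v1, resp_v2)
```

Injecting the database and service calls keeps the loop itself trivial to unit-test, which matters given how much the replay service must evolve alongside the system (see Lesson 2).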

Three lessons from my experience

The concept is straightforward; however, the underlying complexity is significant. What happens if the service makes a dependency API call? How do we ensure the state of the dependency service? What if a dependency itself has another dependency…

Lesson 1: What if the service has many dependencies

Don’t do it. Dependency hell has plagued the industry for a long time. Replay traffic testing is not a pragmatic approach if the service has many dependencies. Find some other way to prevent production issues.

Lesson 2: Maintaining the replay service requires huge effort

There is no silver bullet. Replay traffic testing has its technical limitations. It can only validate existing flows. In addition, as the system evolves, the replay service must evolve accordingly. If the team does not actively maintain and enhance the replay service, the accuracy of the results could be misleading.

Lesson 3: Achieving adoption can be challenging

Introducing replay production traffic testing into the CI/CD pipeline adds an extra layer of complexity. Engineers need to learn how to use it and follow the validation steps for every new release. If the team or the company is not technically driven or lacks a strong tech culture, it can be difficult to achieve adoption. The image below is satirical, but there is some truth to it.

A bit of satire. Credit: www.bonkersworld.net
