One of the most common problems when running automated end-to-end tests for a web application is how to handle test data. End-to-end tests often create, update, and delete all kinds of information as they go through your test cases in the application. Inevitably, you'll run into problems because previous test runs leave behind a trail of stale data that may interfere with your current test execution.

For other forms of testing, managing data isn't as big a problem. Some tests, like functional and integration tests, only check a small portion of your application, making it easier to clean up after each test run. In other kinds of tests, like smaller unit tests, you can get away with stubbing or mocking any data necessary.

With end-to-end tests, the ideal scenario is to go through the whole system as close to real-world usage as possible. That means you'll need to ensure that your system's services have the right data for testing and maintain proper data integrity between all moving pieces. You also need to keep your test data in a state that won't interfere with future test runs.

Sadly, it's easier said than done. Depending on your application's complexity and other factors like team size, available skillsets, and organization budget, you can handle this issue in different ways. If you're struggling to find a way to deal with this problem, here are a few ways to get you thinking about how to best tackle it for your situation.

1) Dedicated test servers for automated testing purposes

👍 The good

  • You can have a complete replica of the entire application under test that works the same as the production app.
  • A separate environment won't get in the way of manual or exploratory testing.

👎 The bad

  • It can get pretty expensive if you have a complex system architecture.
  • Separate servers require regular maintenance and upgrades to keep systems running smoothly.
  • You still need to deal with maintaining the data in a good state between test runs.

To get the most out of your end-to-end tests, it's ideal to execute them in an environment as close to your production environment as possible. In some organizations, you'll have your production application running on dedicated servers. Likely, those organizations also have separate testing environments for UAT or staging purposes. While using these servers to run your automated tests is acceptable, it would be best to have an environment solely for automated testing.

By having dedicated servers running and ready for automated testing, you can create an environment identical to production. That way, your automation can go through the application using a mirror image of the same structure your customers use. It also helps to have a separate environment that won't cause any conflicts with manual or exploratory testing.

This approach does have some caveats. Setting up a new environment can cost a lot of money if your system's architecture requires many different services to run. Running additional servers also comes at the expense of time if you have an in-house team managing them. Finally, having a different environment doesn't entirely solve the problem of having the right data for testing at any time. It makes controlling the data more manageable, for example by resetting or restoring a database between runs. But you'll still have to manage that in your tests.
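As a minimal sketch of what "restoring a database" between runs can look like, the Python example below copies a known-good snapshot over the live test database before each run. It assumes a file-based SQLite database purely for illustration; the `users` table, the file names, and every function name are hypothetical. A server-based database would use its own dump and restore tooling instead.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def create_baseline(path: Path) -> None:
    """Build the known-good snapshot that every test run starts from."""
    conn = sqlite3.connect(path)
    with conn:  # commits the transaction on success
        conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
        )
        conn.execute("INSERT INTO users (email) VALUES ('seed@example.com')")
    conn.close()


def restore_baseline(baseline: Path, live: Path) -> None:
    """Overwrite the live test database with the snapshot."""
    shutil.copyfile(baseline, live)


def count_users(path: Path) -> int:
    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
    finally:
        conn.close()


def demo() -> tuple[int, int]:
    """Return the user count after a 'dirty' run and after restoring."""
    with tempfile.TemporaryDirectory() as tmp:
        baseline = Path(tmp) / "baseline.db"
        live = Path(tmp) / "live.db"
        create_baseline(baseline)
        restore_baseline(baseline, live)

        # Simulate a test run that leaves a record behind.
        conn = sqlite3.connect(live)
        with conn:
            conn.execute("INSERT INTO users (email) VALUES ('leftover@example.com')")
        conn.close()
        dirty = count_users(live)

        # Restoring the snapshot discards whatever the previous run left behind.
        restore_baseline(baseline, live)
        clean = count_users(live)
        return dirty, clean
```

Calling `restore_baseline` in the setup step of the test suite, rather than in a teardown step, means a crashed run can't leave the next one with stale data.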

2) Spin up test servers using virtualization or cloud computing

👍 The good

  • You can spin up complete test environments at will.
  • Easy for developers and testers to create disposable test environments locally or online.
  • Cheaper than having dedicated hardware up at all times.

👎 The bad

  • Requires some up-front expertise to get running at the start.
  • It can slow down the testing cycle if you need to start lots of services for the test application to work correctly.
  • You can accidentally forget to turn off cloud services, which can end up costing lots of money.

Like having dedicated servers, you can also take advantage of virtualization and cloud services to generate separate testing environments. These days, most organizations and startups host their applications on some form of cloud service, and in most cases, you won't need any additional configuration for creating separate servers.

The great thing about virtualization and the cloud is that you can spin up and tear down fully-fledged services on demand, so you don't need to deal with the time and expense of keeping them running at all times. Additionally, virtualization tools like Docker help developers and testers replicate the same environment on their local systems, making the end-to-end testing process simpler for unreleased functionality. At any time, testers can regenerate a new server with fresh data.

Virtualization and cloud services are an excellent choice for small and medium-sized teams, but they're not a perfect solution either. They require some expertise at the beginning to ensure that the virtualized services run as expected. Spinning up new services can slow down the team's testing workflow if you have lots of services to run. Finally, keep in mind that cloud companies usually charge for the time you have your services running. If you forget to turn off a server after running your tests, you'll receive a surprise in the form of hefty service charges.
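One way to reduce the risk of forgetting to shut an environment down is to tie its lifetime to the test run itself. The Python sketch below wraps `docker compose up` and `docker compose down` in a context manager so that teardown runs even when a test fails. It assumes the test stack is described in a Compose file; the file name `docker-compose.test.yml` and the function names are hypothetical, and the `run` parameter exists only so the command runner can be swapped out.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def run_command(command: Sequence[str]) -> None:
    """Default runner: execute the command and fail loudly on a non-zero exit."""
    subprocess.run(list(command), check=True)


@contextmanager
def disposable_environment(
    compose_file: str = "docker-compose.test.yml",  # hypothetical file name
    run: Runner = run_command,
) -> Iterator[None]:
    """Start the test stack, hand control to the tests, then always tear it down."""
    base = ["docker", "compose", "-f", compose_file]
    try:
        # Inside the try block so a partially started stack is torn down too.
        run(base + ["up", "--detach"])
        yield
    finally:
        # --volumes also removes the stack's named volumes, so the next run
        # starts from fresh data.
        run(base + ["down", "--volumes"])
```

A test runner would wrap its whole session in `with disposable_environment():`. This only covers environments started through this code path; anything created by hand in a cloud console still needs its own safeguard, such as a billing alert.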

3) Add development flags to control data on existing staging environments

👍 The good

  • Allows testers to add and remove data during testing easily.
  • It can speed up some test scenarios by bypassing long workflows to generate test data.

👎 The bad

  • It requires development time to implement, which may be scarce in your organization.
  • It's another thing developers need to maintain in the application.
  • It can hide bugs and regressions if testers bypass long workflows and don't validate them separately.

Some teams build "development flags" into their applications. These flags are special functionality that lets the team execute commands that aren't available in a production environment. For example, developers can access special sections in the application when logged in with a particular account, or call an API endpoint that bypasses some flows. These flags allow testers to automatically create database records without going through the application or clean the database before running a test scenario.

This approach is useful in testing environments since you don't need to worry too much about having the right data in your testing environment from the start. You can include a setup step in your test case to clean the database, and generate the data you need before each scenario. It can also speed up tests significantly since you can skip multiple steps to create a record. A typical example is bypassing email validation when creating a new account on a system. With development flags, you can create a validated account without dealing with the email verification part.
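To make the idea concrete, here is a minimal Python sketch of a development flag. It assumes the application reads an `APP_ENV` environment variable to decide whether test-only helpers are allowed; that variable, the `AccountService` class, and its method names are all hypothetical stand-ins for whatever your application uses.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Account:
    email: str
    verified: bool = False


@dataclass
class AccountService:
    accounts: dict[str, Account] = field(default_factory=dict)
    # The development flag: on only when APP_ENV (a hypothetical variable) is "test".
    test_mode: bool = field(
        default_factory=lambda: os.environ.get("APP_ENV") == "test"
    )

    def sign_up(self, email: str) -> Account:
        """Normal flow: the account stays unverified until the email step is done."""
        account = Account(email=email)
        self.accounts[email] = account
        return account

    def _require_test_mode(self) -> None:
        if not self.test_mode:
            raise PermissionError("test helpers are disabled in this environment")

    def reset_for_tests(self) -> None:
        """Test-only helper: wipe all accounts before a scenario."""
        self._require_test_mode()
        self.accounts.clear()

    def create_verified_account_for_tests(self, email: str) -> Account:
        """Test-only helper: create an account that skips email verification."""
        self._require_test_mode()
        account = Account(email=email, verified=True)
        self.accounts[email] = account
        return account
```

A scenario's setup step would call `reset_for_tests()` followed by `create_verified_account_for_tests(...)`. Because both helpers raise `PermissionError` when the flag is off, they can't be triggered by accident in production.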

The downside of development flags is that developers have to implement and maintain them. In most organizations, developers are likely to have many other high-priority tasks to work on every day, from features to bug fixes and everything in between. These kinds of tasks are often lower in priority since they don't add direct value to customers, so it might not be feasible for the team to build and manage these flags. Another issue is that testers might rely too much on these flags, causing bugs to slip by because the development flags allow them to circumvent some sections of the application.

4) Create temporary mock data services for testing APIs

👍 The good

  • It can allow testers to work on automation while the application is under development.
  • Easy to set a clean state and modify it in the future without diving into the database.

👎 The bad

  • You're not testing on a real system, which is one of the primary goals of end-to-end testing.
  • Mocked services won't reveal problems in your infrastructure.
  • If the data structure for the application under test is continuously changing, you'll spend lots of time maintaining these mock services.

Testing should begin as early in the development process as possible. When the development team is still working on building new features and the basics, it might be impossible to run end-to-end tests using one of the previously mentioned strategies. But it doesn't mean you can't start with test automation until developers release something. Depending on your application, you can mock the data layer while keeping the rest of the application functional for testing.

These days, most applications use RESTful APIs for retrieving and storing data. If you have an application with an incomplete API, you can use libraries such as Mirage JS to create a service that can simulate your finalized API endpoints. If you don't want to build a mock API service, you can use online services such as mockAPI. These libraries and services let you easily control the data and responses you need and are suitable for starting with testing before you have the finalized API available.
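For a sense of how little a mock data service needs, the sketch below serves one canned endpoint using only Python's standard library. It is a stand-in for the idea, not an example of Mirage JS or mockAPI; the `/api/users` route, the response shape, and the function names are assumptions made for illustration.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical canned data standing in for the unfinished API's response.
CANNED_USERS = [{"id": 1, "name": "Test User"}]


class MockApiHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path == "/api/users":
            self._send_json(200, CANNED_USERS)
        else:
            self._send_json(404, {"error": "not found"})

    def _send_json(self, status: int, payload: object) -> None:
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args: object) -> None:
        pass  # keep test output quiet


def start_mock_api() -> ThreadingHTTPServer:
    """Start the mock on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), MockApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_users(host: str, port: int) -> list:
    """Example client call, as the application under test might make it."""
    conn = HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", "/api/users")
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

A test would call `start_mock_api()`, point the application at `server.server_address`, and call `server.shutdown()` when finished. Resetting to a clean state is then just a matter of replacing the canned data.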

Of course, using mocked services means you're not taking advantage of one of the main benefits of end-to-end testing - running your test scenarios against your real environment. These libraries and services won't uncover potential issues that your servers may encounter, like security vulnerabilities or slow performance. Another problem is that if the API specifications change in any way, you'll have to spend additional time updating the mocked service for parity even if the changes don't affect your tests.

Summary

Handling the data needed to run automated end-to-end tests often isn't straightforward. These tests go through your entire application and leave a trail of changes in your test environment, making it tricky to clean up for subsequent test runs. Depending on multiple factors in your organization and application under test, you can use a few approaches to help make the testing process more comfortable for everyone involved.

Here's a summary of the strategies listed in this article:

  • If your team has skills, budget, and time available to maintain additional servers, you can set up dedicated testing environments to replicate your production services. Keeping your test environment separate gives you more control over the data, but it can get pricey, and you still need to set your data in a usable state before each test run.
  • If you don't have the budget for keeping dedicated servers up and running for testing purposes, you can take advantage of virtualization and cloud services. These services let you spin up a mirror image of your production environment on demand and shut it down when you're done. You can even run these environments locally. It will require up-front investment to get running correctly, and starting on-demand apps can slow down the testing cycle.
  • If your development team is up to it, they can add flags to the application itself that allow testers to create and delete data as needed. These development flags will enable you to speed up tests by not going through some application flows. However, it's easy to forget to include test coverage for those flows, leading to bugs slipping through the cracks.
  • For applications that rely on APIs that are still under development, you can create mock services to simulate the data layer as needed. Mock services allow you to get ahead of the work that's in progress so you can begin testing as early as possible. But it's not a real system, so these services won't expose potential issues that appear on the finished product.

Note that these four strategies aren't the only ways to deal with your test data for end-to-end testing. They're some of the most common ways that work for different situations. Regardless of how you and your team handle this tricky scenario, it's a good idea to spend the time to make your test automation go as smoothly as possible.

What other ways do you manage data for your automated tests not covered in this article? Share your tips with others in the comments below!