Value Driven Software Performance Testing

How to ensure performance testing investments provide value to your organization

Robert Cui
Slalom Build

--

Performance testing can easily turn into wasted effort unless testing goals and objectives are clearly defined. In this article I will present a framework that helps ensure performance testing efforts contribute business value.

Value from performance testing

Value from performance testing is achieved by answering a specific set of questions. Identifying these questions is critical to ensure we get value from our efforts. A few examples of such questions are:

  • Does the current system configuration meet defined performance requirements?
  • What is the minimum configuration required to meet a defined performance benchmark so that the operational costs of the system are not too high? For example, if 32 containers in a cluster are enough to deliver the performance, the team doesn’t need 64 containers, which would be more expensive.
  • What are the performance related risks to the business? For example, not meeting consumer SLAs could potentially cause legal issues.
  • What are the performance bottlenecks in a complex system?
  • What potential changes can the team make to improve performance in order to meet requirements, such as adding caching mechanisms?
  • When comparing candidate architectures, which one delivers better performance?

Answering the relevant questions should be prioritized based on the possible impact and the cost of obtaining an answer. The highest-priority questions are those that could have a large impact but cost the least to answer; the lowest priority goes to questions that have the least impact but cost the most.

We will now walk through the planning, implementation, execution, and post-mortem phases of performance testing, and the steps you can take to ensure the effort delivers value.

Planning Performance Tests

Defining test requirements

Planning performance testing starts by understanding the types of tests needed to answer the identified performance questions, e.g., load tests, stress tests, soak tests, spike tests, etc. It's critical to define detailed requirements for what realistic tests look like, for example: ramp up, then follow the daily traffic pattern (such as a peak in the UTC evening and a trough in the UTC morning), then ramp down.
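Before choosing a tool, it can help to write the intended traffic down as data. The sketch below (Python, with purely illustrative stage durations and rates) expresses such a daily pattern as ramp stages and computes the target request rate at any point in the test:

    # Minimal sketch of a daily traffic pattern expressed as ramp stages.
    # Stage durations and rates are illustrative, not real requirements.
    STAGES = [
        {"duration_s": 600,  "target_rps": 200},   # ramp up
        {"duration_s": 3600, "target_rps": 200},   # UTC morning trough
        {"duration_s": 3600, "target_rps": 1000},  # climb to UTC evening peak
        {"duration_s": 3600, "target_rps": 1000},  # hold at peak
        {"duration_s": 600,  "target_rps": 0},     # ramp down
    ]

    def target_rps(elapsed_s: float) -> float:
        """Linearly interpolate the target RPS for a moment in the test."""
        rps = 0.0
        for stage in STAGES:
            if elapsed_s <= stage["duration_s"]:
                fraction = elapsed_s / stage["duration_s"]
                return rps + (stage["target_rps"] - rps) * fraction
            elapsed_s -= stage["duration_s"]
            rps = stage["target_rps"]
        return STAGES[-1]["target_rps"]

Having the pattern in this form makes it easy to compare against what a candidate tool can actually drive.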

With test types selected and traffic patterns defined, it is crucial to consider which metrics are needed for the tests, which tools are needed to collect the data for those metrics, and which are needed to prepare a testing report and present the results to stakeholders. It's important to identify the appropriate audience for testing reports as well as the specific information these stakeholders want to see, so that the right data is gathered during testing and suitable tools are used to generate the reports.

Performance acceptance criteria

Performance acceptance criteria should be defined before the software being tested is built, usually while other software requirements are being gathered. When defining the criteria, the first step is to identify stakeholders and gather their expectations on performance. Consensus on performance acceptance criteria must be reached before moving on to performance testing implementation.

Performance acceptance criteria should be part of test requirements and can come from different sources. Here are a few examples:

  • Consumers/users: For example, API consumers may have specific performance requirements for the API they consume.
  • Industry benchmark: For example, average performance numbers, such as page load time, may vary by industry for certain websites.
  • Existing system baseline: If an older system or similar software exists, then its current performance could be used as reference.
  • Self-imposed: Some teams may define their own performance criteria at the beginning of a development effort, especially when the software is brand new and lacks the additional information available from the other sources listed above.

Acceptance criteria for websites

For website performance testing, it's common to look at page load time as a performance metric, but there is some complexity that comes with it. Firstly, different events can be used as signals that the page has finished loading, so page load time may vary depending on which event is measured. Secondly, different pages within the same site may have very different load times. Thirdly, users or development teams may be more concerned about the performance of specific components on a page, which then need to be measured separately.
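As an illustration of the first point, the sketch below (assuming Selenium with a Chrome driver is installed, and using the browser's legacy Navigation Timing API) reads two different "page loaded" signals that yield two different load times for the same page:

    # Minimal sketch: two different "page loaded" signals via the browser's
    # Navigation Timing API. Assumes Selenium and a Chrome driver are
    # installed; the URL is a placeholder.
    from selenium import webdriver

    driver = webdriver.Chrome()
    driver.get("https://example.com/")

    timing = driver.execute_script("return window.performance.timing")
    nav_start = timing["navigationStart"]

    # Two different events give two different "page load times" (milliseconds).
    dom_content_loaded_ms = timing["domContentLoadedEventEnd"] - nav_start
    full_load_ms = timing["loadEventEnd"] - nav_start

    print(f"DOMContentLoaded: {dom_content_loaded_ms} ms, load event: {full_load_ms} ms")
    driver.quit()

Whichever signal is chosen, the key is to use the same one consistently across test runs so results remain comparable.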

Industry average page load time benchmarks can be found from various sources online. But use caution, as different organizations may have different business logic behind web pages, which may deliver dramatically different performance.

Acceptance criteria for APIs

Average response time, 95th percentile response time, and error rate are probably the most popular high-level performance indicators. Many teams focus only on average response time, but the percentile metric shows how response times are distributed, including the slow tail, so it is valuable too.
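As a small illustration (plain Python with made-up samples; here 5xx responses are counted as errors), all three indicators can be computed from raw measurements:

    # Minimal sketch: computing the three headline API metrics from raw samples.
    # Response times are in milliseconds; statuses are HTTP status codes.
    from statistics import mean, quantiles

    response_times_ms = [112, 98, 143, 101, 1250, 97, 105, 119, 133, 2210]
    status_codes = [200, 200, 200, 500, 200, 200, 200, 200, 503, 200]

    avg_ms = mean(response_times_ms)
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    p95_ms = quantiles(response_times_ms, n=100)[94]
    # Assumption for this sketch: 5xx responses count as errors.
    error_rate = sum(1 for s in status_codes if s >= 500) / len(status_codes)

    print(f"avg={avg_ms:.0f} ms  p95={p95_ms:.0f} ms  errors={error_rate:.1%}")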

Benchmarks can be found from previous systems within the same organization, similar systems from other organizations, industry data or other sources.

Acceptance criteria for data stores

Depending on whether tests are focused on OLTP or OLAP, different metrics may be used as acceptance criteria. For example, for OLTP tests, average response time by query type and error rate could be used as metrics of acceptance criteria.
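As a minimal illustration (plain Python, with made-up query names and numbers), per-query-type acceptance metrics can be rolled up from raw OLTP samples like this:

    # Minimal sketch: rolling up OLTP results by query type.
    # Each sample is (query_type, latency_ms, succeeded); values are illustrative.
    from collections import defaultdict
    from statistics import mean

    samples = [
        ("select_order_by_id", 4.2, True),
        ("insert_order",       9.8, True),
        ("select_order_by_id", 5.1, True),
        ("insert_order",     250.0, False),   # timed out / failed
    ]

    by_type = defaultdict(list)
    for query_type, latency_ms, ok in samples:
        by_type[query_type].append((latency_ms, ok))

    for query_type, rows in by_type.items():
        avg_ms = mean(latency for latency, _ in rows)
        error_rate = sum(1 for _, ok in rows if not ok) / len(rows)
        print(f"{query_type}: avg={avg_ms:.1f} ms, errors={error_rate:.0%}")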

Selecting tooling

Test requirements should be used as criteria for aligning tools with goals, to ensure the expected value is delivered. For example, if a team wants to find out how an API performs under a certain traffic level, such as 4,000 RPS (requests per second), many API performance testing tools may not be sufficient, and JMeter with a specific plugin may be one of the few choices available to drive traffic at that rate. Besides the testing requirements, the following criteria may be considered for tooling:

  • Traffic patterns: Define the volume of traffic during specific time periods and generate the traffic based on the pattern defined.
  • Test data: Generate and manage test data, either centrally or in a distributed fashion.
  • Load generators: Set up and manage distributed load generators, with each generator producing part of the total traffic volume.
  • Test results: Track test results and store data in a centralized or distributed repository.
  • Test reporting: Process test data and generate reports based on requirements from different stakeholders.

There are two types of tools: OTS (off-the-shelf) and scripting. Scripting tools are built by developers specifically for their performance testing needs. OTS tools are existing tools built by outside organizations, both paid commercial and open source.

Scripting

There may be times when there’s no OTS tool available to meet the test requirements, which leaves scripting or programming languages with support of libraries to build performance testing and monitoring tools. Depending on the languages and frameworks used, for example multi-threading in Java or multi-processing in Python could be used to write performance testing tools.

Because writing a performance testing tool will take additional time and effort, several factors should be considered before moving forward with a custom-built tool:

  • Whether the business value delivered can justify the cost to build and maintain this tool.
  • The structure of the tool needed to meet business goals with the least effort.
  • For the architecture, the simpler, the better.

Designing performance test architecture

The architecture of performance testing infrastructure will most likely include load generators, controller(s) for the load generators, test data storage, a test reporting processor, and other components. For low-traffic testing, a single high-end machine/container with multiple cores and plenty of memory may be able to do all the work. But for high-traffic testing, or when distributed load is needed, a more complex architecture may be necessary. This may include one or more controllers for the load generators, distributed load generators, separate data storage, and a machine for processing test results and generating reports.

Test case design

There are several aspects of performance testing which need additional attention compared with standard test cases. First, test data design is more important. Second, for DB performance testing, decide which queries should be run depending on whether the tests target OLAP or OLTP workloads; for API testing, determine which headers, payloads, and endpoints should be covered, and what percentage of traffic goes to each endpoint. Third, for website testing, determine which pages should be visited, in what order, and from which locale (such as en-US or fr-FR). If the site has a workflow, such as a purchase and checkout flow, the test case may have multiple steps.
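For the API traffic mix in particular, one simple way to encode "what percentage of traffic goes to each endpoint" is a weighted random choice; the endpoints and weights below are purely illustrative:

    # Minimal sketch: driving traffic to API endpoints in defined proportions.
    # Endpoint paths and weights are illustrative.
    import random

    ENDPOINT_WEIGHTS = {
        "GET /products":      0.60,   # 60% of traffic
        "GET /products/{id}": 0.25,
        "POST /cart/items":   0.10,
        "POST /checkout":     0.05,
    }

    def pick_endpoint() -> str:
        """Choose the next endpoint according to the traffic mix above."""
        endpoints = list(ENDPOINT_WEIGHTS)
        weights = list(ENDPOINT_WEIGHTS.values())
        return random.choices(endpoints, weights=weights, k=1)[0]

    # Example: sample 10 requests from the mix.
    print([pick_endpoint() for _ in range(10)])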

If there are only a small number of test cases, they may be managed in a simple spreadsheet or document. But the best way to manage and maintain test cases is through a dedicated tool, such as TestRail.

Team management

Depending on the size and complexity of the software being tested—and the testing effort required—there may be multiple teams and individual roles involved, each with different responsibilities and activities to perform. Here are some examples for the roles, responsibilities, and activities:

End users:

  • Come up with performance criteria
  • Receive appropriate testing reports and review testing results

Performance testers:

  • Design test cases
  • Prepare test data
  • Set up test environment
  • Run performance tests
  • Analyze test results
  • Prepare testing reports
  • Rerun tests
  • Tear down test environment

Software developers:

  • Review test cases and data
  • Review test results and reports
  • Define and discuss performance improvement solutions

Development/quality manager:

  • Review and approve test plan
  • Approve test results and performance

Implementing Performance Tests

Set up test environments

Test environments can be local or cloud-based. Configuring the testing environment is very important, especially when it comes to network structure. The following factors may need to be considered:

  • The correct system is set as the target system to be tested.
  • The testing environment and the software being tested are located as expected and can access each other. For example, test machines may be geographically distributed to mimic real user traffic from different countries. If the test infrastructure and the software being tested are all in the cloud but in separate private networks, they should still be able to reach each other; a simple reachability check, like the sketch after this list, can confirm this before a long run.
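As a minimal sketch of such a check (Python standard library only; the hostnames and ports are placeholders), each load generator could verify TCP reachability of the system under test before a long run starts:

    # Minimal sketch: a preflight check that each load generator can reach the
    # system under test. Hosts and ports are placeholders.
    import socket

    TARGETS = [
        ("api.internal.example.com", 443),
        ("db.internal.example.com", 5432),
    ]

    def is_reachable(host: str, port: int, timeout_s: float = 3.0) -> bool:
        """Return True if a TCP connection to host:port can be opened."""
        try:
            with socket.create_connection((host, port), timeout=timeout_s):
                return True
        except OSError:
            return False

    for host, port in TARGETS:
        status = "ok" if is_reachable(host, port) else "UNREACHABLE"
        print(f"{host}:{port} -> {status}")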

Prepare data

Depending on the software being tested, data preparation for performance tests may differ. For DB testing, it may be possible to obtain data from the production environment; alternatively, SQL scripts can be developed to generate test data. For API testing, it's better to use production data, or, if production data is not directly usable, to adjust it so that it is suitable for performance testing. It's also possible to generate data using scripts or programming languages. For website testing, one scenario is to use scripts to generate data during testing, for example user registration data. There may also be data that has to be obtained before testing, for example credit card and address data if purchasing/shipping products is a test case.
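As one illustration of script-generated data (plain Python; the field names, name lists, and row count are made up for this example), user-registration records could be written to a CSV file like this:

    # Minimal sketch: generating synthetic user-registration data for a test.
    # Field names and formats are illustrative, not taken from any real system.
    import csv
    import random
    import uuid

    FIRST_NAMES = ["Alice", "Bob", "Chen", "Dana", "Elena"]
    LAST_NAMES = ["Garcia", "Ivanov", "Khan", "Lee", "Okafor"]

    def make_user() -> dict:
        first = random.choice(FIRST_NAMES)
        last = random.choice(LAST_NAMES)
        return {
            "user_id": str(uuid.uuid4()),
            "name": f"{first} {last}",
            "email": f"{first.lower()}.{last.lower()}.{random.randint(1, 99999)}@example.com",
            "postal_code": f"{random.randint(10000, 99999)}",
        }

    with open("test_users.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["user_id", "name", "email", "postal_code"])
        writer.writeheader()
        writer.writerows(make_user() for _ in range(1000))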

Executing Performance Tests

The team should have specific people run performance tests. These people should be familiar with the testing infrastructure, tools, and data. This role may need to communicate with related stakeholders to block other users from using the testing infrastructure and the software being tested. After all preparation is done, the tester(s) carry out the tests. Before running a full-length test, such as one lasting an hour or longer, it's good practice to do a short dry run of each test, lasting only a few minutes, to confirm that the test can be executed successfully, the infrastructure works, data is accessible, test results can be collected, reports can be generated, and so on.

Analyze results and generate reports

In the ideal situation, your tools will automatically gather test results, analyze them, and generate reports while tests are in progress. That way, once the tests are completed, there is little waiting and all that remains is report customization. With this type of tool, however, it is sometimes necessary to disable result analysis and report generation during testing, and instead run the analysis and reporting after the test is finished. The reason is that analysis and report generation are usually done by the controller of the load generators. When a high-traffic test is being run, the controller will be busy with testing tasks, and analysis/report generation will slow it down. This can, in turn, cause the load generators to slow down, invalidating the test results.

When a team builds its own performance testing tool, they may also need to develop an analysis and reporting system if they cannot find an OTS one. It will be important that the reporting tool can process large amounts of result data in a short period of time.
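As a rough sketch of what such processing might look like (assuming pandas is available and a hypothetical results.csv with endpoint, response_ms, and status_code columns), raw results can be rolled up into report-ready numbers:

    # Minimal sketch: summarizing a large raw-results file into report-ready numbers.
    # Assumes pandas is installed and a results.csv with these (hypothetical)
    # columns: endpoint, response_ms, status_code.
    import pandas as pd

    results = pd.read_csv("results.csv")

    summary = results.groupby("endpoint").agg(
        requests=("response_ms", "size"),
        avg_ms=("response_ms", "mean"),
        p95_ms=("response_ms", lambda s: s.quantile(0.95)),
        error_rate=("status_code", lambda s: (s >= 500).mean()),
    )

    summary.to_csv("summary_report.csv")
    print(summary)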

Teardown

After testing is done, all test requirements have been confirmed as met, and the team no longer needs the test infrastructure, teardown work should start. This may include, but is not limited to:

  • Copy/move test results and related data from testing infrastructure to a different place for backup and potential further analysis.
  • Shut down the machines/containers that were running for testing.
  • Release other resources used in tests, such as storage space for data.
  • If there are environment configuration changes made solely for testing purposes, the changes should be reversed.
  • Notify other users that the environment hosting the software being tested can be used by others.

Performance Test Post-Mortem

Problems encountered during tests should be identified, root cause analysis should be performed, and corrective or preventive actions should be identified, discussed, and carried out as planned once stakeholders agree.

There may be situations where tests don’t answer all the questions identified at the start of the project—which means expected value is not delivered. In these situations, further testing may be needed after the test environment is adjusted or other changes are made. Or, in dramatic scenarios, alternate test infrastructure may be needed to run the tests with different tools.

Some common pitfalls

Here are some common pitfalls in performance testing:

  • With new software, it may be difficult to gather enough diversified testing data, such as JSON payloads for API testing.
  • For DB performance testing, it can be difficult to get enough production-like data to test OLAP performance.
  • It may not be easy to set up a distributed load testing environment with open-source tools using limited documentation.
  • When a target environment being tested is shared by multiple teams, it may not be easy to find a time window in which performance tests can be run against the environment.
  • Test infrastructure performance is not as good as that of the software being tested. For example, in API testing there may be times when increasing the load causes the measured average response time to drop. In this case, it could be that the load generators are reaching their limits and have become slow at generating load and processing response data, so the measurements no longer reflect the target system's true performance.

Conclusion

To make sure performance testing delivers real value to project stakeholders, certain steps should be followed:

  • Identify questions to be answered through performance testing
  • Plan activities, and design and develop tools to answer the chosen questions
  • Implement testing activities
  • Perform post-mortem to verify whether the expected value is obtained through testing and what action should be taken moving forward
