6 smells that will ruin your performance testing validity!

Eldad Uzman
4 min read · Oct 5, 2021

Performance testing is tough!
One of the main challenges of performance testing is making your test executions as similar to real-life situations as possible.
When your test execution is too far removed from real life, your load tests are unreliable, and this can have nightmare effects on the confidence and trust in your load testing efforts.

Ecological validity.

I talked about ecological validity in a nutshell in a previous article.
Ecological validity is a major problem in empirical experiments when they are not conducted in their natural settings.

Ecological validity refers to the extent to which a controlled experiment applies to real-life scenarios.

One example of that would be controlled experiments in nutrition science.
When people are confined to laboratory conditions, their nutritional habits will be very different from what they would normally be.

For example, a study published in the American Journal of Clinical Nutrition in 1995 had 6 groups of 11–13 participants each, given an iso-caloric meal of ~240 calories of a single food after fasting overnight.

Satiety questionnaires were filled in every 15 minutes for 120 minutes after the meal, and the researchers then constructed the satiety index.

It turned out that potatoes were the most satiating food on the list.

Great.
But let me ask you this: do you normally eat a bowl of baked potatoes after an 8–10 hour fast?
Who eats like that? Nobody!

This is why we should regard this study, although it is tightly controlled and has good internal validity, as one with low ecological validity.

The impact of low ecological validity is that the experiment doesn't reflect real-life situations, and thus conclusions based on such an experiment should be limited.

Load testing is subject to the same problem!
If you don't execute your load test in a “production-like” environment, or your test script does not reflect real-life user flows, then your conclusions based on the load test are limited.

Here are a few ways to make sure your load tests hold decent ecological validity.

1: Utopian data storage

In many situations, we use empty data storage in our load testing environments (databases, cache services, etc.).

But in real life, our data stores contain massive amounts of accumulated data.
This affects any interaction with the data store: lookups, insertions, updates, deletions, and so on.

To avoid this problem, estimate the storage size of the real-life application and populate your testing environment accordingly.
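
As a minimal sketch, here is one way to seed a test database to a production-like volume using Python's standard library. The "orders" table, its columns, and the row count are hypothetical stand-ins for figures you would take from your real system:

```python
# Seed a test database to an estimated production scale.
# The "orders" table and TARGET_ROWS are illustrative assumptions.
import random
import sqlite3
import string

TARGET_ROWS = 1_000_000  # replace with your real-life storage estimate
BATCH_SIZE = 10_000      # insert in batches so seeding stays fast

def random_sku(length=12):
    return "".join(random.choices(string.ascii_uppercase + string.digits, k=length))

conn = sqlite3.connect("loadtest.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, sku TEXT, amount REAL)"
)

for _batch in range(0, TARGET_ROWS, BATCH_SIZE):
    rows = [(random_sku(), round(random.uniform(1, 500), 2)) for _ in range(BATCH_SIZE)]
    conn.executemany("INSERT INTO orders (sku, amount) VALUES (?, ?)", rows)
    conn.commit()

conn.close()
```

The same idea applies to caches and object stores: warm them to realistic sizes before measuring anything.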

2: Testers' loopback flag

This is most relevant for microservices architectures, or any situation where the server under test depends on one or more external services that we do not want to interact with when executing load tests.

The naïve solution would be to pass a special flag in the requests that signals to the server under test to generate a fake response instead of sending a request over the network.

This is wrong!
In real life, the usage of the network, just like any other I/O, can lead to bottlenecks either at the application code level or at the infrastructure level.

To avoid this problem, use mock servers.
To learn more about mock servers you can read my article on LinkedIn about Hoverfly.
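
Even when a dedicated tool like Hoverfly feels like overkill, a tiny stand-in keeps the network hop real. Here is a minimal sketch using only Python's standard library; the port, latency, and response body are assumptions:

```python
# A stand-in for a downstream dependency. Unlike a loopback flag,
# the server under test still makes a real network call to reach it.
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

class MockHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(0.05)  # simulate the dependency's typical latency (assumption)
        body = json.dumps({"status": "ok", "source": "mock"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Point the server under test at this address instead of the real dependency.
    HTTPServer(("0.0.0.0", 8081), MockHandler).serve_forever()
```

The server under test still opens a connection, serializes a request, and waits on a socket, so the I/O path stays inside the measurement.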

3: Magic Scale Numbers

In testing environments we might use a fixed number of resources (pods, endpoints, load balancers, etc.).

But fixed numbers are extremely cost-inefficient, so more often than not we will use autoscaling solutions.

The result is that an entire set of functionality is left uncovered, even though both scaling up and scaling down can have a profound effect on performance.

To avoid this problem, make sure to run tests with the autoscaling functionality configured, as in the sketch below.
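
One way to exercise both directions is a ramp-up/hold/ramp-down profile. Here is a minimal sketch using Locust's LoadTestShape; the endpoint, stage durations, user counts, and spawn rates are illustrative assumptions:

```python
# A load shape that ramps up, holds, then ramps down, so both
# scale-up and scale-down behavior get exercised during one run.
from locust import HttpUser, LoadTestShape, between, task

class WebsiteUser(HttpUser):
    wait_time = between(1, 3)

    @task
    def index(self):
        self.client.get("/")  # hypothetical endpoint

class RampUpDownShape(LoadTestShape):
    # (end time in seconds, target users, spawn rate) per stage
    stages = [
        (300, 200, 10),  # ramp up to peak: should trigger scale-up
        (600, 200, 10),  # hold at peak
        (900, 20, 5),    # drop off: should trigger scale-down
    ]

    def tick(self):
        run_time = self.get_run_time()
        for end_time, users, spawn_rate in self.stages:
            if run_time < end_time:
                return (users, spawn_rate)
        return None  # end the test after the last stage
```

While the shape runs, watch replica counts alongside latency; a scale-down that arrives too eagerly often shows up as a latency spike a few minutes later.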

4: Bogus workloads

In testing environments we prescribe our workloads in whatever way we see fit.
But usually the way we see fit just “happens” to work :)

Because the real question is not “if it will fit”, but “what are you trying to fit it into”.

To put it in other words, we slam the system under test with a scenario that doesn't match a real user.
A real user, for instance, will not send one request after another within a matter of microseconds; they normally have a “think time”.

Another issue to look at is the distribution pattern.
The general rule of thumb, in my opinion, is that “things” (machines) tend to send requests as a persistent load, while people tend to send their requests in spikes.

So if your system is meant to serve spiky patterns but you tested with a persistent load, the ecological validity of your test is low.

To avoid this problem, do proper workload modeling and evaluate your model consistently.
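
At minimum, give your virtual users a think time. Here is a minimal sketch in Locust; the /search endpoint and the 1–5 second wait are assumptions to be replaced with values observed in your real traffic:

```python
# A virtual user with think time between requests.
from locust import HttpUser, between, task

class ShopperUser(HttpUser):
    # Real users pause between actions; 1-5 seconds is an assumed range.
    wait_time = between(1, 5)

    @task
    def search(self):
        self.client.get("/search?q=shoes")  # hypothetical endpoint
```

A spiky arrival pattern can then be layered on top with a LoadTestShape like the one shown under smell #3.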

5: Perfect Timing?

Time of day can impact our load capacity.
For example, if there are many scheduled tasks that our system is programmed to perform every day at 00:00, this could impact the load handling.

If you run tests at some parts of your day but not in others, your coverage is compromised.

To avoid this problem, perform longitudinal soak tests.
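
As a minimal sketch, a soak can be expressed in Locust as a constant load held long enough to cross the risky hours; the 24-hour duration, user count, spawn rate, and endpoint are assumptions:

```python
# Hold a steady load for a full day so midnight jobs and other
# time-of-day effects fall inside the test window.
from locust import HttpUser, LoadTestShape, between, task

class BackgroundUser(HttpUser):
    wait_time = between(1, 5)

    @task
    def index(self):
        self.client.get("/")  # hypothetical endpoint

class SoakShape(LoadTestShape):
    def tick(self):
        if self.get_run_time() < 24 * 60 * 60:  # hold for 24 hours
            return (100, 10)  # (user_count, spawn_rate), assumed values
        return None  # stop after the soak window
```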

6: All over the world

Sometimes our load tests are executed from a single location or region while our application is expected to receive traffic from multiple regions.

To avoid this problem, use SaaS or PaaS services that allow you to distribute your load testing scripts across multiple locations.

On a final note:

To fully match your load tests to real-life situations, make sure that you are fully engaged with your team and colleagues.

Their input is crucial to you; your ability to take their perspective, evaluate it, and act upon it will make you a killer performance tester.
