When can we say that software is tested “enough”?

Dec 18, 2023

The principles and techniques that help you understand when test coverage can be considered “enough”.

You’re building a very important feature for your customers. You know that a defect your customers find in your software will hurt their trust in your product, so you want to be very careful. But how do you know that you’ve tested enough?

This post aims to answer that question, especially for functional black-box testing.

The seven testing principles

First of all, let’s take a look at the seven testing principles introduced by the International Software Testing Qualifications Board (ISTQB). According to ISTQB, the principles “offer general guidelines common for all testing.”

These principles will be our foundation for understanding what counts as enough test coverage:

Testing is context dependent
Testing is done differently in different contexts: safety-critical software is tested differently from an e-commerce site, so keep the context in mind when testing.
Exhaustive testing is impossible
It is not feasible to test every possible case except in trivial situations, so risk analysis, test techniques, and priorities should be used to focus test efforts.
Defects cluster together
A small number of modules usually contain most of the defects. Predicting which modules these are is an important input to the risk analysis used to focus the test effort.
Testing shows the presence of defects, not their absence
Because it is too costly to test everything, testing can show that defects are present, but it can NOT prove that there are none or that the software is correct.
Early testing saves time and money
It is better to detect defects early, as doing so helps reduce or eliminate costly changes.
Beware of the pesticide paradox
Repeating the same set of tests will eventually stop uncovering new defects, so test data may need to change and new tests may need to be added.
Absence-of-errors is a fallacy
Despite all the testing, defect-free software is still unusable if it does NOT meet the user’s requirements.

For this post, I intentionally changed the order of the principles and rephrased their meanings so that they are easier to understand as a whole. Try reading only the meanings from top to bottom.

If you’d like, you can read them yourself in the original ISTQB document.

From these principles, we can summarize the criteria for “enough coverage” as follows:

No other meaningful, non-redundant test remains to be run, as judged by risk assessment and testing techniques
The software can be considered usable by the user
Tests are made with the software’s context in mind: (1) the environment, such as viewport, OS, and hardware; (2) the user, such as their tech-savviness, whether they are multitasking, and whether they have a poor internet connection.

The testing techniques

Testing techniques are tools to help us cover enough testing ground. These techniques are generally made to reduce redundancy in tests and focus on areas where the defects are most likely to happen.

Note that the techniques listed below are by no means exhaustive; they are only the handful that I have learned and understand so far. In this article, I only give an overview of each technique; if you’d like to learn them step by step, you can learn more from my website.

As with any tool, no single technique is right for every situation. Each should be applied, alone or in combination with others, as the situation requires.

Equivalence Partitioning

This technique splits data into multiple valid and invalid partitions, where all values in a partition are expected to be processed in the same way, and makes a test case from one value in each partition. It ensures that each condition is tested at least once.

Valid partitions contain values that are accepted by the system, while invalid partitions contain values that are rejected by the system.

Coverage is considered 100% when every partition is tested with at least one value.

The technique works better for data with discrete values. For sequential or numeric range data, see the Boundary Value Analysis technique.
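To make this concrete, here is a minimal sketch in Python. The `classify_upload` function, its file extensions, and the partition names are all made up for illustration; they stand in for whatever discrete input your own system accepts.

```python
# Hypothetical system under test: classifies an upload by its file extension.
IMAGE_EXTENSIONS = {"png", "jpg", "gif"}
DOCUMENT_EXTENSIONS = {"pdf", "docx", "txt"}


def classify_upload(extension: str) -> str:
    ext = extension.lower()
    if ext in IMAGE_EXTENSIONS:
        return "image"
    if ext in DOCUMENT_EXTENSIONS:
        return "document"
    raise ValueError(f"unsupported extension: {extension!r}")


# Each partition groups values that are expected to be handled the same way.
PARTITIONS = {
    "valid: image": {"values": ["png", "jpg", "gif"], "expected": "image"},
    "valid: document": {"values": ["pdf", "docx", "txt"], "expected": "document"},
    "invalid: unsupported": {"values": ["exe", "bat", ""], "expected": "rejected"},
}


def observed_outcome(extension: str) -> str:
    """Run the system and fold a rejection into a comparable outcome."""
    try:
        return classify_upload(extension)
    except ValueError:
        return "rejected"


def run_partition_tests() -> dict:
    """Test one representative value per partition and report pass/fail."""
    results = {}
    for name, partition in PARTITIONS.items():
        representative = partition["values"][0]  # any value from the partition will do
        results[name] = observed_outcome(representative) == partition["expected"]
    return results


if __name__ == "__main__":
    for name, passed in run_partition_tests().items():
        print(f"{name}: {'pass' if passed else 'FAIL'}")
```

Three test cases cover all three partitions here, which is what the technique counts as 100% coverage. Testing "jpg" after "png" would add nothing new under the assumption that both are processed the same way.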

INTERESTING THOUGHT
In the fairy tale Goldilocks and the Three Bears, Goldilocks uses a similar technique when trying the three bears’ belongings:

            Papa bear’s    Son bear’s    Mama bear’s
Partition   invalid        valid         invalid
Porridge    too hot        just right    too cold
Chair       too hard       just right    too delicate
Bed         too hard       just right    too delicate

Boundary Value Analysis

In a numeric range or sequential data, defects are most often found at the edges rather than in the middle. This technique extends Equivalence Partitioning by focusing on those edges to make sure the boundaries are set correctly, since values on the boundaries are more prone to defects than values well inside them.

In this context, each partition has valid boundary values (the edges that belong to it) and invalid boundary values (the nearest values just outside it).

Coverage is considered 100% when all valid and invalid boundary values of every partition are tested at least once.
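As a rough illustration, the sketch below derives the boundary values for a single integer range and checks them against a made-up rule. The `is_eligible_age` function and its 18 to 65 range are assumptions for the example, not something from a real system.

```python
def boundary_values(low: int, high: int) -> dict:
    """Boundary values for an inclusive integer range [low, high].

    Valid values sit on the edges of the range; invalid values are the
    nearest integers just outside it.
    """
    return {"valid": [low, high], "invalid": [low - 1, high + 1]}


# Hypothetical system under test: accepts ages from 18 to 65 inclusive.
def is_eligible_age(age: int) -> bool:
    return 18 <= age <= 65


def run_boundary_tests(low: int = 18, high: int = 65) -> dict:
    """Check that edge values are accepted and their outside neighbours rejected."""
    values = boundary_values(low, high)
    results = {}
    for age in values["valid"]:
        results[age] = is_eligible_age(age) is True
    for age in values["invalid"]:
        results[age] = is_eligible_age(age) is False
    return results


if __name__ == "__main__":
    for age, passed in run_boundary_tests().items():
        print(f"age {age}: {'pass' if passed else 'FAIL'}")
```

If the rule were mistakenly written as `18 < age`, the test for 18 would fail while a mid-range value such as 40 would still pass. That is the kind of off-by-one defect this technique is meant to catch.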

Decision Table

Also called the “Cause Effect Table” because it maps a system’s causes to their effects. It lays out each variable and its states (boolean or discrete) alongside the effect they cause.

This is best used to record and test complex business rules that a system must implement.

Coverage is considered 100% when every decision rule is covered. Note that the strength of this technique lies in making sure no decision is left untested, so a huge number of cases may need to be created and run.
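Here is a small sketch of how a decision table can be written down and checked in Python. The discount rule, its two conditions, and the percentages are invented for the example.

```python
from itertools import product


# Hypothetical business rule under test.
def discount_percent(is_member: bool, has_coupon: bool) -> int:
    if is_member and has_coupon:
        return 20
    if is_member:
        return 10
    if has_coupon:
        return 5
    return 0


# Decision table: each key is one combination of conditions (the causes),
# each value is the expected effect. Key order: (is_member, has_coupon).
DECISION_TABLE = {
    (True, True): 20,
    (True, False): 10,
    (False, True): 5,
    (False, False): 0,
}


def missing_rules(table: dict, condition_count: int = 2) -> set:
    """Combinations of boolean conditions that the table does not cover yet."""
    all_combinations = set(product([True, False], repeat=condition_count))
    return all_combinations - set(table)


def run_decision_table(table: dict) -> dict:
    """Run every rule in the table and report pass/fail per rule."""
    return {
        conditions: discount_percent(*conditions) == expected
        for conditions, expected in table.items()
    }


if __name__ == "__main__":
    print("missing rules:", missing_rules(DECISION_TABLE))
    for conditions, passed in run_decision_table(DECISION_TABLE).items():
        print(conditions, "pass" if passed else "FAIL")
```

With n boolean conditions the full table has 2^n rules, so two conditions give four rules while ten conditions would already give 1,024. This is why the number of cases can grow quickly.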

Pairwise

The technique is based on the observation that most faults are caused by interactions of at most two factors. It significantly reduces the number of tests needed by making sure each pair of values is tested only once whenever possible.

This is best applied to settings or configuration features.

Coverage is considered 100% when every pair of variable values appears in at least one test.
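The sketch below shows one simple way to build such a test set in Python. It uses a greedy selection that I chose for readability: it guarantees every pair is covered, but it does not promise the smallest possible set, and dedicated pairwise tools use more refined algorithms. The `SETTINGS` values are hypothetical.

```python
from itertools import combinations, product


def pairs_in(test: tuple) -> set:
    """All pairs of (parameter index, value) that a single test exercises."""
    return {
        ((i, test[i]), (j, test[j]))
        for i, j in combinations(range(len(test)), 2)
    }


def pairwise_tests(parameters: dict) -> list:
    """Greedily pick full combinations until every value pair is covered."""
    names = list(parameters)
    candidates = list(product(*(parameters[name] for name in names)))

    # Every pair that appears in some full combination must be covered.
    uncovered = set()
    for candidate in candidates:
        uncovered |= pairs_in(candidate)

    selected = []
    while uncovered:
        # Take the combination that covers the most still-uncovered pairs.
        best = max(candidates, key=lambda c: len(pairs_in(c) & uncovered))
        selected.append(dict(zip(names, best)))
        uncovered -= pairs_in(best)
    return selected


def all_pairs_covered(parameters: dict, tests: list) -> bool:
    """Check the 100% coverage criterion: every value pair is in some test."""
    names = list(parameters)
    required = set()
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        for value_a, value_b in product(parameters[a], parameters[b]):
            required.add(((i, value_a), (j, value_b)))
    covered = set()
    for test in tests:
        covered |= pairs_in(tuple(test[name] for name in names))
    return required <= covered


# Hypothetical configuration options for a web application.
SETTINGS = {
    "browser": ["Chrome", "Firefox", "Safari"],
    "os": ["Windows", "macOS"],
    "theme": ["light", "dark"],
}

if __name__ == "__main__":
    tests = pairwise_tests(SETTINGS)
    for test in tests:
        print(test)
    print("all pairs covered:", all_pairs_covered(SETTINGS, tests))
```

Testing every combination of these settings would take 3 × 2 × 2 = 12 tests. The pairwise set is smaller while still exercising every pair of values at least once.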

Concluding thoughts

Here are the key takeaways from the above.

Making software completely defect-free is not viable, so test just enough to be confident that it is stable and reliable before releasing it to users.
There are testing techniques that help you gauge how much testing is just enough, such as:
Equivalence Partitioning, which ensures all valid and invalid conditions are tested.
Boundary Value Analysis, which ensures the boundaries between value ranges are properly established.
Decision Table, which ensures no decision rules are left untested.
Pairwise, which reduces the number of tests needed for combinations of values.

One more point from my own reflection: the techniques help us understand what is “enough”, but as the principles say, once the software is more stable it is wise to add more tests to uncover more defects. Keep in mind that usable software is more important than fully defect-free software.