How to test with exponential efficiency

Nikita Fomichev
10 min read · Mar 1, 2024

In the fast-evolving realm of software development, we face ever-increasing complexity. This growth is not linear but exponential, presenting a challenge to Quality Assurance practices. Despite the evolution of tools and methodologies, there remains a gap in effectively testing complex software systems.

Why software evolves and its complexity increases

Complexity is more than just adding features; it involves interactions and dependencies that are hard to track and test. If each new idea can be combined with some fraction of all the previous ideas, then the complexity, and the number of required tests, grows combinatorially, at a rate on the order of O(n!).
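
A back-of-the-envelope count makes these growth rates concrete (an illustrative sketch, not a measurement of any particular system):

```python
import math

# If each of n features may interact with earlier ones, three rough measures
# of how the testing surface grows with n:
for n in (4, 8, 12):
    pairwise = math.comb(n, 2)      # pairs of interacting features
    subsets = 2 ** n                # on/off combinations of features
    orderings = math.factorial(n)   # distinct orderings of n operations
    print(f"n={n}: pairs={pairwise}, subsets={subsets}, orderings={orderings}")
```

Even the mildest measure grows quadratically; the orderings that stateful systems must survive grow factorially.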

Software is taking over the world. However, it’s also problematic, filled with bugs, lacking in security, and often not well-conceived.

Software is not built, it grows

Although we might think of software as complex structures similar to skyscrapers, the comparison doesn’t quite fit. Skyscrapers are built from detailed plans made early to guarantee their quality, whereas software doesn’t come to life through such precise planning.

Software doesn’t get constructed or designed in the traditional sense, it grows.

Software evolves from a basic to a more complicated form. The system grows gradually, along with the development of the project itself.

Most testing is ineffective

Traditional “automated” software testing is surprisingly manual. Every scenario the computer runs, someone had to write by hand. Traditional, or example-based, testing specifies the behavior of your software by writing examples of it. Each test sets up a single concrete scenario and asserts how the software should behave in that scenario.
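
A minimal sketch of this example-based style, using a hypothetical apply_discount function as the system under test: each test pins down one hand-picked scenario.

```python
def apply_discount(price, percent):
    """Hypothetical function under test: apply a percentage discount,
    rounded to cents."""
    return round(price * (1 - percent / 100), 2)

def test_apply_discount_ten_percent():
    # One concrete scenario: hand-picked input, hand-computed expectation
    assert apply_discount(100.0, 10) == 90.0

def test_apply_discount_zero_percent():
    # Another scenario requires another hand-written test
    assert apply_discount(19.99, 0) == 19.99

test_apply_discount_ten_percent()
test_apply_discount_zero_percent()
```

Every new scenario, boundary value, or feature interaction demands yet another such test, written by hand.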

To cover exponentially complex software with tests, after each new feature you must write additional tests that cover the new feature’s interactions with existing features. This can quickly become cumbersome and lead to a combinatorial explosion of test cases, especially in complex systems where features interact in unpredictable ways.

To mitigate this, you might adopt strategies such as prioritizing test cases based on their importance or likelihood of catching significant bugs, employing parameterized testing to cover multiple scenarios with a single test, or using test coverage tools to identify untested parts of your codebase. However, even with these strategies, traditional testing might not be sufficient to cover every possible critical scenario, especially in systems with high complexity.
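
Parameterized testing can be sketched with pytest's parametrize marker; apply_discount is again a hypothetical function under test:

```python
import pytest

def apply_discount(price, percent):
    """Hypothetical function under test."""
    return round(price * (1 - percent / 100), 2)

# One parameterized test covers several scenarios that would otherwise
# each need a separate example-based test.
@pytest.mark.parametrize(
    "price, percent, expected",
    [
        (100.0, 10, 90.0),
        (100.0, 0, 100.0),
        (100.0, 100, 0.0),
        (80.0, 25, 60.0),
    ],
)
def test_apply_discount(price, percent, expected):
    assert apply_discount(price, percent) == expected
```

This compresses the test code, but a human still has to enumerate every row; the combinatorial growth of scenarios remains.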

How to test with exponential efficiency

Realizing that traditional testing scales only linearly propels us towards more sophisticated methodologies like Property-Based Testing (PBT) and model-based testing. PBT is a strategy that specifies general properties your software should always adhere to, instead of writing tests for specific scenarios.

Model-based testing lets us use models to predict how the software should behave, making it easier to check whether the software meets our expectations. Both approaches help us test more thoroughly, making sure our software can handle a wide range of scenarios and is of higher quality. Any test can be judged on two criteria:

  • Feature compliance: Does it really test the feature I am trying to provide?
  • Input scope covered: How well do I cover all possible inputs?

PBT has been popularized by testing libraries such as QuickCheck in Haskell, Hypothesis in Python, ScalaCheck in Scala, and FsCheck in F#.

We’ll utilize Hypothesis, a powerful library supporting property-based testing in Python. The Hypothesis documentation is a good place to get started.
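
As a first taste, here is a sketch of a round-trip property: decoding must invert encoding for any string Hypothesis can build. The run-length helpers are hypothetical, written for this example:

```python
from hypothesis import given, strategies as st

def rle_encode(s):
    """Hypothetical helper: run-length encode a string into [char, count] pairs."""
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1][1] += 1
        else:
            out.append([ch, 1])
    return out

def rle_decode(pairs):
    """Hypothetical helper: invert rle_encode."""
    return "".join(ch * n for ch, n in pairs)

# The property: one line that holds for EVERY string, not one hand-picked case
@given(st.text())
def test_decode_inverts_encode(s):
    assert rle_decode(rle_encode(s)) == s

test_decode_inverts_encode()  # Hypothesis runs ~100 generated examples
```

One property definition replaces a whole family of hand-written examples: empty strings, unicode, long runs, and so on.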

Simplified, PBT can be viewed as a sophisticated data-generation tool that automatically generates test cases based on specified properties, which can improve overall software quality in three main ways:

  1. Increase in test coverage: PBT automatically generates a broad set of input data, which can uncover edge cases that example-based tests might miss. You can generate inputs with custom datatypes of any complexity.
  2. Decrease in the number of example-based tests: Since PBT is capable of generating numerous input scenarios from a single property definition, it can reduce the need for a large suite of example-based tests.
  3. Automatic exploration of corner cases: PBT excels at discovering edge cases and corner cases that developers might not anticipate. By generating a wide range of inputs, including extreme, unusual, or invalid inputs, PBT can expose bugs that are only triggered under rare conditions.

Model-based testing

As test engineers, we are mainly engaged in integration and system testing of software. Model-based, or stateful, testing is an approach to software testing where tests are derived from an abstract model that represents the desired behavior of the system. This model reflects the possible states of the system, the transitions between these states, and the expected outcomes or outputs. The primary objective is to validate that the system under test (SUT) behaves as expected according to the model across various states and transitions.

The idea is straightforward: instead of writing example-based tests for each combination of functionality, you create a state machine model of your SUT with its states and transitions, and then let a tool like Hypothesis generate sequences of transitions that exercise their combinations.

Craft your state machine

Hypothesis supports model-based / stateful testing by allowing you to define a state machine model of your system under test. A RuleBasedStateMachine class allows you to specify rules that transition the system from one state to another, and Hypothesis tries to generate sequences of these rules that might lead to failures.

Here’s an example of how to write your first state machine model for a subset of the Python list functions:

  1. Only store integers
  2. Create a filled list
  3. Use list.append()
  4. Use list.pop()
  5. Concatenate two lists
import pytest
from hypothesis import strategies as st
from hypothesis.stateful import RuleBasedStateMachine, rule, Bundle

# Define states and rules: states are represented by the attributes of your
# class, and rules are methods decorated with @rule. Rules can take inputs
# (which Hypothesis will generate for you) and modify the state of the machine.
class ListStateMachine(RuleBasedStateMachine):
    list_ = Bundle('list')

    @rule(target=list_, items=st.lists(st.integers()))
    def create_list(self, items):
        return items

    @rule(list_=list_, item=st.integers())
    def append_item(self, list_, item):
        list_.append(item)
        assert list_[-1] == item

    @rule(list_=list_)
    def pop_item(self, list_):
        if len(list_) > 0:
            last = list_[-1]
            item = list_.pop()
            # The popped item should be the former last element
            assert item == last
        else:
            # Popping from an empty list must raise IndexError
            with pytest.raises(IndexError):
                list_.pop()

    @rule(target=list_, list1=list_, list2=list_)
    def add_two_lists(self, list1, list2):
        list3 = list1 + list2
        # The two lists are concatenated in order
        assert len(list1) + len(list2) == len(list3)
        assert list3[:len(list1)] == list1
        assert list3[len(list1):] == list2
        return list3

# Execute the test: pass your state machine class to run_state_machine_as_test
from hypothesis.stateful import run_state_machine_as_test
run_state_machine_as_test(ListStateMachine)

Hypothesis will then automatically generate sequences of rule applications, trying to find one that leads to an assertion failure or error, effectively testing a wide range of interactions in your system under test. By default it generates 100 test examples, each a different combination of state machine steps. Here is one of them:

state = ListStateMachine()
list_0 = state.create_list(items=[17])
state.pop_item(list_=list_0)
state.pop_item(list_=list_0)
list_1 = state.create_list(items=[44, 83, 29, 63, 96])
state.append_item(item=-69, list_=list_1)
state.append_item(item=-28253, list_=list_1)
list_2 = state.add_two_lists(list1=list_1, list2=list_1)
state.append_item(item=962020445, list_=list_1)
list_3 = state.add_two_lists(list1=list_2, list2=list_2)
state.pop_item(list_=list_1)
list_4 = state.create_list(items=[56, 76, 97, 91, 27])
list_5 = state.add_two_lists(list1=list_2, list2=list_3)
state.pop_item(list_=list_3)
state.append_item(item=781, list_=list_4)
state.pop_item(list_=list_1)
state.teardown()
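
The default budget of 100 examples can be tuned through Hypothesis settings. A minimal sketch with a toy counter machine (not part of the list example above):

```python
from hypothesis import settings
from hypothesis.stateful import RuleBasedStateMachine, rule, run_state_machine_as_test

class CounterMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.count = 0

    @rule()
    def increment(self):
        self.count += 1
        # Invariant checked after every step: the counter never goes negative
        assert self.count >= 0

# max_examples bounds how many generated step sequences Hypothesis runs
run_state_machine_as_test(
    CounterMachine, settings=settings(max_examples=20, deadline=None)
)
```

Raising max_examples buys deeper exploration of the state space at the cost of longer test runs.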

To enhance your skills in stateful testing, explore the stateful testing section of the Hypothesis documentation.

How Hypothesis generates tests

Hypothesis generates tests by systematically exploring the space of possible inputs and sequences of rule applications. It uses strategies to generate inputs for rules and tries to find inputs that cause your assertions to fail. When it finds a failure, it attempts to minimize the failing test case to provide you with a simple example that demonstrates the failure. This process involves:

  • Generating inputs: For each rule, Hypothesis uses the strategies defined for the rule’s parameters to generate inputs.
  • Applying rules: Hypothesis selects and applies rules based on the current state of the machine, trying to explore the state space effectively.
  • Shrinking: If a failure is found, Hypothesis tries to shrink the input data and the sequence of operations to find the simplest way to reproduce the failure.

This approach helps find edge cases and interactions that might not be obvious, making it a powerful tool for improving the robustness of your tests and your code.

Real-world example: Authentication

Authentication is one of the most commonly implemented stateful systems in software engineering. It’s a critical component of security in any system that restricts access to resources or services to authenticated users. In the context of user CRUD operations within a software system, the term “stateful” refers to the system’s ability to manage the state of a user’s information throughout these operations.

Utilizing the ClickHouse database’s user authentication as a case study, I will demonstrate the application of stateful testing methods to software systems. Below is a diagram illustrating the state machine model, including its rules, states, and virtual states.

User state diagram

A state machine is a conceptual model used to represent and manage the various states that a system or process can be in, along with the rules for transitioning from one state to another.

A rule is a condition that dictates when and how transitions between states can occur. It defines the logic that controls the behavior of the state machine.

A state represents a specific condition or status of a system at a particular time. It’s a way to describe what the system is doing or how it’s behaving right now.

A virtual state is a conceptual state used in the model for organizational or logical purposes. It doesn’t represent a state of the system but helps in managing complex transitions or grouping certain states together.

And here is a diagram with exception transitions. These are transitions that lead to a specific state representing an error or exception condition: for example, attempting to delete a non-existent user, indicating that an invalid operation was attempted.

User state diagram with exception transitions

Discussing complexity, here is the complete specification for creating a user:

CREATE USER [IF NOT EXISTS | OR REPLACE] name1 [ON CLUSTER cluster_name1]
[, name2 [ON CLUSTER cluster_name2] ...]
[NOT IDENTIFIED | IDENTIFIED {[WITH {no_password | plaintext_password | sha256_password | sha256_hash | double_sha1_password | double_sha1_hash}] BY {'password' | 'hash'}} | {WITH ldap SERVER 'server_name'} | {WITH kerberos [REALM 'realm']} | {WITH ssl_certificate CN 'common_name'} | {WITH ssh_key BY KEY 'public_key' TYPE 'ssh-rsa|...'} | {WITH http SERVER 'server_name' [SCHEME 'Basic']}]
[HOST {LOCAL | NAME 'name' | REGEXP 'name_regexp' | IP 'address' | LIKE 'pattern'} [,...] | ANY | NONE]
[VALID UNTIL datetime]
[IN access_storage_type]
[DEFAULT ROLE role [,...]]
[DEFAULT DATABASE database | NONE]
[GRANTEES {user | role | ANY | NONE} [,...] [EXCEPT {user | role} [,...]]]
[SETTINGS variable [= value] [MIN [=] min_value] [MAX [=] max_value] [READONLY | WRITABLE] | PROFILE 'profile_name'] [,...]

For demonstration purposes we will cover a small subset of this functionality:

CREATE USER name1
[NOT IDENTIFIED | IDENTIFIED {[WITH {no_password | plaintext_password }] BY {'password'}} ]

DROP USER name

ALTER USER name1 [RENAME TO new_name1]
[NOT IDENTIFIED | IDENTIFIED {[WITH {no_password | plaintext_password }] BY {'password'}} ]
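
The test below relies on new_user and alter_user strategies and a User model defined in the project's full code. A minimal sketch of how such strategies might look (the field names and value ranges here are assumptions for illustration):

```python
from dataclasses import dataclass
from typing import Optional
from hypothesis import strategies as st

@dataclass
class User:
    name: Optional[str]
    password: Optional[str]  # None stands in for NOT IDENTIFIED / no_password

# Keep generated names short and readable on purpose
names = st.text(alphabet="abcdefghijklmnopqrstuvwxyz", min_size=1, max_size=8)
passwords = st.one_of(st.none(), st.text(min_size=1, max_size=12))

# A brand-new user always has a name; an ALTER payload may leave fields unset
new_user = st.builds(User, name=names, password=passwords)
alter_user = st.builds(User, name=st.one_of(st.none(), names), password=passwords)
```

st.builds composes per-field strategies into whole domain objects, so the state machine rules can draw realistic users directly.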

Here is an example of how to test user CRUD using a stateful machine. Let’s cover this small part of the authentication functionality of the ClickHouse database with tests, handling the entire user CRUD lifecycle in about 100 lines of code:

# new_user and alter_user are project-specific strategies generating User
# objects; ChClient wraps the ClickHouse client and raises ChError on failure.
class CHUserTest(RuleBasedStateMachine):
    created_user = Bundle("created_user")
    deleted_user = Bundle("deleted_user")

    @initialize()
    def init(self):
        ChClient().delete_all_users()

    @rule(
        target=created_user,
        user=st.one_of(new_user, consumes(deleted_user)),
    )
    def create_user(self, user):
        ChClient().create_user(user)
        ChClient(user).try_login()
        return user

    @rule(
        target=created_user,
        user=consumes(created_user), alter_user=alter_user,
    )
    def update_user(self, user: User, alter_user: User):
        ChClient().alter_user(user, alter_user)
        user.update(alter_user)
        ChClient(user).try_login()
        return user

    @rule(target=deleted_user, user=consumes(created_user))
    def drop_user(self, user):
        ChClient().drop(user)
        with pytest.raises(ChError) as e:
            ChClient(user).try_login()
        assert e.value.error_code == 516
        return user

    @rule(user=st.one_of(created_user))
    def cant_create_existed_user(self, user):
        with pytest.raises(ChError) as e:
            ChClient().create_user(user)
        assert e.value.error_code == 493

    @rule(user=st.one_of(new_user, deleted_user), alter_user=alter_user)
    def cant_alter_non_existent_user(self, user, alter_user):
        with pytest.raises(ChError) as e:
            ChClient().alter_user(user, alter_user)
        assert e.value.error_code == 192

    @rule(user=st.one_of(new_user, deleted_user))
    def cant_drop_not_existing_user(self, user):
        with pytest.raises(ChError) as e:
            ChClient().drop(user)
        assert e.value.error_code == 192

    @rule(user=st.one_of(new_user, deleted_user))
    def cant_login_non_existent_user(self, user):
        with pytest.raises(ChError) as e:
            ChClient(user).try_login()
        assert e.value.error_code == 516

This is a powerful technique: 56 lines of code cover the following cases, and their combinations in various orders, as presented in the state diagram above:

  • Users can be successfully created and logged in.
  • Users can be created with different types of passwords (no password, not identified, plain password).
  • Users can be deleted and will no longer be able to log in.
  • Users can be updated.
  • Existing users cannot be created again.
  • Non-existent users cannot be updated.
  • Non-existent users cannot be dropped.
  • Non-existent users cannot log in.

Here’s an example from one of the 500 executed tests:

state = CHUserTest()
state.init()
created_user_0 = state.create_user(user=User(name='Q', password=NoPassword()))
state.cant_alter_non_existent_user(alter_user=User(name='c', password=PlainPassword(password='jXHtr')), user=User(name='u', password=NoPassword()))
created_user_1 = state.create_user(user=User(name='BB', password=NoPassword()))
deleted_user_0 = state.drop_user(user=created_user_1)
created_user_2 = state.create_user(user=User(name='xY', password=PlainPassword(password='GcYOkgVab')))
state.cant_alter_non_existent_user(alter_user=User(name=None, password=None), user=User(name='y', password=PlainPassword(password='psvMb')))
deleted_user_1 = state.drop_user(user=created_user_2)
created_user_3 = state.update_user(alter_user=User(name=None, password=PlainPassword(password='wEC')), user=created_user_2)
state.teardown()

Here lies the exponential power of stateful testing. When we add a new feature to the system under test, we simply add or update a few rules, and the generated tests automatically cover the new feature’s interactions with all the existing ones.
