Testing implementation details is usually a bad idea

There are several disadvantages to testing implementation details.

Testing implementation details conjures images of a super-corporate culture where only management is allowed to make decisions. If management has to know about the minutiae of every decision and there is no autonomy, creativity is discouraged, and the productivity pace is considerably slowed. Also if there's a vicious approval cycle, changes and initiatives take much longer to be implemented. All because we need sign-off from managers. A better approach sees management providing directives and examining the output of the department. This way the department is free to devise its own strategies for meeting its mandate. They can change the internals however they see fit, once the outcome is preserved.

It's a similar situation with testing internals. In much the same way as a corporate company, testing internals creates rigidity in our code base that slows us down and doesn't provide much value.

To illustrate my points below let's consider a buggy implementation of a CacheClient. Let's imagine this is a thin wrapper around some library that allows us to talk to a cache.

class CacheClient:

    def _generate_namespaced_key(self, key: str) -> str:
        """Generate a namespace to avoid collisions"""
        return f"pluto-{key}"

    def set(self, key: str, value: object):
        """Sets a value in the cache"""
        namedspace_key = self._generate_namespaced_key(key)
        # Oops, we forgot to use the namespaced key
        self._set_data(key, value)  

    def get(self, key: str) -> object:
        """Retrieves a value from the cache"""
        namespaced_key = self._generate_namespaced_key(key)
        return self._get_data(key)

The CacheClient has public set and get methods that store and retrieve data from the cache. It prepends a prefix to the provided cache key to accomplish some namespacing. In a full example, the prefixes could be used to namespace different uses of the cache client across the codebase to avoid potential collisions. I've kept it simple and just used the prefix "pluto".

Non-behavioral changes cause breakage

When we refactor our code the tests break because they are too tightly coupled to the implementation. If we change a variable name or make any changes to the implementation our tests break. This is undesirable because we haven't even changed the behavior. From an application perspective, we haven't changed how a component works (behavior) yet still we have to update the tests.

Let's consider a quick test that illustrates the point.

def test_that_checks_how_data_is_saved_in_the_cache():
    cache_client = CacheClient()
    cache_client.set("my-key", "my-value")

    # This uses the _generate_namespaced_key and _get_data functions
    namespaced_key = cache_client._generate_namespaced_key("my-key")
    assert cache_client._get_data(namespaced_key) == "my-value"

If we decide that _generate_prefix is a better name, the test breaks. The actual behavior though hasn't changed. The get and set functions still work. Any test that couples to the internals will break unnecessarily when we do refactors that don't change behavior. In this case, the behavior is getting and setting data.

It may provide a false sense of security

If our tests interact with our components or units in a different way than our production code does, we risk our tests verifying irrelevant scenarios. I left a bug in the implementation before. Let's examine a test that doesn't capture this bug.

def test_data_set_in_cache():
    cache_client = CacheClient()

    key = "my-key"
    value = "value"

    namespaced_key = cache_client._generate_namespaced_key(key=key)
    # this test calls cache_client._set_data(key=namespaced_key, value=value)
    # but our implementation calls cache_client._set_data(key=key, value=value)
    cache_client._set_data(namespaced_key, value)

    # this test passes
    assert cache_client.get(key=key) == value

In the implementation earlier, the set(..) method should have called _set_data(...) with the namespaced key, but it didn't. This test doesn't capture that omission because it calls the internal method directly and does the right thing. Such a test creates a false sense of security because it passes even though our implementation is actually incorrect.

Negatively impacts productivity

Building on the previous points, testing implementation details also decreases our overall productivity. When we perform refactors, tightly coupled tests require insignificant and unnecessary updates. Additionally, it may fully discourage improvements to the code.

This example is a pretty simple one, but once this pattern is used more widely across a big codebase, small changes can become pretty painful. When we have priorities to consider, and an improvement balloons due to the unwieldy nature of the dependencies, such refinements often get punted, and the refactor cost compounds with the passage of time.

Encourages breaking component boundaries

Finally, this encourages us to build leaky abstractions. When we start to break component boundaries in our tests, this pattern can spread like wildfire laterally, and up the component hierarchy. This often happens due to convenience. It's easier to write other tests by copying existing tests. However, it's tempting to utilize the same approach when testing a few components up in the hierarchy.

What do we do instead?

So we've established that it's often a bad idea to test the internals of a component. There are better reasons for breaking this rule though. For example, if we wanted to verify that we're calling the underlying cache API properly, that's a fair internal detail to test.

In the example below, let's say the CacheClient accepts a cache_http_api and in _set_data(...) we're calling a post method on the cache_http_api. It's useful to verify this "internal" detail because it's a part of the expected behavior of our component. We want to ensure that we call the cache_http_api properly. Since the real API will likely require network calls, we want to provide a mock in our tests.

Consider this example...

class CacheClient:

    def __init__(self, _cache_http_api):
        self._cache_http_api = _cache_http_api

    def set(self, key: str, value: object):
        """Sets a value in the cache"""
        namedspace_key = self._generate_namespaced_key(key)
        # Oops, we forgot to use the namespaced key
        self._set_data(key, value)  

    def _set_data(self, key, value):
        self._cache_http_api.post("/set", data={key:value})

def test_that_we_call_underlying_cache_api_properly():
    cache_http_api_mock = Mock()

    cache_client = CacheClient(_cache_http_api=cache_http_api_mock)

    key = "key"
    value = "value"
    namespaced_key = cache_client._generate_namespaced_key(key)

    cache_client.set_data(key, value)

    # assert that we called the post method properly
    cache_http_api_mock.post.assert_called_once_with(
        "set/", data={namespaced_key: value})

A better option still, would be to provide a fake implementation for the cache HTTP API that replicates the real-world behavior without performing network calls. That would allow us to truly test the behavior without peeking inside the internals.

Instead of testing the internals of the method, just test the behavior using the publicly available methods instead. Such a test would look something like this.

def test_that_we_call_underlying_cache_api_properly():
    fake_cache_http_api = FakeCacheHttpAPI()
    cache_client = CacheClient(_cache_http_api=fake_cache_http_api)

    key = "key"
    value = "value"

    cache_client.set_data(key, value)

    assert cache_client.get_data(key) == value

Conclusion

It's usually not a great idea to test internals. It causes unnecessary breakage, slows us down, creates a false sense of security, and promotes bad architectural practices. We should try to avoid it as best we can!

I hope this guide was helpful for you! My thoughts on this topic were largely inspired by Kent's great article on the same topic.

Happy coding!