Keeping tests valuable: Avoiding Leaking Domain Knowledge to Tests

Leaking domain knowledge into tests may not seem harmful at first, but as complexity grows it becomes a big problem!

One of the key principles of effective software testing is to ensure that tests remain valuable and sustainable over time. This lets developers verify specific units of code and ensure they comply with the specifications. However, when writing tests, it is easy to accidentally leak domain knowledge into them. We'll explore this topic in greater depth by answering the following questions:

  • Is leaking domain knowledge an anti-pattern?

  • Do my tests reveal domain knowledge?

  • What negative impact could this have on the test suite?

  • What are the benefits of preventing the exposure of domain knowledge in unit tests?

I hope this post helps you! Let's get started and understand why revealing domain knowledge is an anti-pattern.

📌 Is leaking domain knowledge an anti-pattern?

First, let's define the term. Anti-patterns are programming practices that look like reasonable solutions but tend to cause more problems than they solve. They are the opposite of design patterns, which are established, widely accepted solutions to recurring problems. Here are a few well-known anti-patterns:

  • God Object: A class that knows and does too much, violating the Single Responsibility Principle.

  • Spaghetti Code: Code that is tangled and hard to understand, making it difficult to maintain and modify.

  • Magic Numbers: Using hardcoded values instead of named constants or variables, making the code less readable and more error-prone.

  • Primitive Obsession: Overuse of primitive data types instead of creating domain-specific objects or classes, leading to complex and hard-to-maintain code.

There are dozens more I could list, but these are enough to make the point: we want to keep anti-patterns out of our code base, and we also don't want anything detrimental in our valuable tests. So we need to stay alert. Leaking domain knowledge into tests goes against good practice because it couples the tests to details of the domain and can even cause false positives. So yes, we can consider it an anti-pattern. Now let's look at a crucial difference between two testing topics that are related but not the same.

📌 Leaking Domain Knowledge vs. Implementation Details

It is important to understand the difference: leaking domain knowledge into unit tests is not the same as exposing implementation details, although the two are related.

Domain knowledge leakage refers to revealing the business logic for which an application was built. This can happen when a test case makes assumptions about the behavior of the system based on its internal workings, rather than its intended functionality. For example, when tests include values or assumptions about the domain.

Implementation details, on the other hand, refer to how the system is built. They leak when a test case makes assumptions about the internal structure or behavior of the system, rather than focusing on its API or public contract. Tests that check implementation details are often concerned with the number of times a particular method is called, properties irrelevant to the test, or the number of foreach iterations.

Also, if a unit test depends on a specific private method or variable name, any change in the implementation of the code being tested can break the test. This makes the test unstable and costly to maintain. Not clear? Check out this analogy to further clarify the difference 👇:

Imagine you are a user of a food delivery service and you want to place an order. As a user, you know that you need to choose the restaurant, select the menu items, enter the delivery address, pay for the order, and finally wait for the delivery. This information represents your domain knowledge - you know how to use the food delivery service to get your meal.

But now look at the cook in the restaurant that provides the food for the delivery service. He knows the details of how to prepare the dishes, season them correctly, cook them at the right temperature, etc. This information is implementation details - the chef knows how to do his job.

Let's bring this into the context of unit testing. Leaking domain knowledge would be like a user trying to cook his own food in the restaurant, without knowing how to choose the right ingredients or use the cooking equipment properly. This would be inefficient, and it is not the user's role to do the cook's work. Leaking implementation details would be like the cook sharing technical information about food preparation with the user, such as the ideal oven temperature or the brand of knife used to cut the vegetables. This information is irrelevant and unnecessary for the user, who just needs to know how to order and receive his food correctly.

Unit tests don't need to know how many error messages the domain has, which business exceptions it defines, or other particularities of the domain. Likewise, tests don't need to know the internal implementation of a method, and they shouldn't test private methods directly. Tests should care only about input and output, and they should be set up to cover the expected behaviors of the unit under test.
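To make the "input and output only" idea concrete, here is a minimal sketch in xUnit. Every type in it (IDiscountPolicy, TenPercentOff, CountingPolicy, OrderTotalCalculator) is a hypothetical name invented for illustration and does not come from a real code base. The first test counts calls to a collaborator, so it is tied to how the result is produced. The second test only compares the output with a pre-calculated value.

```csharp
using System.Collections.Generic;
using System.Linq;
using Xunit;

// Hypothetical types, invented only for this illustration.
public interface IDiscountPolicy
{
    decimal Apply(decimal price);
}

public class TenPercentOff : IDiscountPolicy
{
    public decimal Apply(decimal price) => price * 0.9m;
}

// Hand-rolled spy that records how many times it was called.
public class CountingPolicy : IDiscountPolicy
{
    public int Calls { get; private set; }

    public decimal Apply(decimal price)
    {
        Calls++;
        return price * 0.9m;
    }
}

public class OrderTotalCalculator
{
    private readonly IDiscountPolicy _policy;

    public OrderTotalCalculator(IDiscountPolicy policy) => _policy = policy;

    // Applies the policy to each price and sums the results.
    public decimal Total(IEnumerable<decimal> prices) =>
        prices.Sum(price => _policy.Apply(price));
}

public class OrderTotalCalculatorTests
{
    // Brittle: asserts on the number of collaborator calls (an implementation detail).
    [Fact]
    public void Total_CallsPolicyOncePerItem()
    {
        var spy = new CountingPolicy();
        var calculator = new OrderTotalCalculator(spy);

        calculator.Total(new[] { 10m, 20m });

        Assert.Equal(2, spy.Calls);
    }

    // Sturdier: asserts only on the output for a given input.
    [Fact]
    public void Total_AppliesDiscountToWholeOrder()
    {
        var calculator = new OrderTotalCalculator(new TenPercentOff());

        decimal total = calculator.Total(new[] { 10m, 20m });

        Assert.Equal(27m, total);
    }
}
```

If Total were later changed to sum the prices first and apply the discount once, the first test would fail while the second would most likely keep passing, which is the kind of stability we are after.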

Although different, these concepts can be connected, since leaking implementation details can lead to the exposure of domain knowledge. When a test assumes things about the inner workings of a system, it can in turn expose assumptions about how the system should be used, which is connected to domain knowledge. The differences can be difficult to see, because a single test can leak domain knowledge and implementation details at the same time. If you want to see examples of tests that are concerned with implementation details, you can check out a post I wrote on the theme, Keeping tests valuable: Avoid implementation details!

Remember, unit tests are meant to verify units of behavior, so what we always want to know is whether the result complies with the expected behavior of the feature under test. If you leak implementation details, the tests become brittle and constantly need unnecessary tweaks. If you leak domain knowledge, the test can break often and become useless: a false positive.

A false positive is a false alarm. It occurs when a test fails even though the tested functionality is working correctly, i.e. the test indicates a problem that does not exist. This can lead to a waste of time and resources, as developers need to investigate and fix these non-existent "problems".

Now let's move on and see examples in practice on the main topic of this article.

📌 Do my tests reveal domain knowledge?

Let's explore this now with a more practical example. Look at this unit test that seeks to check the behavior that invalidates a Voucher already used in the platform:

    [Fact(DisplayName = "Validate Voucher Used")]
    [Trait("Category", "Sales - Voucher")]
    public void Voucher_ValidateVoucherUsedMustBeInvalid()
    {
        // Arrange
        var voucher = new Voucher(code: "PROMO-15-OFF",
            discountPercentage: 15,
            discountValue: 150,
            quantity: 1,
            typeDiscountVoucher: TypeDiscountVoucher.Percentage,
            expirationDate: DateTime.Now.AddDays(10),
            active: true,
            used: true);

        // Act
        var result = voucher.ValidateApply();

        // Assert
        Assert.False(result.IsValid);
        Assert.Equal("This voucher has already been used.", VoucherApplicableValidation.UsedErrorMsg);
    }

See that the assert section contains a hard-coded error message inside an Assert.Equal. Why does this test leak domain knowledge? Because it expects a specific string literal: if the message is ever reworded, the test will fail even though the functionality of the system remains the same.

Therefore, it is more advisable to test the behavior of the system rather than specific error messages. The test should only check that a voucher that has already been used is invalid, so the expected result is that result.IsValid returns false. For another test scenario that seeks to verify that an error was returned in the result, we could use something like this:

    public void When_VoucherIsInactive_Expect_VoucherIsNoLongerValid()
    {
        // Arrange
        var voucher = new Voucher(code: "PROMO-15-OFF",
            discountPercentage: 15,
            discountValue: 150,
            quantity: 1,
            typeDiscountVoucher: TypeDiscountVoucher.Percentage,
            expirationDate: DateTime.Now.AddDays(10),
            active: false,
            used: false);

        // Act
        var result = voucher.ValidateApply();

        // Assert
        Assert.False(result.IsValid);
        Assert.NotEmpty(result.Errors);
    }

We just reuse the result returned in the Act step and check that it contains an error. The call Assert.NotEmpty(result.Errors) by itself does not leak domain knowledge, because the term "Errors" is generic and does not indicate anything specific about the application domain. In this specific case, the ValidateApply() method is built on the FluentValidation library. You may not like this kind of library, but I see it as an enabler in many cases.

If you need to explain the reason for such an assertion in the test, you can use other techniques and libraries, such as FluentAssertions:

    // Assert
    result.IsValid.Should().BeFalse();
    result.Errors.Should().NotBeEmpty();

    // or maybe

    // Assert
    result.IsValid.Should().BeFalse("because the voucher has already been used");
    result.Errors.Should().NotBeEmpty("because there is at least one error");

Let's go to another example. The test below is also unstable and leaks domain knowledge:

     [Fact(DisplayName = "Validate Voucher Type Invalid Value")]
     [Trait("Category", "Sales - Voucher")]
     public void Voucher_ValidateVoucherTypeValue_MustBeInvalid()
     {
         // Arrange
         var voucher = new Voucher(code: "",
             percentDiscount: null,
             discountValue: null,
             quantity: 0,
             typeDiscountVoucher: TypeDiscountVoucher.Value,
             dateValidity: DateTime.Now.AddDays(-1),
             active: false,
             used: true);

         // Act
         var result = voucher.ValidateIfApplicable();

         // Assert
         Assert.False(result.IsValid);
         Assert.Equal(6, result.Errors.Count);
     }

The test leaks domain knowledge when it uses Assert.Equal(6, result.Errors.Count), because it specifies the exact number of errors expected in the result. This means the test knows the precise number of errors the method returns, and that number may change in the future. For example, a validation rule may no longer be needed and be dropped from the class, causing the test to fail.
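One possible way to keep the intent of this test without pinning the error count is sketched below. The Voucher and ValidationResult classes here are deliberately simplified, hypothetical stand-ins (three made-up rules and a plain list of strings), not the real domain classes or the FluentValidation types. The point is only the shape of the assertions.

```csharp
using System.Collections.Generic;
using Xunit;

// Simplified, hypothetical stand-in for a validation result.
public class ValidationResult
{
    public List<string> Errors { get; } = new List<string>();
    public bool IsValid => Errors.Count == 0;
}

// Simplified, hypothetical stand-in for the Voucher entity.
public class Voucher
{
    private readonly string _code;
    private readonly int _quantity;
    private readonly bool _active;

    public Voucher(string code, int quantity, bool active)
    {
        _code = code;
        _quantity = quantity;
        _active = active;
    }

    // Made-up rules; the real domain may have more, fewer, or different ones.
    public ValidationResult ValidateIfApplicable()
    {
        var result = new ValidationResult();
        if (string.IsNullOrWhiteSpace(_code)) result.Errors.Add("Code is required.");
        if (_quantity < 1) result.Errors.Add("Quantity must be at least 1.");
        if (!_active) result.Errors.Add("Voucher is not active.");
        return result;
    }
}

public class VoucherTests
{
    [Fact]
    public void Voucher_WithInvalidData_MustBeInvalid()
    {
        // Arrange
        var voucher = new Voucher(code: "", quantity: 0, active: false);

        // Act
        var result = voucher.ValidateIfApplicable();

        // Assert: no exact count, so adding or removing a rule does not break the test
        Assert.False(result.IsValid);
        Assert.NotEmpty(result.Errors);
    }
}
```

If a single rule really matters, a separate test that builds a voucher violating only that rule usually states the intent more clearly than one large count.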

Cool, we explored several examples! There is another one that shows the same problem. Vladimir Khorikov uses it in his book, Unit Testing Principles, Practices, and Patterns. See the test below:

public class CalculatorTests
{
    [Fact]
    public void Adding_two_numbers()
    {
        int value1 = 1;
        int value2 = 3;
        int expected = value1 + value2;

        int actual = Calculator.Add(value1, value2);

        Assert.Equal(expected, actual);
    }
}

What the test does is duplicate the Calculator internals. The expected value should not be computed by summing the inputs, because that is the responsibility of the method under test. The problem with this approach is that the test does not provide any meaningful verification: it simply duplicates the contents of the code under test.

The solution would be to pre-calculate and parameterize this test, as follows:

public class CalculatorTests
{
    [Theory]
    [InlineData(1, 3, 4)]
    [InlineData(11, 33, 44)]
    [InlineData(100, 500, 600)]
    public void Add_two_numbers(int value1, int value2, int expected)
    {
        int actual = Calculator.Add(value1, value2);
        Assert.Equal(expected, actual);
    }
}

With the pre-calculated and parameterized test, the goal is very clear: verify the behavior of the sum. The input and output are the focus of this test!

If none of these examples have made it clear yet how important it is to avoid leaking domain knowledge, let's move on to another.

Imagine we are working on an online sales management system for a clothing store. One of the main functionalities of the system is to allow customers to add products to the shopping cart. We know that customers can only add products that are in stock. We can imagine a test scenario similar to the one below:

    // NUnit
    [Test]
    public void AddItemToCart_WhenItemIsOutOfStock_ShouldReturnFalse()
    {
        // Arrange
        var item = new Item { Name = "Blue Shirt", Price = 29.99, StockQuantity = 0 };
        var cart = new ShoppingCart();

        // Act
        TestDelegate action = () => cart.AddItem(item);

        // Assert
        // Unnecessary: couples the test to a domain-specific exception type
        Assert.Throws<OutOfStockException>(action);
    }

See that in the assert we once again have a domain leak: Assert.Throws<OutOfStockException>(action). If the exception class is ever renamed, or the method stops throwing exceptions, the test will fail. The problem here is that we don't need to guarantee that this exception is thrown. We need to check the final output that matters. The behavior is to prevent a user from adding an item with zero stock to the shopping cart. The test could easily be written this way:

// xUnit
    [Fact]
    public void AddItemToCart_WhenItemIsOutOfStock_ShouldReturnFalse()
    {
        // Arrange
        var item = new Item { Name = "Blue Shirt", Price = 29.99, StockQuantity = 0 };
        var cart = new ShoppingCart();

        // Act
        var result = cart.AddItem(item);

        // Assert
        Assert.False(result);
    }

The test no longer checks for an exception, which lets it focus on the expected behavior of the system rather than on domain-specific knowledge. I'm not saying it is wrong to test the exception-throwing behavior, and we can even create a specific test to cover it. The focus here is that if the stock is zero, AddItem is expected to return false, validating that the user cannot add an out-of-stock item to the cart.

In summary, the point I want to get across with the example above is to avoid exposing custom exceptions in tests, unless they are generic to the entire application. Testing for exceptions that may occur is a deep, but very important, subject! When we avoid placing unnecessary asserts, we increase the focus on the behavior and decrease the chances of this test breaking because of unnecessary knowledge.
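When the exception really is part of the contract, one option is to give it a single, dedicated test so that the coupling to the exception type lives in one place. The sketch below assumes a hypothetical variant of ShoppingCart whose AddItem throws instead of returning false. Item, ShoppingCart, and OutOfStockException are simplified stand-ins written for this example.

```csharp
using System;
using System.Collections.Generic;
using Xunit;

// Simplified, hypothetical stand-ins for the article's domain types.
public class OutOfStockException : Exception
{
    public OutOfStockException(string itemName)
        : base($"'{itemName}' is out of stock.") { }
}

public class Item
{
    public string Name { get; set; } = "";
    public decimal Price { get; set; }
    public int StockQuantity { get; set; }
}

public class ShoppingCart
{
    private readonly List<Item> _items = new List<Item>();

    public IReadOnlyList<Item> Items => _items;

    // Assumed contract for this sketch: adding an out-of-stock item throws.
    public void AddItem(Item item)
    {
        if (item.StockQuantity <= 0)
        {
            throw new OutOfStockException(item.Name);
        }

        _items.Add(item);
    }
}

public class ShoppingCartExceptionTests
{
    // The only test that knows about OutOfStockException.
    [Fact]
    public void AddItem_WhenItemIsOutOfStock_ThrowsOutOfStockException()
    {
        // Arrange
        var item = new Item { Name = "Blue Shirt", Price = 29.99m, StockQuantity = 0 };
        var cart = new ShoppingCart();

        // Act and Assert
        Assert.Throws<OutOfStockException>(() => cart.AddItem(item));
    }
}
```

With this split, a rename of the exception touches one test, while the behavior-focused tests stay untouched.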

📌 What is the negative impact of leaking domain knowledge for testing?

Leaking domain knowledge into tests can lead to several problems, most notably tests that fail on any change to the domain. Here are some of the effects I have seen:

  • Increased fragility: If domain knowledge is exposed in the test code, tests can break even if the functionality of the code being tested remains the same.

  • Increased coupling: Changes to the implementation may require changes to the tests, which can result in tight coupling between the test code and the implementation.

  • Reduced maintainability: It can be difficult to maintain the tests as the implementation changes over time.

  • Reduced readability: Tests become difficult to read and understand, which makes them harder to maintain.

  • Reduced test coverage: When domain knowledge is leaked to tests, it may become more difficult to test edge cases and unusual scenarios, as the tests may not cover all possible cases that are relevant to the domain logic.

📌 What are the benefits of preventing the exposure of domain knowledge in unit tests?

Preventing the leakage of domain knowledge to unit tests can bring several advantages, such as:

  • Better maintainability: Tests that are less prone to breaking when domain logic changes, reducing the effort required for test maintenance.

  • Increased flexibility: Tests that are independent of domain knowledge let the production code be refactored and modified without fear of breaking the tests.

  • Improved readability and comprehensibility: Tests that focus on verifying the behavior of the code, rather than depending on specific domain knowledge, are more understandable and readable to people who are not domain experts. This leads to improved comprehension of the code and better collaboration.

  • Enhanced test coverage: When tests are not closely related to domain knowledge, it becomes easier to test edge cases and unusual scenarios, resulting in better test coverage.

📌 Conclusion

Leaking domain knowledge can make your tests unstable and erode trust in them. We don't need that. We want tests that verify the behavior of the system in a stable way, so the practices I have described can help you build a better-quality suite when writing your unit tests! This goes for the front end as well.

I hope you enjoyed reading this, leave a like if you can and any constructive criticism in the comments. Feel free to contact me on LinkedIn. As the official Spock would say: Live long and prosper 🖐️

References:

Leaking domain knowledge to tests - Vladimir Khorikov

Effective Software Testing: A Developer's Guide - Mauricio Aniche

Unit Testing Principles, Practices, and Patterns - Vladimir Khorikov
