How to Assert Database State?

Today, we’ll discuss a question that relates to my Unit Testing book: how to assert the state of the database?

1. Observable behavior vs implementation details

Before we dive into the question itself, I need to provide some context. Otherwise, it wouldn’t be clear what the question is about.

In the book, I emphasized testing observable behavior as opposed to implementation details.

  • Testing observable behavior leads to a high quality, robust test suite — The resulting tests check only those aspects of the system that matter to the end user, and fail only when those aspects change.

  • Testing implementation details, on the other hand, leads to brittle tests — The resulting tests couple to those implementation details and turn red (fail) each time you refactor the code under test.

This is one of the most important points in the book, and it ties to the 2nd component of a good test — resilience to refactoring. Here are all 4 of them:

  • Protection against regressions — How good the test is at finding bugs.

  • Resilience to refactoring — How good the test is at not producing false alarms.

  • Fast feedback — How fast the test is.

  • Maintainability — How small the test is and how many out-of-process dependencies it reaches out to.

These four components, when multiplied together, determine the value of a test.

And by multiplied, I mean in a mathematical sense; that is, if a test gets zero in one of the components, its value turns to zero as well:

Value estimate = [0..1] * [0..1] * [0..1] * [0..1]

By targeting your tests at implementation details, you reduce the 2nd component (resilience to refactoring) to almost zero. Therefore, the overall value of the test reduces to zero with it.
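
To make the multiplication concrete, here is a worked example with made-up scores (the numbers are illustrative assumptions, not measurements). Take a test that scores 0.9 on protection against regressions, 0.8 on fast feedback, and 0.9 on maintainability, but couples to implementation details, so its resilience to refactoring is only 0.05:

```latex
\text{Value estimate} = 0.9 \times 0.05 \times 0.8 \times 0.9 = 0.0324
```

Even if the other three scores were a perfect 1, the product could not exceed 0.05.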

Another way to think of resilience to refactoring is as part of the test accuracy formula:

Figure: Test accuracy as the signal-to-noise ratio

Both a strong signal (the test’s ability to find bugs) and low noise (the test’s ability not to raise false alarms) are critically important. There’s no use for a test that isn’t capable of finding any bugs, even if it doesn’t raise false alarms. Similarly, the test’s accuracy goes to zero when it generates a lot of noise, even if it’s capable of finding all the bugs in the code: those findings are simply lost in the sea of irrelevant information.

The only way to produce robust, non-brittle tests is to make them target observable behavior, not implementation details.

In the book, I gave a formal (and more elaborate) definition of observable behavior and implementation details, as well as the differences between them. Here’s a summary of those differences.

Imagine the following graph of dependencies in code:

Figure: A map that shows communications among code components

To be part of the observable behavior, the method must meet one of the following two criteria:

  • Have an immediate connection to one of the client’s goals

  • Incur a side effect in an out-of-process dependency that is visible to external applications

The controller’s ChangeEmail() method is part of its observable behavior, and so is the call it makes to the message bus. The first method is the entry point for the external client, thereby meeting the first criterion. The call to the bus sends messages to external applications, thereby meeting the second criterion.

You should verify both of these method calls. However, the subsequent call from the controller to User doesn’t have an immediate connection to the goals of the external client. That client doesn’t care how the controller decides to implement the change of email as long as the final state of the system is correct and the call to the message bus is in place. Therefore, you shouldn’t verify calls that the controller makes to User when testing that controller’s behavior.

When you step one level down the call stack, you get a similar situation. Now the controller is the client, and the ChangeEmail method in User has an immediate connection to that client’s goal of changing the user’s email and thus should be tested.

But the subsequent calls from User to Company are implementation details from the controller’s point of view. Therefore, the test that covers the ChangeEmail method in User shouldn’t verify what methods User calls on Company.

The same line of reasoning applies when you step one more level down and test the two methods in Company from User's point of view.

Think of the observable behavior and implementation details as onion layers. Test each layer from the outer layer’s point of view, and disregard how that layer talks to the underlying layers. As you peel these layers one by one, you switch perspective: what previously was an implementation detail now becomes an observable behavior, which you then cover with another set of tests.
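
As a rough sketch of the outermost layer, the test below calls a controller’s ChangeEmail() method and then checks only the two things the external client can observe: the final state and the message sent to the bus. The IMessageBus interface, the BusSpy class, the dictionary standing in for the database, and all method signatures are assumptions made for illustration; the article doesn’t show this code. Plain exceptions stand in for a test framework so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical contract for the out-of-process message bus.
public interface IMessageBus
{
    void SendEmailChangedMessage(int userId, string newEmail);
}

// Hand-written spy: records outgoing messages, i.e. the side effect
// that is visible to external applications.
public class BusSpy : IMessageBus
{
    public List<string> SentMessages { get; } = new List<string>();

    public void SendEmailChangedMessage(int userId, string newEmail)
    {
        SentMessages.Add($"{userId}:{newEmail}");
    }
}

public class User
{
    public int Id { get; }
    public string Email { get; private set; }

    public User(int id, string email)
    {
        Id = id;
        Email = email;
    }

    public void ChangeEmail(string newEmail)
    {
        Email = newEmail;
    }
}

public class UserController
{
    private readonly Dictionary<int, User> _users; // stands in for the database
    private readonly IMessageBus _bus;

    public UserController(Dictionary<int, User> users, IMessageBus bus)
    {
        _users = users;
        _bus = bus;
    }

    public void ChangeEmail(int userId, string newEmail)
    {
        User user = _users[userId];
        user.ChangeEmail(newEmail);                     // implementation detail
        _bus.SendEmailChangedMessage(userId, newEmail); // observable side effect
    }
}

public static class ChangeEmailTests
{
    public static void Changing_email_updates_state_and_notifies_the_bus()
    {
        var users = new Dictionary<int, User> { [1] = new User(1, "old@example.com") };
        var bus = new BusSpy();
        var sut = new UserController(users, bus);

        sut.ChangeEmail(1, "new@example.com");

        // Observable behavior 1: the final state of the system.
        if (users[1].Email != "new@example.com")
            throw new Exception("The email was not changed");

        // Observable behavior 2: the message sent to external applications.
        if (bus.SentMessages.Count != 1 || bus.SentMessages[0] != "1:new@example.com")
            throw new Exception("Expected exactly one message on the bus");

        // Deliberately no assertion on how the controller talks to User.
    }
}
```

If the controller later stopped calling User.ChangeEmail() and updated the email some other way, this test would stay green as long as the final state and the bus message remained the same.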

2. How to assert database state?

Alright, that was a lengthy introduction, but I had to do this in order to describe the question we are going to discuss in this article.

I received 3 different versions of this question; here’s a compilation of all 3:

When testing a controller in integration tests, how should you check the state of the database after the action under test is completed?

Should you use:

  • An already existing repository class from the production code,

  • Custom database querying logic, or

  • A separate API endpoint?

To illustrate these 3 options, let’s take an example. Let’s say we have this User domain class (I’ve left setters public for brevity):

public class User
{
    public string Name { get; set; }
    public UserStatus Status { get; set; }
}

public enum UserStatus
{
    Active = 1,
    Deleted = 2
}

And we also have this controller with a single Register method:

public class UserController
{
    private readonly UserRepository _userRepository;

    public UserController(UserRepository userRepository)
    {
        _userRepository = userRepository;
    }

    /* This is the method under test */
    public void Register(string name)
    {
        var user = new User { Name = name, Status = UserStatus.Active };
        _userRepository.Save(user);
    }
}

public class UserRepository
{
    private readonly DbContext _dbContext;

    public UserRepository(DbContext dbContext)
    {
        _dbContext = dbContext;
    }

    public User[] GetAll()
    {
        return _dbContext.Users
            .Where(x => x.Status == UserStatus.Active)
            .ToArray();
    }

    public void Save(User user)
    {
        /* Logic to save the user*/
    }
}

Here’s the earlier image adjusted for this particular code sample:

Figure: The dependency graph for the UserController

How should we test the Register method in the UserController?

2.1. Mocking

Let’s first address the elephant in the room: mocking. We should not use mocks to check how UserController interacts with UserRepository.

As I mentioned in the previous section, we should only test the controller’s observable behavior, as it’s perceived by its client. The client here is the external client, for whom the controller is the entry point to our system. For that client, the only method that is part of the observable behavior is the Register method itself, because it’s the method the client invokes to achieve its goal — register a user. All subsequent method calls are implementation details from that client’s perspective. The client doesn’t care about those interactions; the only thing it cares about is the final state of our system after the registration is completed.

So the 2 things our integration test should check are:

  • The Register method in the controller

  • The state of the system

Our integration test should not check how the controller interacts with the repository. Such a check would couple the test to implementation details and make it brittle as a result.

You can read more about mocking in this article: When to Mock.

2.2. Querying the database

The integration test will check the Register method by invoking it in the Act section; there’s no need to mock it. As for the state, we have these 3 options (I’m copying them from the question above):

Should you use:

  • An already existing repository class from the production code,

  • Custom database querying logic, or

  • A separate API endpoint?

Let’s discuss the first 2 options. Here’s how they might look:

[Fact]
public void Registering_a_user_option1()
{
    /* Act */
    InvokeRegister("name"); // Creates UserController and calls Register()

    /* Assert */
    using (DbContext dbContext = new DbContext())
    {
        User[] users = new UserRepository(dbContext).GetAll();
        users.Should().HaveCount(1);
        users[0].Name.Should().Be("name");
    }
}

[Fact]
public void Registering_a_user_option2()
{
    /* Act */
    InvokeRegister("name"); // Creates UserController and calls Register()

    /* Assert */
    using (DbContext dbContext = new DbContext())
    {
        User[] users = dbContext.Users.Where(x => x.Status == UserStatus.Active).ToArray();
        users.Should().HaveCount(1);
        users[0].Name.Should().Be("name");
    }
}

In option 1, we are using the already existing UserRepository, whereas in option 2, we are re-implementing the querying logic in the test itself.

There’s not much difference between these 2 options. You would think that option 1 reuses the application code (UserRepository) whereas option 2 doesn’t, but that’s not entirely true. Aside from the querying logic (the filtering of active users), there’s also the mapping logic that both tests implicitly use: the mapping from the database to domain classes, implemented by the ORM.

To completely decouple the test’s assertions from any pre-existing logic (be it querying or anything else), we’d need to not use the DbContext or User classes at all. We’d have to manually materialize the data into custom-built DTOs, bypassing both the ORM mappings and the repository.
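
For illustration only, fully decoupled assertions could look roughly like the sketch below, which reads rows through plain ADO.NET interfaces into a test-only DTO. The Users table, its Name and Status columns, the SQL text, and the UserRow and RawUserQuery names are all assumptions; the article doesn’t specify the schema.

```csharp
using System.Collections.Generic;
using System.Data;

// Test-only DTO that deliberately does not reuse the User domain class.
public sealed class UserRow
{
    public string Name { get; set; }
    public int Status { get; set; }
}

public static class RawUserQuery
{
    // Assumed schema: a Users table with Name and Status columns.
    public const string Sql = "SELECT Name, Status FROM Users";

    // Runs the raw query on any ADO.NET connection, bypassing the ORM.
    public static List<UserRow> ReadAll(IDbConnection connection)
    {
        using (IDbCommand command = connection.CreateCommand())
        {
            command.CommandText = Sql;
            using (IDataReader reader = command.ExecuteReader())
            {
                return Materialize(reader);
            }
        }
    }

    // Maps each row to a DTO by column position, with no ORM mapping involved.
    public static List<UserRow> Materialize(IDataReader reader)
    {
        var rows = new List<UserRow>();
        while (reader.Read())
        {
            rows.Add(new UserRow
            {
                Name = reader.GetString(0),
                Status = reader.GetInt32(1)
            });
        }
        return rows;
    }
}
```

Note how much of the production mapping this duplicates: the test now has its own copy of the table name, the column order, and the meaning of each status value.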

There’s no need to bypass the ORM mappings or the repository logic, though. Just as you don’t resort to a roundabout technique such as reflection to verify the user.Name property, you shouldn’t work around the repository methods either. Similar to user.Name, the repository is a window into the state of the system. The only difference is that this state is out-of-process.

So, when choosing between these 2 options (an already existing repository class from the production code or custom database querying logic), choose the first one.

2.3. Using a separate API endpoint

At this point, a fair question to ask is: why is the test checking the state of the system by looking into the database directly?

If the test should examine the controller from the client’s perspective (which it should), doesn’t it need to use the public API available to that client? Something like another controller method that would return a single user or even a list of all users?

This is a great question.

And yes, ideally, the integration test should check the state of the system using another controller method, not by peering into the database directly.

This is akin to checking the state of a class. For example, here:

public class Customer
{
    private CustomerStatus _status = CustomerStatus.Regular;

    public void Promote()
    {
        _status = CustomerStatus.Preferred;
    }

    public decimal GetDiscount()
    {
        return _status == CustomerStatus.Preferred ? 0.05m : 0m;
    }
}

public enum CustomerStatus
{
    Regular,
    Preferred
}

How would you test the Promote method?

This method’s side effect is a change of the _status field, but the field itself is private and thus not available in tests. A tempting solution would be to make this field public, but that is an anti-pattern.

As we discussed previously, your tests should interact with the system under test exactly the same way as this system’s clients and shouldn’t have any special privileges. In the above example, the _status field is hidden from the production code and thus is not part of the class’s observable behavior. Exposing that field would result in coupling your tests to implementation details.

Instead of checking that field directly, you need to look at how the clients use this class. In this particular example, the clients don’t care about the customer’s status; otherwise, that field would be public. The only information the clients do care about is the discount the customer gets after the promotion.

And so that’s what we need to verify in tests:

  • First, invoke the Promote method,

  • Then, check the current discount using GetDiscount().
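
These two steps might be sketched as follows. The Customer class is repeated from above so that the snippet stands on its own; the CustomerTests class and its method name are illustrative, and a plain exception replaces the xUnit and FluentAssertions calls used elsewhere in this article.

```csharp
using System;

public enum CustomerStatus
{
    Regular,
    Preferred
}

public class Customer
{
    private CustomerStatus _status = CustomerStatus.Regular;

    public void Promote()
    {
        _status = CustomerStatus.Preferred;
    }

    public decimal GetDiscount()
    {
        return _status == CustomerStatus.Preferred ? 0.05m : 0m;
    }
}

public static class CustomerTests
{
    public static void Promoting_a_customer_gives_a_discount()
    {
        var customer = new Customer();

        // Act: the operation under test.
        customer.Promote();

        // Assert: check the outcome through the public API, not the private field.
        decimal discount = customer.GetDiscount();
        if (discount != 0.05m)
            throw new Exception($"Expected a discount of 0.05 but got {discount}");
    }
}
```

The test never touches _status, so renaming that field or replacing it with a different internal representation wouldn’t break it.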

The same guidelines apply to tests that cover the controller. We need to step one level of abstraction up and look at how the controller’s clients interact with the updated state of the system. They do that by invoking other controller methods, not by using repositories.

2.4. Which approach to choose?

So, the ideal approach is to use a separate API endpoint when checking the state of the system.

There’s a problem with this approach, however: you don’t always have an API endpoint that you can call to check all the components of the modified system state. In fact, more often than not, you don’t have such an API.

Even in our simple example, we might have a GetUser controller method that would return all the information about the user, but we might not have a GetAllUsers method. So, we would be able to check the user’s state, but not that there’s only 1 user (and not 2 or 3) in the database after the registration.
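
Here is a minimal sketch of that situation. The GetUser method, its signature, and the InMemoryUserRepository class are hypothetical. The in-memory list is there only so that the snippet runs without a database; a real integration test would go through the actual repository and database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum UserStatus
{
    Active = 1,
    Deleted = 2
}

public class User
{
    public string Name { get; set; }
    public UserStatus Status { get; set; }
}

// Hypothetical in-memory stand-in for UserRepository and the database.
public class InMemoryUserRepository
{
    private readonly List<User> _users = new List<User>();

    public void Save(User user)
    {
        _users.Add(user);
    }

    public User GetByName(string name)
    {
        return _users.SingleOrDefault(
            x => x.Name == name && x.Status == UserStatus.Active);
    }
}

public class UserController
{
    private readonly InMemoryUserRepository _userRepository;

    public UserController(InMemoryUserRepository userRepository)
    {
        _userRepository = userRepository;
    }

    public void Register(string name)
    {
        var user = new User { Name = name, Status = UserStatus.Active };
        _userRepository.Save(user);
    }

    // Hypothetical read endpoint; there is no GetAllUsers counterpart.
    public User GetUser(string name)
    {
        return _userRepository.GetByName(name);
    }
}

public static class RegisterViaApiTests
{
    public static void Registering_a_user_is_visible_through_GetUser()
    {
        var controller = new UserController(new InMemoryUserRepository());

        controller.Register("name");

        // The state is checked through the same public API the client uses.
        User user = controller.GetUser("name");
        if (user == null || user.Name != "name")
            throw new Exception("The registered user was not returned by GetUser");

        // What this test cannot check: that exactly one user was created.
    }
}
```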

Usually, no single API endpoint returns all the information about the changes made by another API endpoint. You’d have to call several such APIs and aggregate that information manually. Moreover, the system might not react to changes immediately, but only after some internal background processes are completed.

And so, even though the approach with a separate API endpoint is the ideal one, it is not practical.

The next best thing is to step 1 level of abstraction down and see how the controller itself receives information about the system state. It does this by using the UserRepository and the User domain class. Therefore, use the first approach when checking the database state. Here are all 3 again, so that you don’t have to scroll:

Should you use:

  • An already existing repository class from the production code,

  • Custom database querying logic, or

  • A separate API endpoint?

Here’s the test that follows the first approach:

[Fact]
public void Registering_a_user()
{
    InvokeRegister("name"); // Creates UserController and calls Register()

    using (DbContext dbContext = new DbContext())
    {
        User[] users = new UserRepository(dbContext).GetAll();
        users.Should().HaveCount(1);
        users[0].Name.Should().Be("name");
    }
}

Again, this is not ideal, but it’s still pretty close. The reason it’s close to ideal is that your domain classes are already encapsulated (at least I hope they are) and don’t expose private state and other implementation details. Therefore, those classes are aligned with what your overall system exposes state-wise.

In other words, the state of your domain classes is already a combination of all the information exposed by your controllers' public API.

3. Summary

  • There are 3 ways to check the database state in integration tests. You can use:

    • An already existing repository class from the production code

    • Custom database querying logic

    • A separate API endpoint

  • Options 1 and 2 don’t differ in principle (if you use domain classes in option 2)

  • The separate API endpoint (option 3) is the ideal but impractical solution

  • The next best thing is option 1 where you use already existing repositories and domain classes
