Keeping tests valuable: Are the code coverage metrics reliable?

For years, the quality of the code base and the quality and coverage metrics for testing have been a constant topic in the software engineering community, and we are still trying to find a balance. Many projects put specific metrics of various types in place. Let's dive into the questions below:

Should we set rigid metrics?
Does code coverage indicate quality?
How can we use coverage metrics to our advantage?

Goodhart's Law

Do you know Goodhart's law? It is named after the British economist Charles Goodhart and is usually summarized as:

"When a measure becomes a target, it ceases to be a good measure."

But can turning a metric into a goal be harmful in software engineering, specifically for code coverage? The answer is yes. Take a look at this scenario.

The development team works together on a delivery for a big client in the financial industry. The enterprise application under development seeks to streamline critical internal processes of the company. The quality team, together with the management team, sets a minimum code coverage of 97% for the project to pass through the pipeline. The assumption is that the more lines are covered, the lower the chance of bugs, so this becomes the main metric for the project. After a few months, many features come back with bugs. The management and quality teams start to wonder what happened, because the metrics look fine, yet the features always have one bug or another that affects the deadlines. What is the real problem?

Instead of devoting time and effort to improving the quality of the tests and testing behaviors, the team only tries to cover code! The development team was only concerned with testing to reach the numbers the project managers wanted. This is especially a problem in large enterprise projects. In the absence of a results-based goal, aiming for a high coverage percentage often just camouflages hidden bugs. The scenario tends to get worse when there is poor management, poor documentation, a lack of understanding of the software domain, and multiple squads working together. What I have described is a recipe for financial losses and, above all, customer dissatisfaction, whether the customers are end users or corporations.

I have worked on projects where nobody clearly understood how to design test scenarios for each feature. So most of the time the unit tests were written with the following line of reasoning:

I think this is what we need to test!

Or worse:

I believe this feature doesn't need that many tests, let's just cover the lines.

Covering only the happy path of the code is not always a good idea! We must understand well what we are testing and analyze its impact on the business. Ok, but what is the lesson? Defining metrics as goals does not guarantee quality, especially in the context of software engineering. We also have to understand that COVERAGE IS A RESULT, NOT A GOAL.

So if you are a software project manager, technical lead, senior developer, CTO, or CEO, here is a suggestion. Quality needs to be based on solid results! Code coverage is a result of testing: it is a report that assists the software development cycle. The results that matter are usually features without bugs that get approved for production and don't come back. Productivity also increases, and the teams can refactor and create new functionality with more confidence, because the tests are reliable and not metric-driven.

Is determining coverage percentages really important?

Opinions differ on this question. Many projects I worked on had percentages that had to be met. Some clients required 80 percent coverage across the whole project, and others required more than 90 percent in each class. Still others were more specific: for example, domain classes with business rules needed 100 percent coverage. What have I learned, and continue to learn, from these code coverage metrics? They have to fit each type of business. Applying rigid metrics can lead to all the problems described in the previous section. But realistic metrics that match the needs and requirements of each client help a lot to maintain a standard in the project.

The quality of the test scenarios and of the tests themselves, whether unit or integration tests, is essential. But if there is no minimum coverage requirement, the developers themselves may not give much importance to writing tests. We know this happens, mainly because writing tests is a task many developers don't like, and I can say that in the past I hated writing tests, and thinking about the scenarios that could occur even more! But every developer who wants to evolve professionally and mature needs to take testing seriously. Today I see that testing is a fundamental part of the development cycle of any software!

So, is it really important to have realistic code coverage percentages for each business? I still believe so! Developers must align with domain experts and project managers on a realistic standard to achieve. Unrealistic percentages only create discomfort and change the focus of the development team when deadlines are short.

"It is good to have a high level of coverage in the main parts of your system. It is bad to make this high level a requirement. The difference is subtle but critical." (Vladimir Khorikov, Unit Testing Principles, Practices, and Patterns)

From here on I will put forward an opinion that is my own.

I see more value in percentages per file (class) than in a single percentage that the whole project must reach to pass the pipeline. Even so, we need to be careful not to fall for the false comfort of line counts. The problems with relying completely on line coverage are discussed later in this article.

Defining target percentages for the classes or layers where the business rules reside, where tests carry the most value, can be an advantage. Defining coverage standards for critical classes can make developers focus on better understanding the classes that contain the most business-critical functionality. Typically, domain classes have more branching paths that the code can take and higher cyclomatic complexity. Let's understand a bit more about branch coverage.

Branch Coverage

Instead of using the raw number of lines of code, this metric focuses on control structures, such as if and switch statements. It shows how many of these control structures are traversed by at least one test in the suite. See an example with .NET:

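using System.Linq;

public class Phone
{
    // Returns true only for strings made of 11 to 13 digits.
    public bool isValid(string phone)
    {
        if (phone == null)
        {
            return false;
        }

        // Reject anything that contains a non-digit character.
        if (!phone.All(char.IsDigit))
        {
            return false;
        }

        // Too short.
        if (phone.Length < 11)
        {
            return false;
        }

        // Too long.
        if (phone.Length > 13)
        {
            return false;
        }

        return true;
    }
}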

Now take a look at the generated report, and in particular at the branch metrics reported for this file:

The most useful part of this report is the indication of the branches that have not yet been covered. With this visual indication, it is easy to identify what is still missing, in this case practically everything. No if or switch statement can go unnoticed: if the developer forgets some detail, the report will point it out. We must remember that branch coverage is part of code coverage. Always think of code coverage as broken down into several different criteria, one of which is branch coverage. In other words, branch coverage is a more specialized kind of code coverage that focuses on one specific aspect, namely making sure that every branch or path is tested.

I won't venture to say that branch coverage is better or more useful; I believe everything needs a balance. In the end, line coverage goes practically hand in hand with branch coverage. The recommendation is to always keep an eye on the branches that have not been covered in your code and to try to understand why they are uncovered and how a test could reach them.
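To make this concrete, here is a minimal sketch of inputs that take each branch of the isValid method shown earlier at least once. The PhoneBranchChecks class, its Cases table, and the sample numbers are hypothetical names and values used only for illustration; in a real project these would be test cases in whatever test framework the team uses.

```csharp
using System;
using System.Linq;

// Same validator as in the article, repeated so the sketch is self-contained.
public class Phone
{
    public bool isValid(string phone)
    {
        if (phone == null) { return false; }
        if (!phone.All(char.IsDigit)) { return false; }
        if (phone.Length < 11) { return false; }
        if (phone.Length > 13) { return false; }
        return true;
    }
}

// Hypothetical helper: one input for each outcome of each `if`.
public static class PhoneBranchChecks
{
    public static readonly (string Input, bool Expected)[] Cases =
    {
        (null, false),               // phone == null
        ("5511abc45678", false),     // contains non-digit characters
        ("1234567890", false),       // 10 digits: shorter than 11
        ("12345678901234", false),   // 14 digits: longer than 13
        ("12345678901", true),       // 11 digits: lower boundary
        ("1234567890123", true),     // 13 digits: upper boundary
    };

    // Counts the cases where the validator disagrees with the expected result.
    public static int CountFailures()
    {
        var phone = new Phone();
        return Cases.Count(c => phone.isValid(c.Input) != c.Expected);
    }
}
```

Each row exists because of a behavior (a rule about what a valid phone is), not because of a line that needed to turn green. The two boundary rows are the ones a coverage report would never ask for, and they are often the ones that catch real bugs.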

The problems of relying completely on code coverage percentages!

Some people believe that when a refactoring reduces the number of lines and the coverage percentage rises as a result, our testing has automatically improved. This is illogical. Here is an example:

using System.Linq;

public class Phone
{
    public bool isValid(string phone)
    {
        if (phone == null) return false;

        // Same rules as before, folded into a single expression.
        return phone.All(char.IsDigit) && phone.Length >= 11 && phone.Length <= 13;
    }
}

I performed a small refactoring of the code without changing anything in the unit tests. Now let's compare the metrics reported by the coverage tool before and after the refactoring:

BEFORE REFACTORING

AFTER REFACTORING

The percentage increased: it was 46% and now it is 80%. Does that mean we have improved anything? No!

We only reduced the number of lines of code, but the behaviors (the important branches) are still not covered by tests! If you look at the before and after images, the branch coverage percentage (the paths the algorithm can take) remains the same even after the refactoring.

This is one of the big problems when working with code coverage: if we act without thinking or analyzing carefully, we often end up being fooled!

It can be very easy to manipulate coverage numbers. The more compact your code is, the better the line coverage metric will look, because it only considers the raw number of lines. This is why developers and software testers need to analyze the numbers carefully, again keeping an eye on the quality of the tests that were written. What does this prove? Code coverage is not a measure of quality. Just because your code is 100% covered doesn't mean that your tests are good.
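A small sketch can show how this happens with the refactored validator. The IsValidBuggy variant, the WeakSuite class, and the sample numbers below are hypothetical and exist only for this illustration. Two checks are enough to execute every line of the compact method, yet the same two checks also pass against a version with an off-by-one bug in the lower bound.

```csharp
using System;
using System.Linq;

public static class CompactPhone
{
    // The refactored validator from the article.
    public static bool IsValid(string phone)
    {
        if (phone == null) return false;
        return phone.All(char.IsDigit) && phone.Length >= 11 && phone.Length <= 13;
    }

    // Hypothetical buggy variant: lower bound uses > instead of >=.
    public static bool IsValidBuggy(string phone)
    {
        if (phone == null) return false;
        return phone.All(char.IsDigit) && phone.Length > 11 && phone.Length <= 13;
    }
}

public static class WeakSuite
{
    // A null input and one 12-digit input execute every line of either
    // validator, so line coverage reports 100%.
    public static bool Passes(Func<string, bool> isValid) =>
        isValid(null) == false && isValid("551199998888") == true;

    // A behavior-focused check on the 11-digit boundary.
    public static bool BoundaryPasses(Func<string, bool> isValid) =>
        isValid("55119999888") == true;
}
```

WeakSuite.Passes returns true for both IsValid and IsValidBuggy, so the line coverage number cannot tell the two apart. Only BoundaryPasses, which was written from the rule "11 digits is valid" rather than from the report, fails for the buggy variant.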

All that we have just seen further reinforces that coverage is just a tool to be used so that we can create effective and quality tests. Furthermore, a test needs to have many important attributes to be considered a quality test, for example, it needs to be readable, organized, easy to understand, without dozens of asserts and mainly focused on testing behaviors of the functionality. So we just answered the question:

Does code coverage indicate quality?

No, based on everything we have seen so far.

This topic is very related to the next one, which is why I have reserved a section to talk about this subject. Reaching a percentage of coverage should not be a reason to release code to production. Let's understand more about this subject.

Don't use code coverage as a release criterion

In my entire career, I have seen this happen in only two projects. So why talk about this topic? The answer is simple. Relying on coverage percentages to release critical functionality into production is a big mistake! The code should go through several quality checks and a strong code review by the project developers and by the client (if the client has a technical team for that purpose). Having only coverage percentages as release criteria is amateurish nowadays! Large corporations need robust processes. This is a warning for those who have not put them in place yet. Avoid future headaches. Create a reliable and strong quality pipeline for your software.

To reinforce the importance of never using coverage metrics alone to release functionality, think about the software you currently work with. What would happen if a percentage were the only thing a feature had to reach to be released to production? In the beginning, it might be easy or quick to fix the bugs caused by the lack of testing. But as the software grows, at some point it will become impossible to release new features. And the final effect is the end of the project, of the software.

Again, I reinforce the importance of having code reviews and solid acceptance criteria for code to be released to staging (acceptance testing) and production. This topic is also very deep and important but can be left for another post.

How can we use code coverage metrics to our advantage?

We have talked a lot about several topics that revolve around code coverage. But how can we use this tool to our advantage without falling into line coverage traps? Here are some suggestions:

Use this tool in areas of the code that are more complex and critical to the type of business you are working on.
If the team finds a file with zero coverage, a very low percentage, or many uncovered branches, use that as a prompt to write test scenarios for its behavior.
Reinforce the importance of testing code branches! Branches indicate the paths that an algorithm can take to accomplish its goal. So if your code contains many if/else structures, it is always worth looking at how the tests are built on top of these branches and if they exist.
Pay attention to the coverage report, but never assume that the tests were written with quality and cover all the behaviors of the functionality, just based on the report. As we have seen, 100% code coverage has no relation with test quality.
It is not beneficial to require high coverage percentages for the project, or even for individual files, if the quality of the tests remains insufficient.
Tools that generate coverage reports can also show the cyclomatic complexity of the code, which can help you decide whether a class or method should be refactored. This is a very positive factor. Always check whether a refactoring could improve the readability of your code.
Use coverage in code reviews and collaboration: coverage metrics can help reviewers focus on less-tested areas and identify potential risks.
Carefully review the reports and percentages shown, and never ignore other test scenarios!
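As a rough sketch of the first two suggestions, the code below takes per-class numbers and lists only the business-critical classes whose branch coverage is below a target the team agreed on. The ClassCoverage record, the CoverageReview class, and the idea of marking a class as business-critical by hand are assumptions made for this illustration; they are not part of any coverage tool. The output is meant as a reading list for the team, not as a gate in the pipeline.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record holding one class's numbers, for example copied
// from a coverage report.
public record ClassCoverage(string Name, bool IsBusinessCritical, int BranchesCovered, int BranchesTotal)
{
    // A class without branches is treated as fully covered.
    public double BranchPercent =>
        BranchesTotal == 0 ? 100.0 : 100.0 * BranchesCovered / BranchesTotal;
}

public static class CoverageReview
{
    // Returns the critical classes below the target, lowest coverage first.
    public static List<ClassCoverage> NeedsAttention(
        IEnumerable<ClassCoverage> classes, double targetPercent) =>
        classes
            .Where(c => c.IsBusinessCritical && c.BranchPercent < targetPercent)
            .OrderBy(c => c.BranchPercent)
            .ToList();
}
```

For example, with a target of 80, a critical class with 2 of 8 branches covered (25%) is listed, while a critical class with 9 of 10 branches covered (90%) and a non-critical class with no coverage at all are not. What the team does with each listed class is still a matter of reading the code and deciding which behaviors lack tests.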

In the end, each project will have its own decisions and teams will determine the standard to be followed for code coverage. The topics covered are useful tips that I have learned from other software engineering specialists and during my experiences in national and international projects. Always try to seek a point of balance.

Thanks for reading! See you next time! 😉🚀

References:
Code Coverage Complications - Anthony Sciamanna

Effective Software Testing: A Developer's Guide - Mauricio Aniche

Unit Testing Principles, Practices, and Patterns - Vladimir Khorikov

The Pragmatic Engineer