A Way to Effectiveness: Release Quality, Volume and Speed in 1 QA Metric

glebsarkisov · Mayflower team
8 min read · Mar 14, 2023
Hello everyone, my name is Gleb. I am Head of the QA department at Mayflower.

Over the last few years, I have become obsessed with metrics that help to:

  • Find issues in QA/delivery processes
  • Get better in negotiations with business owners
  • Show real benefits of QA for the project
  • Measure KPIs

While working at various companies, I tried different approaches to the tasks above. The defect density metric interested me the most. As a result of my research, I modified this metric and created my own defect density “with a twist”.

If you are looking for a metric that combines quality level, release volume, and release cycle speed in a single number, this article is for you.

The problems of classic defect density

Classic defect density is the number of defects per 1,000 lines of code in a software module or the whole product during an iteration or release. The goal is to find the ratio of defects to code size and then lower it. Sounds good in theory, but there are some issues with rolling out the metric and getting comfortable with it.

The problems with classic defect density:

  1. If your project’s technology stack spans multiple programming languages, various modules, and separate services, it is hard to come up with a single way of calculating the metric for all of them.
  2. Some people find that specific ratio hard to read or interpret, while others have no access to the code but still want to know the level of quality.

The solution

What if we take only the idea behind the metric and modify it until it is easier and more informative to work with?

A recipe for defect density “with a twist”: it is the ratio of production defects of different priorities to the actual user benefit the team delivered during the sprint.

This way we account both for the “cleanliness” of testing during the sprint and for delivery speed, via the delivered volume of tasks and bug fixes.

This indicator can be explained to the business, and it can be used as a measure of the quality of the release process — both at the level of the team and at the level of the entire product.

Calculating defect density “with a twist”

Defect density “with a twist” is calculated as:

D = (kp1·dp1 + kp2·dp2 + … + kpn·dpn) / (hc1·tc1 + hc2·tc2 + … + hcn·tcn)

where:

D — defect density “with a twist”

d — the number of defects of the corresponding priority p1, p2, …, pn

k — the coefficient for the corresponding priority p1, p2, …, pn

t — the number of tickets (tasks/bug fixes) of the corresponding complexity level c1, c2, …, cn

h — the coefficient for the corresponding complexity level c1, c2, …, cn

The numerator is the weight of production defects found during the sprint; the denominator is the weight of all tickets delivered during the sprint. The lower the number, the better the sprint.

Disclaimer: this article describes the coefficients I picked for myself. I suggest you define the coefficients and the number of complexity levels specifically for your own case.

Recommendations on numerator

To account for different levels of defect priority, we introduce the concept of a defect priority coefficient:

We divide defects into 5 priority types (you might have more or fewer types in your priority scheme):

  • p1 for critical
  • p2 for major
  • p3 for medium
  • p4 for minor
  • p5 for trivial

Critical defects are multiplied by 5 so that they carry significant weight; other defect types get their corresponding coefficients k (again, these numbers may differ in your case: the main point is to give more weight to what is important rather than treating all defect types as equal):

  • kp1 = 5
  • kp2 = 2
  • kp3 = 1
  • kp4 = 0.5
  • kp5 = 0.1

As a result, our numerator is a weighted sum of production defects of all priorities found during the sprint: everything we were unable to prevent while testing the product.
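As a minimal sketch, the numerator can be computed as a weighted sum in a few lines of Python (the coefficient values are the examples above; function and variable names are illustrative):

```python
# Example priority coefficients from this article; tune them for your own case.
PRIORITY_COEFF = {"p1": 5, "p2": 2, "p3": 1, "p4": 0.5, "p5": 0.1}

def numerator_weight(defects: dict) -> float:
    """Weighted sum of production defects found during the sprint.

    `defects` maps a priority label ("p1".."p5") to the number of
    production defects of that priority.
    """
    return sum(PRIORITY_COEFF[p] * count for p, count in defects.items())

# Example sprint: 1 critical, 2 major and 4 trivial production defects.
weight = numerator_weight({"p1": 1, "p2": 2, "p5": 4})  # 5*1 + 2*2 + 0.1*4 = 9.4
```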

Recommendations on denominator

You might want to apply the same “different priorities with weights” logic to the tasks and bug fixes released during the sprint. Here are a few reasons not to do it.

Don’t confuse priority with severity for product tasks. If you count tasks by priority, the final number will depend heavily on a product manager’s decisions. For example, a task can be marked critical simply because the PM wants it done ASAP. Yes, combining severity and priority sounds unprofessional, but let’s face it: this is very common in IT companies.

More importantly, we want to count all the effort invested in development and testing in the same formula. We spend different amounts of time on different tickets; that is what the complexity levels c in the denominator capture. For instance, tickets that cost us the most effort can be marked as Extreme complexity, cheaper ones as Huge, Long, and so on. How do we define these complexity levels?

Let’s look at all tickets delivered over the last 4 sprints: tasks, bug fixes, etc. Slice them into 5 percentile layers based on how much time developers and QA logged on each ticket:

  • 100p — Extreme complexity, c1
  • 95p — Huge complexity, c2
  • 90p — Long complexity, c3
  • 75p — Normal complexity, c4
  • 50p — Quick complexity, c5

Note: you can take as many sprints of data as you want and as many percentile layers as you need for more complexity levels.

Once you have found the percentiles for each of the 4 sprints, calculate the arithmetic mean of each percentile layer across the 4 sprints. This way you get the upper limit for every complexity level.
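A sketch of this step in Python, assuming you already have the hours logged per delivered ticket for each sprint (all numbers below are made up for illustration):

```python
from statistics import mean, quantiles

def pctl(values, p):
    """Percentile p of values (p a multiple of 5); p=100 returns the maximum."""
    if p == 100:
        return max(values)
    cuts = quantiles(values, n=20, method="inclusive")  # cut points at 5%, 10%, ..., 95%
    return cuts[p // 5 - 1]

# Hours logged per delivered ticket, for each of the last 4 sprints (made up).
SPRINTS = [
    [1, 1, 2, 2, 3, 4, 5, 6, 8, 10, 13, 21],
    [1, 2, 2, 3, 3, 5, 6, 8, 9, 12, 16, 30],
    [1, 1, 2, 3, 4, 5, 7, 8, 10, 12, 20, 35],
    [1, 2, 3, 3, 4, 5, 6, 9, 11, 14, 18, 28],
]

# Percentile layer assigned to each complexity level, as in the article.
LEVEL_PERCENTILE = {"c1": 100, "c2": 95, "c3": 90, "c4": 75, "c5": 50}

# Mean of each percentile layer across sprints -> upper time limit per level.
limits = {
    level: mean(pctl(sprint, p) for sprint in SPRINTS)
    for level, p in LEVEL_PERCENTILE.items()
}
```

A delivered ticket can then be assigned the highest complexity level whose limit its logged time reaches.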

We also want to establish the value of tickets delivered to production by multiplying the number of tickets of a certain complexity level by that level’s coefficient h. You can define these coefficients yourself, for example:

  • hc1 = 8
  • hc2 = 5
  • hc3 = 3
  • hc4 = 2
  • hc5 = 1

Again, it is fine if your coefficients are chosen empirically: they will still let you rank tickets by value.

Our denominator sums up the value of all delivered tickets during the sprint — every bit of user benefit.
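Putting both halves together, here is a minimal Python sketch of the whole calculation, using the example coefficients from this article (function and variable names are illustrative):

```python
# Example coefficients from this article; tune them for your own case.
PRIORITY_COEFF = {"p1": 5, "p2": 2, "p3": 1, "p4": 0.5, "p5": 0.1}
COMPLEXITY_COEFF = {"c1": 8, "c2": 5, "c3": 3, "c4": 2, "c5": 1}

def defect_density(defects: dict, tickets: dict) -> float:
    """Defect density "with a twist" for one sprint.

    `defects` maps a priority label to the count of production defects;
    `tickets` maps a complexity level to the count of delivered tickets.
    """
    numerator = sum(PRIORITY_COEFF[p] * n for p, n in defects.items())
    denominator = sum(COMPLEXITY_COEFF[c] * n for c, n in tickets.items())
    if denominator == 0:
        raise ValueError("no delivered tickets in the sprint")
    return numerator / denominator

# Example sprint: 1 critical and 3 minor production defects;
# 2 Extreme, 4 Normal and 10 Quick tickets delivered.
d = defect_density({"p1": 1, "p4": 3}, {"c1": 2, "c4": 4, "c5": 10})
# (5*1 + 0.5*3) / (8*2 + 2*4 + 1*10) = 6.5 / 34, roughly 0.19
```

The lower the result, the better the sprint went relative to the volume delivered.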

Now that we have the values and we’ve calculated our defect density “with a twist”, let’s talk about how to understand and treat that number.

How to work with the metric

Working with the metric might look obvious to some: collect data, find your current baseline, then change processes to improve the metric’s value. Let me decompose the process and define its stages.

Finding the current state of things

We need to collect a significant amount of data; analyzing at least 10 sprints should work. It goes without saying that the more sprints you analyze, the better. Note that you do not have to wait for 10 new sprints: you can analyze already-closed sprints, as long as their tickets have time logged by devs and QA.

Finding control limits and focus area

As soon as you have at least 10 values for the metric, it makes sense:

  1. To define the upper control limit, so that when the metric crosses it you know your delivery process is unhealthy. This can be the 100th percentile of your collected metric values, or the 100th percentile plus one standard deviation;
  2. To calculate the 95th and 90th percentiles of your metric values to define target levels for the delivery process (the lower the percentile target, the more ambitious the goal);
  3. To use the target, potentially, as a KPI for the QA lead or the whole development team.
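A sketch of these calculations, with made-up metric values for 10 sprints:

```python
from statistics import pstdev

# Defect density "with a twist" per sprint: 10+ collected values (made up).
history = [0.42, 0.35, 0.50, 0.28, 0.61, 0.44, 0.39, 0.47, 0.33, 0.55]

def pctl(values, p):
    """Nearest-rank percentile: the smallest value covering p% of the data."""
    s = sorted(values)
    k = max(0, -(-p * len(s) // 100) - 1)  # ceil(p/100 * n) - 1
    return s[k]

# Upper control limit: 100th percentile plus one standard deviation.
ucl = pctl(history, 100) + pstdev(history)

# Targets for the delivery process (lower percentile = more ambitious goal).
target_moderate = pctl(history, 95)
target_ambitious = pctl(history, 90)
```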

Metrics cannot help you until you commit and change your processes accordingly.

Analysis during the sprint

Critical-priority defects contribute the most weight to your numerator. Poor team throughput (fewer hours logged in delivered tickets) will also worsen the metric by shrinking the denominator.

In such cases, it makes sense to analyze why you slipped on critical issues, or to stabilize the team’s throughput by hiring more engineers or refining your requirements. Come up with a hypothesis about the origin of the problem, change the process, monitor the outcome; you know what to do.

Some things to note while implementing the metric

  1. You need to instruct your devs and QA on how to properly log their work time in tickets, if they are not doing that already. Make sure to explain to the team that you need metrics to get a measurable result, not to find someone to blame.
  2. Ensure tickets go through your workflow and change statuses on time. You do not want a ticket that, for example, was delivered to production last sprint but was not caught by your sprint query simply because it was never transitioned to a delivered/closed status.

Conclusion

When working with metrics, it was always hard for me to see a direct connection between one part of the process and another. When other managers told me my QA engineers were testing too slowly, I would show them the number of bugs we found before rolling out to production; I thought that spoke for itself quality-wise. At the same time, I really wanted to know how stable our release pace was, and whether we were delivering exactly that volume of new features at that level of quality.

Defect density “with a twist” lets me see everything combined in one value: release volume, quality, and speed. No single metric solves all of your problems, but anything that helps you identify them and monitor the changes you apply comes in handy.

A few things to think over:

  1. Automate metrics collection so that there is no room for human error
  2. Make the metric transparent to every team member by visualizing it on a dashboard
  3. If you are in charge of the team’s goals, this metric can be included in the KPIs for the quarter/period
  4. Experiment with other metrics too (defect leakage, prod/test bug ratio, etc.): whatever helps you make the state of things visible to everyone, identify issues faster, and fix them

P.S. Kudos to Rita Kind-Envy for editing!
