It’s a Trap! Avoid Focusing on Vanity Metrics in Software Testing

Mark Mayo · Published in ITNEXT · Apr 24, 2021 · 5 min read

Several years ago, a new PM came on board a team I was on and introduced me to the term “Vanity Metrics”. At the time we were excitedly watching the number of downloads of our new app, and she bluntly told us that number didn’t matter: conversion and retention of users were far more important.

Now obviously you can’t get to conversion and retention without those initial downloads, and briefly being #1 on the app store certainly helped, but it did make me more aware of how some numbers really are ‘vanity’ and don’t contribute towards your team’s goal of shippable quality. Customers don’t care how many downloads you had in the past hour. Customers care whether the app does what they want.

Vanity metrics — not always a true reflection of quality

As a Quality Engineer / Test Lead / QA Lead or whatever we call ourselves these days (vanity name?), time and experience have also shown me a number of metrics within testing/quality that, when it comes down to it, are purely (or nearly purely) for vanity.

Test Coverage Percentage

In your latest software project, be it unit tests, integration tests or even automated UI tests, do you measure test coverage? Don’t misunderstand me: coverage is good, and it’s useful to write tests. However, striving for what one manager claimed to me was an industry standard of 80%, or the 100% on everything another demanded, is, frankly, not that useful. You can have 100% coverage and still not find bugs. Take this code sample:

double divide(double a, double b)
{
    return a / b;
}

We can write a unit test that achieves 100% coverage of this function. That doesn’t mean it’s bug free, though. If we never test edge cases, such as b being 0.0, a division-by-zero problem sails straight past our ‘fully covered’ test suite.
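To make that concrete, here’s a rough Python analogue of the same trap; the function and the pytest-style test are my own illustration, not code from any real project:

def divide(a: float, b: float) -> float:
    """Divide a by b."""
    return a / b


def test_divide():
    # This single test executes every line of divide(), so a coverage tool
    # such as coverage.py happily reports 100% line coverage.
    assert divide(10.0, 2.0) == 5.0
    # But the b == 0 edge case is never exercised: divide(1.0, 0.0) raises
    # ZeroDivisionError, and nothing in this 'fully covered' suite would tell us.

The coverage report is green, the metric looks great, and the first user to pass in a zero still finds the bug for you.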

Total Number of Bugs

What’s the ideal number of bugs? Zero! But if you’ve worked on any software project, you (hopefully) are aware that’s next to impossible. Consider the old project triangle of time, cost and quality: pick two. If you want high quality, and zero bugs is REALLY high quality, it’s either going to take forever or cost a fortune.

Can you spot the bug?

Now that doesn’t mean the bug count isn’t a leading indicator. If the number of bugs found is increasing over time, especially high-severity ones, it’s worth investigating whether they’re clustered in a particularly complex module that needs further attention, or whether they affect a particularly valuable customer. The same goes when the cost of an escaped bug is high (think a pacemaker or a defibrillator). But at some point you need to make money and ship software, and there will still be bugs in it. The only way to avoid shipping more bugs is to not ship more software, and that’s not a solution.

One further risk is behavioural change. I’ve been at a company where the Project Manager literally told us to stop raising bugs until the release was out, so that it wouldn’t get delayed any more. Then he was congratulated for an on-time release, and the engineering team was chastised for the resultant bugs that were discovered…

Total Number of Tests

I’ve worked at a company where a team of 20 manual testers executed 4,000 test cases every release. This was on a media set-top box, so the tests were NOT generally short, and rapid releases were simply not possible.

The “solution” was to hire a team of test automation engineers, to automate the lot.

The downside? Most of those tests were, frankly, useless. Many were nested inside other tests, or were incredibly, incredibly unlikely to fail. The effort invested, compared with maintaining a set of smoke tests covering the critical features, was very arguably a misuse of time for devices that could be updated over the net if an issue slipped through. Furthermore, there were 15–20 different variants of the boxes, but aside from screen resolution and boot-up sequences, there was no strong case for running every test on every device. The Pareto Principle is always something to consider here.

Along those lines, say you have 100 or 1,000 tests. If they’re not the right tests, the ones covering the critical pieces or the pieces most likely to break, they’re far less useful than 5–10 smoke tests that exercise critical functionality.
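One common way to keep that small critical set runnable on its own, assuming a pytest-style suite (the marker name and test names here are my own invention), is to tag the smoke tests and select them at run time:

import pytest


# Tag the handful of tests that cover critical, must-work functionality.
@pytest.mark.smoke
def test_user_can_log_in():
    ...


@pytest.mark.smoke
def test_video_playback_starts():
    ...


def test_obscure_settings_tooltip_text():
    # One of the hundreds of low-value cases: it still exists,
    # but it doesn't need to gate every release.
    ...

Register the smoke marker in pytest.ini, and then pytest -m smoke runs just that critical slice on every commit, while the full suite can run far less often.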

Test Pass Rate

Like the total number of tests, the test pass rate, while an indicator and certainly a useful trend to keep an eye on, should not be used to say “we have no bugs, since 100% of our tests pass”. A suite suddenly dropping to a 70% pass rate should raise eyebrows and cause concern, but a 100% pass rate across tests that don’t cover the critical functionality is, again, not as useful as it could be.

Static Analysis Metrics

------------------------------------------------------------------
Your code has been rated at 10.00/10 (previous run: 10.00/10, +0.00)

Static analysis tools are wonderful. They find issues automatically and help with coding-style conformity, and at a company where we methodically fixed 16,000 warnings in our C code base, we found a lot of critical bugs that had been missed.

However, that warm, happy feeling when pylint scores your code as 10/10? That’s nice, but again, correctly written, coding-standards-adherent code doesn’t mean the code is doing the right thing. It’s another tool in our tool belt, but it shouldn’t be used to convey ‘quality’ in terms of delivering the correct behaviour in the product.
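As a toy illustration of my own (not from that code base), the following module is the sort of thing pylint will happily rate 10/10, give or take configuration: docstring, naming and typing are all in order, yet the conversion formula is simply wrong, and no linter will ever notice.

"""Temperature conversion helpers."""


def fahrenheit_to_celsius(fahrenheit: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius."""
    # Lint-clean, typed and documented... and wrong: the ratio is inverted.
    # The correct formula is (fahrenheit - 32) * 5 / 9.
    return (fahrenheit - 32) * 9 / 5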

Eric Ries posits that “the only metrics that entrepreneurs should invest energy in collecting are those that help them make decisions”. Vanity metrics look good, but do they help move your business forward? Look instead for actionable metrics — ones that indicate where you should focus your testing, where you can add value and quality to the product.

Examples of Actionable Metrics in Testing

Recently changed code — if all the changes in this release are in module x, there’s potentially little benefit in testing module y, which hasn’t been touched since 2015 and has no inter-dependencies. Perhaps target your testing, or new quality initiatives, at module x.
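A minimal sketch of that targeting, with a hypothetical module-to-tests mapping and hypothetical paths (only the git diff call is standard), might look something like this:

import subprocess

# Hypothetical mapping from source modules to their test suites.
MODULE_TESTS = {
    "src/module_x/": "tests/module_x/",
    "src/module_y/": "tests/module_y/",
}


def changed_files(base: str = "origin/main") -> list[str]:
    """Return the files touched on this branch since it diverged from base."""
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def tests_to_run() -> set[str]:
    """Select only the test suites whose source modules actually changed."""
    return {
        tests
        for path in changed_files()
        for module, tests in MODULE_TESTS.items()
        if path.startswith(module)
    }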

Customer issues — if feature Y is causing the helpdesk 90% of its issues, focus on feature Y. Find out where the issues are and what they have in common, and attempt to break down the biggest problems and what may be causing them. Reducing these issues will cut customer complaints and increase customer happiness, and customer happiness IS quality!
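A quick, hedged sketch of spotting that concentration from raw helpdesk data (the ticket records here are invented purely for illustration):

from collections import Counter

# Imaginary export of helpdesk tickets as (feature, summary) pairs.
tickets = [
    ("feature_y", "playback stutters"),
    ("feature_y", "playback stutters on resume"),
    ("feature_y", "subtitles out of sync"),
    ("feature_x", "settings screen typo"),
]

by_feature = Counter(feature for feature, _ in tickets)
for feature, count in by_feature.most_common():
    print(f"{feature}: {count} tickets ({count / len(tickets):.0%})")
# If one feature owns the vast majority of the tickets, that's where the
# next round of testing effort (and fixes) should go.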

Remember — if you can measure it, you can manage and improve it! Take those actionable metrics and start working on them now. And late at night after your next commit, when you see that perfect linting score, take a moment, smile, and realise that while it doesn’t really matter, it’s still nice to see good news.
