Testing and Quality: Correlation does not equal Causation

Theresa Neate
9 min read · Feb 9, 2020

How often have you heard:

“Our quality is poor, we need more testing (or testers).”

When you hear this, know that this indicates not only a spurious correlation but also a fundamental misunderstanding of what testing is or does.

Spurious Correlation

Come on this journey with me where I explore the above claim, walking you through 1. Testing, 2. Quality, and 3. Building in Quality.

1. Testing

Testing is feedback

Testing does not ASSURE or bring about quality; it only reflects quality. It shines a light on the current state of quality.

Inspection of current state of quality (image)

Software quality is often erroneously assumed to be caused by software testing: “The more we test, the better our quality will be”. This is an unfair and unrealistic expectation.

Testing merely applies a lens on your software quality, at a particular point in time, in the current environment.

Similarly, calling Software QA “Quality Assurance” is a misnomer and a false promise. How do you suppose anyone can assure software quality? The truth is, nothing and nobody can assure it. We can only observe and understand quality, and then consequently improve it.

My title has been some form of “QA” (first acquired at an American employer, where the term is popular) since 2009. But in all these years, to me it has always stood for “Quality Analyst”. We are analysts of quality, and in many cases we use this analysis and feedback to help the team improve quality, but we do not rubber-stamp it.

Testing should be continuous

Just like other feedback, testing should be continuous. Depending on what you’re building and where you are in your lifecycle, the types of testing will differ, but the feedback they provide should flow continuously.

For what that might look like in testing, please have a read of this excellent article by Dan Ashby:

https://danashby.co.uk/2016/10/19/continuous-testing-in-devops/

Let the test pyramid guide your automation tests/checks

I am a fan of the test pyramid created by Mike Cohn, meant originally for test automation, a.k.a. automation checking (some call it “checking”, because a binary outcome is not really a test, is it? It’s only a check of “does it work, Y/N?”).

Many people have written really well about it, and rather than attempt to paraphrase them, I would point you to the source and their work.

Your automated checks/tests should run continuously, with each small batch of work you undertake. When your feedback loops are small and fast, you can respond quickly.
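
As a rough illustration, here is what pyramid-shaped automated checks might look like in Python with pytest; the functions and the `integration` marker are hypothetical stand-ins for your own code and CI labels:

```python
import pytest

def slugify(title: str) -> str:
    return title.lower().replace(" ", "-")

def build_page(title: str) -> str:
    return f"<h1>{slugify(title)}</h1>"

# Base of the pyramid: many fast, cheap unit checks, run on every commit.
def test_slugify_lowercases_and_hyphenates():
    assert slugify("Hello World") == "hello-world"

# Higher up: fewer, slower checks. A marker (registered in pytest.ini)
# lets CI schedule them separately, e.g. `pytest -m integration`.
@pytest.mark.integration
def test_page_wraps_slug_in_heading():
    assert build_page("Hello") == "<h1>hello</h1>"
```

The proportions matter more than the specifics: the cheap base layer gives you feedback within seconds of each small batch of work.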

Minimise your manual testing but make it really count

Because some equate testing with binary outcomes, human testing (sometimes called “manual testing”) is unfortunately often reduced to writing repeatable scripts that a human is then expected to execute and verify. These binary (yes/no, pass/fail) outcomes should be determined by machines.

As I alluded to above, it is argued that there is no such thing as “manual testing”, only real (human) testing versus automated checks: testing is cognitive and requires exploration and intelligence, while checking, of course, does not.

Therefore, please try not to make your humans do a machine’s work. Please make your machines work hard to remove mundane, repetitive tasks from your intelligent and creative team members. Let your humans, in whatever roles you appoint them (testing or not), add value by questioning, creating, interrogating, and thinking.

Also, please do not palm off activities that the developers prefer not to do, for example cross-browser testing, to your testing personnel. Guess what: your testing personnel dislike mundane activities too. If you cannot automate this check (e.g. through visual validation; a sketch of that idea follows below), share any mundane manual tasks as a team: work out how to make it fun, and keep it as unscripted and unpredictable as you can, perhaps even through a team Bug Bash.
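
To make “visual validation” concrete, here is a minimal sketch of an automated screenshot comparison. It assumes Pillow is installed and the two file paths are hypothetical; dedicated visual-testing tools add masking, anti-aliasing handling and review workflows on top of this basic idea:

```python
from PIL import Image, ImageChops  # pip install Pillow

def screens_match(baseline_path: str, current_path: str, tolerance: int = 0) -> bool:
    """Return True if the current screenshot matches the baseline."""
    baseline = Image.open(baseline_path).convert("RGB")
    current = Image.open(current_path).convert("RGB")
    if baseline.size != current.size:
        return False
    # Per-channel (min, max) pixel differences; all zeros means identical.
    extrema = ImageChops.difference(baseline, current).getextrema()
    return max(high for _low, high in extrema) <= tolerance

# e.g. screens_match("baseline/login.png", "current/login.png")
```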

Beware inattentional blindness & testing fatigue

Scientifically, it is important to note that expecting testers to repeatedly check multiple browsers is a prime example of where testing can suffer from Inattentional Blindness.

Who remembers that fantastic “invisible gorilla” video where the viewers are so focused on watching the players, they miss the gorilla completely?

(I’ve linked the original white paper on Inattentional Blindness below.)

http://www.chabris.com/Simons1999.pdf

This “missing the obvious” happens in testing too, especially where you ask a human to repeatedly check anything, including visual regression in UIs. I also call it “testing fatigue”. Maybe the first one or two times you catch things, but the more often you view the same expected outcome, the more fatigued your critical lens becomes, and the harder it is to spot the obvious.

Appropriately involving testing personnel

Lucky you if you have testing personnel: in addition to having a real knack for asking the right questions of systems under test, they can share their critical thinking skills with the rest of the team and uplift everyone’s ability to interrogate systems. Testing is ideally a shared activity between all team members, because software quality is in fact owned by the whole team. These days, testing personnel are also Quality Coaches, in addition to being specialists in testing.

If you have testing personnel, involve them throughout the lifecycle of your system, from ideation, to scaffolding, to coding, to integrating, to deploying, to production, including monitoring and alerting.

If you do not have testing personnel, make it a priority for your team to learn critical thinking and systems thinking. And if you can, hire a Quality Coach as a consultant or even permanently, to help close this skills gap.

Testability

With or without testing personnel, everyone involved in software and systems should consider testability.

Features/Epics. Can we test this feature (or the whole epic) once it’s developed? What is the definition of done?

Architecture. Not only can individual features be testable or not; so too can architecture. Testable architecture is achieved through, for instance:

  • loose coupling & tight cohesion, where components that logically belong together are grouped together (cohesion), yet remain independent of one another rather than tightly tied together (loose coupling);
  • decoupled classes with public interfaces, allowing for component testing;
  • being able to run headless or UI-less tests;
  • clearly identified, well-named elements in a DOM;
  • having the option of stubbing out the DB and not relying on real data (see the sketch after this list);
…to name a few.
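
Here is a minimal sketch of the last two points, public interfaces on decoupled classes plus a stubbed-out DB. It assumes Python 3.9+, and all the names are illustrative only:

```python
from typing import Protocol

class UserStore(Protocol):
    """The public interface the application depends on, not a concrete DB."""
    def find_email(self, user_id: int) -> str: ...

def greeting(store: UserStore, user_id: int) -> str:
    return f"Hello, {store.find_email(user_id)}!"

class StubUserStore:
    """In-memory stand-in, so component tests need no real database."""
    def __init__(self, data: dict[int, str]) -> None:
        self.data = data

    def find_email(self, user_id: int) -> str:
        return self.data[user_id]

# The component test exercises greeting() without any real data.
def test_greeting_uses_whatever_store_it_is_given():
    assert greeting(StubUserStore({1: "a@b.c"}), 1) == "Hello, a@b.c!"
```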

2. Quality

Definition

“Quality is value to some person” — Jerry Weinberg

Measuring quality: continuous and early feedback

Some examples:

  • testing (as discussed above)
  • static analysis of code or models, e.g. linting (via IDE, build process or pre-commit hook; a minimal hook sketch follows this list)
  • testability analysis of components / features to be built
  • BDD and/or TDD
  • continuous integration
  • monitoring and alerting
  • build in monitoring, performance and other “non-functional” (a.k.a. cross-functional) requirements from the outset; try not to retrofit them
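
As one concrete example of the linting item above, here is a minimal pre-commit hook sketch in Python. flake8 is just one possible linter (it must be installed), and the file location follows git’s hook convention:

```python
#!/usr/bin/env python3
"""Minimal lint gate, saved as .git/hooks/pre-commit and made executable."""
import subprocess
import sys

# Run the linter over the repository; flake8 exits non-zero on findings.
result = subprocess.run(["flake8", "."])
if result.returncode != 0:
    print("Lint failed; commit aborted. Fix the issues above and retry.")
    sys.exit(1)  # a non-zero exit stops the commit
```

Teams often prefer a dedicated tool such as the pre-commit framework for managing these hooks; the point is simply that the feedback arrives before the code ever leaves your machine.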

Continuous Improvement: reacting to feedback

Now that you have continuous feedback in the right forms, quantity and balance, you can use it as a basis for continuous improvement. Use the data and information from your testing in regular reviews, whether proactively or through retrospectives, post-incident reviews (or postmortems), and any discussions where you are looking at doing things better.

Please remember to take a good look at the works of John Allspaw and Dr Sidney Dekker, on a Just Culture and on conducting Blameless PostMortems, to guide your thinking and approaches to addressing issues.

To quote the SRE Handbook:

“You can’t ‘fix’ people, but you can fix systems and processes to better support people making the right choices when designing and maintaining complex systems.”

Improve your systems and processes to better support people. Your systems should support people in making better choices and finding the answers when they do not have them.

Improving practices builds in quality, across culture, behaviours and technology alike.

3. Building in quality — a tale of practices

By no means a full list, just something to whet your appetite:

Work slicing

Aim for thin, vertical slices.

Image: luuduong.

A thin vertical slice is not only testable across the full stack, providing immediate feedback on integration points, but is potentially also immediately deployable, and therefore provides early value.

Reducing waste

Bugs are waste. Eliminating bugs early reduces their potential impact, because the later a bug is discovered, the greater its impact and cost.

As mentioned earlier, frequent integration will provide you with helpful, early feedback, letting you eliminate bugs that could really trip you up later on.

How you respond to bugs now, helps your future quality.

Flow

Following on from waste reduction, also reduce lead time and response time. The longer it takes to get started on building an item, the more time intervenes and clouds the understanding of the original expectations, increasing the likelihood of deviating from them and introducing bugs.

For a more thorough read, please check out the DORA metrics, the four measures that characterise high-performing teams (a toy calculation follows this list). These touch on:

  • flow, through lead time & frequency of deployment,
  • bugs, through change failure percentage (the percentage of changes that fail in production, i.e. that were not caught earlier),
  • and one’s ability to respond to failures, through time to restore.
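
As a toy illustration of how these four metrics fall out of ordinary deployment records (the data below is invented):

```python
from datetime import datetime, timedelta

# Each record: (commit time, deploy time, failed in production?, time to restore)
deployments = [
    (datetime(2020, 2, 3, 9), datetime(2020, 2, 3, 15), False, None),
    (datetime(2020, 2, 4, 10), datetime(2020, 2, 4, 16), True, timedelta(hours=2)),
    (datetime(2020, 2, 5, 11), datetime(2020, 2, 5, 13), False, None),
]

# Lead time: how long a change takes to reach production.
lead_times = [deployed - committed for committed, deployed, _, _ in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

# Deployment frequency: deployments per day over the observed window.
window_days = (deployments[-1][1] - deployments[0][1]).days + 1
deploys_per_day = len(deployments) / window_days

# Change failure percentage: share of deployments that failed in production.
failures = [d for d in deployments if d[2]]
change_fail_pct = 100 * len(failures) / len(deployments)

# Time to restore: average recovery time across the failed deployments.
avg_time_to_restore = sum((d[3] for d in failures), timedelta()) / len(failures)

print(avg_lead_time, deploys_per_day, change_fail_pct, avg_time_to_restore)
```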

Just Culture

As mentioned above, a Just Culture (a culture of trust, learning and accountability) is essential in responding to any incidents, bugs or failures. Building in this culture of maturity and EQ makes it safe to reveal when something is going wrong, which means defects are discovered as they occur rather than hidden for later. It also builds a culture of learning and improvement, where corrective action (often the improvement of a system) is taken without harm or retribution.

Collaboration & visibility

(determine which could be helpful, based on your context)

Pair programming. Constant feedback and knowledge transfer as you work improves code quality even before later forms of feedback like CI.

Pull requests. Provide feedback through peer review on (hopefully small, frequent) chunks of work committed back to (hopefully) trunk/master. This reduces later merge conflicts. Again, this early feedback and accountability builds quality in by preventing bugs or catching them early.

*-driven-development. I mentioned TDD and BDD earlier. If you use either, writing tests first will guide your software development, which helps reduce unnecessary code (waste); and if you determine it relevant, you can keep some of these tests for your automated test (check) suites later on. As any practice, including TDD, depends on context, here’s an excellent article on the industry’s thinking: https://martinfowler.com/articles/is-tdd-dead/
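
A minimal red-green TDD sketch, runnable with pytest; the VAT example is invented:

```python
# 1. Red: write a failing test first, describing the behaviour you want.
def test_price_includes_ten_percent_vat():
    assert price_with_vat(100.0) == 110.0

# 2. Green: write just enough code to make the test pass.
def price_with_vat(net: float, rate: float = 0.10) -> float:
    return round(net * (1 + rate), 2)

# 3. Refactor freely; the test is your safety net, and it can later
#    join the unit layer of your automated check suite.
```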

Test Coverage. I’m careful about recommending test coverage, as some can pendulum-swing it to mean “100% or nothing”. What test coverage does do is alert you to untested code, and therefore to missing feedback. It is only indirectly related to quality: it doesn’t indicate the quality of your testing, and it can be faked and cheated, but it can improve confidence and provide more transparency.
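
A small illustration of coverage alerting you to missing feedback, using the real coverage.py commands but an invented function:

```python
def classify(temperature: float) -> str:
    if temperature > 30:
        return "hot"
    return "mild"  # this branch is never exercised by the test below

def test_classify_hot():
    assert classify(35) == "hot"

# `coverage run -m pytest` followed by `coverage report -m` would list
# the "mild" line as missing: untested code, and therefore missing feedback.
```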

Kickoffs and (story, feature) lifecycle rituals. Leaning towards early feedback and shift-left, I very much recommend kickoffs of projects, epics and stories/features alike, in addition to other activities such as desk checks or shoulder checks. Kickoffs help clarify uncertainty, confirm testability and the “definition of done”, determine the tests that will challenge this, and consider edge cases and negative scenarios. This helps build in quality by defining what quality is and how to measure it.

Systems thinking

See the system, not just the component. Systems thinking means considering the whole system, not just the component at hand: how components interact with each other, and how a change in one may affect another. Considering this bigger picture of interrelations and consequences before changing a single unit may well prevent integration or interaction bugs.

A system and its components. Every component in this picture is both a component of a system, and a system itself. Image credit Milly Rowett.

Closing

Quality can be approached both reactively and proactively. Reactively, one can test & monitor and respond to feedback. Proactively, one builds in quality, thereby avoiding stumbling over bugs later, when their impact and cost may be far greater.

Whenever your quality is suffering, pause before you call for more testing. Please rather take a look at your practices, feedback loops and responses to feedback.

I’ll leave you with a quote by one of the fathers of the Lean Manufacturing System, W. Edwards Deming:

“Inspection is too late. The quality, good or bad, is already in the product.”

