Organisational-level test strategies and modern approaches – Some thoughts-in-progress

In a recent role I was given permission to share some of my test strategy work publicly. It was largely generic in nature, given its high-level focus, so I’m pleased to be able to share a bunch of notes that may help others orient to the rapidly-changing landscape. What’s not here, and what I *should* include, is my approach to people leadership. This is a work in progress that I will update from time to time. If you have any thoughts or criticisms, I’d love to hear from you, either directly or in the comments.

The focus here is to provide helpful strategies in a modern context. If I were running my own company there would certainly be some differences in approach, but I think this is a sensible way to view the world right now.

Test Strategy

Guiding principles

Quality is value to some person (Weinberg).

Thinking about Test Strategy – A mnemonic device

http://www.software-testing.com.au/blog/2016/06/03/tools-for-thinking-about-context-agile-sliders-reimagined/

Context-driven testing – https://context-driven-testing.com/

Tests support development and demonstrate value to stakeholders

Marick’s test quadrants set an expectation that, through tests, developers create the ability to make safe changes and continuously deliver value.

http://www.exampler.com/old-blog/2003/08/21.1.html#agile-testing-project-1

Optimising for flow – Test automation is the primary gate on release

Testing will never be perfect, and despite our best efforts there will always be bugs. As such, the ability to quickly gain confidence in a change and safely and repeatably take it to production requires appropriate test automation. Skilled exploratory testing and independent testing provide review and validation of the automation suites.

Test pyramid

The test pyramid provides a strategy for optimising build time, but is incomplete as a discussion of coverage. See http://www.software-testing.com.au/blog/2018/05/10/on-ui-test-automation/ for considerations when optimising feedback and coverage.

Quality is not a slider

Attempting to use quality as a lever to speed delivery leads to unpredictability. We can break quality down into two categories – internal and external. Internal quality covers those aspects of quality that help the team deliver reliably and predictably. External quality covers those aspects valued by the purchasers or users of our software. We can speed delivery by making tradeoffs in external quality, because it is primarily visible as scope; the same is not true of internal quality. Logic errors, poor design and poor test suites lead to slower delivery, late detection of errors and more bugs (https://arxiv.org/abs/2203.04374).

Traceability

Tracing to specifications is helpful, but not sufficient. Traditional traceability is an unsolved problem due to the many-to-many relationship between requirements and tests. Instead, we gather evidence to show requirements are met, and define requirements as relationships and interactions between humans and the world. See https://nap.nationalacademies.org/catalog/11923/software-for-dependable-systems-sufficient-evidence

Testing confidence is achieved through notions of coverage. Coverage is understood with respect to models.

Excellent testing builds confidence through a process of bringing various models into alignment. Models are formal, informal, explicit and tacit. At a minimum, we expect the models to include –

  • Intent, communicated through clear code.
  • Corresponding tests that demonstrate compliance with specifications.
  • Corresponding tests and/or monitoring that demonstrate business capability and stakeholder value.
  • Architectural risks (performance, security, interfaces) with appropriate coverage.

A list of 101 different coverage models can be found at https://www.researchgate.net/publication/243782285_Software_Negligence_Testing_Coverage

‘Good enough’ quality

Perfect testing is impossible, so we must always apply fallible methods for deciding how much to test and when to stop. “Good enough” testing is the goal, which means that the test approach is defensible in the face of the known risks and other constraints such as time, budget, people and resources. We may certainly make the wrong call, but nobody should be punished for making reasonable decisions based on reasonable efforts to establish “good enough”. Establishing that requires, at a minimum –

  • Clear objectives
  • Quality measures
  • Risks
  • Constraints

Independence

Assuming all members of the team have the requisite base level of formal testing skill, independent testers provide value largely through their experience and role-based independent perspective. Independence comes at the expense of flow, and so presents another fallible trade-off when deciding priorities for delivery and quality. Having many different people review and test a change also maximises the probability of finding problems (see Weinberg – https://www.amazon.com/Quality-Software-Management-Systems-Thinking/dp/0932633722).

As a general guideline –

  • People performing a tester role on critical systems will maximise their independence in order to maximise the likelihood of finding problems. This may manifest as ‘hardening’ or exploratory test windows in integrated staging environments for new features or significant architectural changes. By applying risk-based approaches to testing and prioritising development appropriately, this testing should not need to significantly delay completion of features.
  • “Bug bashes” and mob testing including people outside the team present a low-cost opportunity for independent perspectives.
  • Pair/mob programming with regular rotations of pairs is also a strategy for bringing fresh eyes to code and testing, or at least minimising ‘priming’ effects.
  • Code review should be considered the bare(ly) minimum standard.

Consider tests that are data and environment independent

Tests that focus on behaviour minimise the amount of implementation detail in tests. Good test design also allows abstraction of test data through environment-specific data providers and externalisation of environment-specific configuration.
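
To make that concrete, here’s a minimal pytest-flavoured sketch. The TARGET_ENV variable, the config/data file layout and the balance endpoint are all hypothetical conventions; the point is that the test body names behaviour only, with environment specifics injected via fixtures.

```python
# A minimal sketch of environment/data-independent test design.
# TARGET_ENV, file paths and endpoints are hypothetical conventions.
import json
import os

import pytest
import requests

@pytest.fixture
def env_config():
    """Externalised, environment-specific configuration."""
    env = os.environ.get("TARGET_ENV", "local")
    with open(f"config/{env}.json") as f:
        return json.load(f)

@pytest.fixture
def account(env_config):
    """Environment-specific data provider for a usable test account."""
    with open(f"data/{env_config['name']}/accounts.json") as f:
        return json.load(f)[0]

def get_balance(base_url: str, account_id: str) -> float:
    """Thin client wrapper keeping transport detail out of the tests."""
    resp = requests.get(f"{base_url}/accounts/{account_id}/balance")
    resp.raise_for_status()
    return resp.json()["balance"]

def test_account_holder_can_view_balance(env_config, account):
    # Behaviour-focused: no literal URLs or test data in the test body.
    assert get_balance(env_config["base_url"], account["id"]) >= 0
```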

Shift-left

Front-load critical learning and the biggest risks –

  • Validate key product drivers/goals/features
  • Validate architecture
  • Prevent easily-preventable, expensive mistakes
    • Don’t use this as a reason to front-load thinking and analysis.
    • Get the big things right then optimise for frequent change.

While RUP as a product was anything but agile, there are a lot of powerful ideas in Kruchten’s book on RUP which call out some key points that most early agile work fails to make explicit (despite the obvious influence). See https://www.researchgate.net/publication/220018149_The_Rational_Unified_Process–An_Introduction for some of the key ideas, or the full treatment – https://www.amazon.com.au/Rational-Unified-Process-Philippe-Kruchten/dp/0321197704

The AUP is also an interesting read, with key ideas presumably making their way into Ambler’s DAD. https://web.archive.org/web/20120214042439/http://www.ambysoft.com/unifiedprocess/agileUP.html

Shift-right

  • Identify opportunities to safely test in production
    • Parallel/Shadow testing
    • Synthetic transactions (see the sketch after this list)
    • Staged rollouts
    • Stage gates
  • Identify opportunities to use monitoring as a key part of the test approach
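
As an illustration of the synthetic transaction idea above, here’s a minimal sketch. The /quotes endpoint and payload are hypothetical; the useful property is that the same check can assert in CI or run as a scheduled production probe that emits a metric.

```python
# A minimal synthetic transaction sketch. Endpoint and payload are
# hypothetical; in production, emit the latency to monitoring rather
# than asserting.
import time

import requests

def synthetic_quote_transaction(base_url: str) -> float:
    """Exercise a realistic business flow; return latency in seconds."""
    start = time.monotonic()
    resp = requests.post(f"{base_url}/quotes", json={"product": "demo", "qty": 1})
    resp.raise_for_status()
    assert "quote_id" in resp.json(), "quote should be created"
    return time.monotonic() - start

if __name__ == "__main__":
    latency = synthetic_quote_transaction("https://staging.example.com")
    print(f"synthetic.quote.latency={latency:.3f}")  # ship this to monitoring
```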

Test suites for third party services

  • Consider contract-style tests to ensure behaviour we depend on is unchanged, aligned to hexagonal architecture – https://en.wikipedia.org/wiki/Hexagonal_architecture_(software). These tests ideally identify the core capabilities of the organisation and/or value realised through the software system.
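
A minimal consumer-side sketch of the idea, assuming a hypothetical exchange-rates service. We pin only the fields and behaviour our adapter (the ‘port’ in hexagonal terms) depends on, not the provider’s full schema.

```python
# A minimal consumer-side contract check against a hypothetical
# third-party rates service. Pin only what we depend on.
import requests

DEPENDED_ON_FIELDS = {"base", "rates", "timestamp"}

def test_rates_contract():
    resp = requests.get("https://rates.example.com/v1/latest", params={"base": "AUD"})
    assert resp.status_code == 200
    body = resp.json()
    # Fail fast if the provider drops or renames anything we rely on.
    assert DEPENDED_ON_FIELDS <= body.keys()
    assert isinstance(body["rates"], dict) and "USD" in body["rates"]
```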

Ken Scambler’s presentation may be useful for developers in this context – https://www.youtube.com/watch?v=EaxDl5NPuCA

Consider formal models for stateful testing

Model Based Testing (MBT) – See https://graphwalker.github.io/

BDD as an approach is enhanced by considering MBT principles, providing a way to think about state coverage (see the sketch after this list) –

  • Given (initial state)
  • When (trigger)
  • Then (end state)
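
To make the MBT framing concrete, here’s a minimal sketch – states and triggers are hypothetical, and GraphWalker does this properly at scale. Each Given/When/Then maps to a transition in an explicit model, and the model lets us ask coverage questions that example-based BDD can’t.

```python
# A minimal sketch of BDD steps as transitions in an explicit state
# model. States and triggers are hypothetical; GraphWalker does this
# at scale, including generating the walks for you.
MODEL = {
    # (Given state, When trigger): Then state
    ("logged_out", "login"): "logged_in",
    ("logged_in", "add_to_cart"): "cart_open",
    ("cart_open", "checkout"): "order_placed",
    ("cart_open", "logout"): "logged_out",
    ("logged_in", "logout"): "logged_out",
}

def test_transition_coverage():
    # Replay some scenario walks and assert every modelled transition
    # is exercised - a coverage question example-based BDD can't answer.
    walks = [["login", "add_to_cart", "checkout"],
             ["login", "add_to_cart", "logout"],
             ["login", "logout"]]
    covered = set()
    for walk in walks:
        state = "logged_out"
        for trigger in walk:
            assert (state, trigger) in MODEL, f"illegal move {trigger} from {state}"
            covered.add((state, trigger))
            state = MODEL[(state, trigger)]
    assert covered == set(MODEL), f"uncovered transitions: {set(MODEL) - covered}"
```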

Outside of ‘test quadrant’ tests, consider that test frameworks can deliver value incrementally and iteratively when treated as a set of chainable functions or composable pieces (see the sketch after this list) –

  • Test design (MBT, combinatorial generation, all-pairs)
  • Manipulation of system state (e.g. snapshot/restore tooling)
  • Generation of system activity (i.e. drive the system via UI/APIs)
  • Provide/identify test data
  • Inspect state
  • Validate state
  • Reporting
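
A toy sketch of that composability – every function body here is a hypothetical placeholder, but each stage is independently useful and replaceable.

```python
# A minimal sketch of a test run as composed pieces:
# design -> arrange state -> drive -> validate -> report.
# All function bodies are hypothetical placeholders.
from itertools import product

def design_cases():
    # Test design stage: combinatorial generation (full product here).
    return [{"role": r, "channel": c}
            for r, c in product(["admin", "user"], ["web", "api"])]

def arrange(case):       # Manipulate system state, e.g. restore a snapshot.
    return {"snapshot": "baseline", **case}

def drive(ctx):          # Generate system activity via UI/API.
    ctx["response"] = {"status": 200, "role": ctx["role"]}
    return ctx

def validate(ctx):       # Inspect and validate resulting state.
    return {"case": ctx, "passed": ctx["response"]["status"] == 200}

def report(results):     # Reporting stage.
    passed = sum(r["passed"] for r in results)
    print(f"{passed}/{len(results)} cases passed")

report([validate(drive(arrange(c))) for c in design_cases()])
```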

Models for quality

Quality is driven by –

  • Implementation
    • Code and Infra
  • Product
    • Are we building a good/viable product?
  • Platform
    • Production state
    • Deployment
    • Infra
    • Security
    • Performance
  • Delivery
    • Teams need ‘space’ to do things right. Is delivery pushing too hard? Is it prioritising appropriately for sustainable pace?
    • Visibility of quality required as an input to the above questions.

Time

Identify ways to make time-based testing simple.
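
The standard tactic is an injectable clock, so tests never sleep or patch global time. A minimal sketch (the Session class is hypothetical):

```python
# A minimal injectable-clock sketch, so time-based behaviour (expiry,
# scheduling) is testable without sleeping or monkey-patching.
from datetime import datetime, timedelta, timezone

class FakeClock:
    def __init__(self, now): self.now = now
    def __call__(self): return self.now
    def advance(self, delta): self.now += delta

class Session:
    def __init__(self, ttl, clock=lambda: datetime.now(timezone.utc)):
        self._clock = clock              # production uses the real clock
        self.expires_at = clock() + ttl

    def is_expired(self):
        return self._clock() >= self.expires_at

def test_session_expiry():
    clock = FakeClock(datetime(2025, 1, 1, tzinfo=timezone.utc))
    session = Session(timedelta(hours=1), clock=clock)
    assert not session.is_expired()
    clock.advance(timedelta(hours=2))    # no sleep, no flakiness
    assert session.is_expired()
```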

Infra

  • Consider the need to test infra code/config independently of the application itself (Pulumi, Terraform tests – https://medium.com/contino-engineering/terraform-infrastructure-as-code-testing-best-practice-unit-tests-bdd-end-to-end-scenario-c30d5a6921d)
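
Checks can also be written against Terraform’s machine-readable plan output. A minimal sketch, assuming a plan exported via terraform plan -out=tfplan && terraform show -json tfplan > plan.json; the “no deletes” rule here is an example policy, not a universal recommendation.

```python
# A minimal policy-style check over Terraform's JSON plan format.
# Field names (resource_changes, change.actions) follow Terraform's
# documented plan representation; the rule itself is an example only.
import json

def test_plan_destroys_nothing():
    with open("plan.json") as f:
        plan = json.load(f)
    destroyed = [rc["address"]
                 for rc in plan.get("resource_changes", [])
                 if "delete" in rc["change"]["actions"]]
    assert not destroyed, f"plan would destroy: {destroyed}"
```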

Security

  • Automated scans for source and bytecode
  • Pen tests periodically, or when significant new architectural patterns are introduced.

Accessibility

Governance/Regulatory testing

  • Path to prod should include appropriate governance for regulated parts of the business.
  • DR testing to be considered.
  • Evidence of successful Backup/Restore capability is a frequent requirement.

Food for thought – https://www.youtube.com/watch?v=9Q5MhKUVLkc

https://up.com.au/blog/continuous-delivery-at-up/

https://continuousdelivery.com/2012/07/pci-dss-and-continuous-deployment-at-etsy/

Performance/Stress

  • Component-level performance is ideally addressed in the build. That is, perform heuristic checks that can detect significant changes in performance (faster or slower can both indicate problems) – see the sketch after this list.
  • Concurrency tests are ideally performed as part of the build. These are a frequent blind spot for unit and acceptance tests.
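
A minimal sketch of the heuristic check from the first bullet. The baseline and tolerances are hypothetical and should come from recorded build history; the stand-in workload keeps the example deterministic.

```python
# A minimal two-sided, heuristic performance check in the build.
# Baseline and band are hypothetical; a big speed-up can mean the
# real work was silently skipped (cache, early return).
import time

def price_batch():
    time.sleep(0.02)                     # stand-in for the code under test

def test_price_batch_timing_band():
    baseline = 0.020                     # seconds, from recorded build history
    start = time.perf_counter()
    price_batch()
    elapsed = time.perf_counter() - start
    assert elapsed < baseline * 3, f"significantly slower: {elapsed:.3f}s"
    assert elapsed > baseline / 3, f"suspiciously faster: {elapsed:.3f}s"
```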

In practice

Focus on testing outputs as a part of the release/change governance

  • Make clear the intent of the change (good behavioural tests will assist), and provide evidence that –
    • It solves the problem
    • It does not introduce new problems (e.g. regression tests)
    • It meets appropriate internal/external compliance requirements (security, accessibility, privacy, performance, supportability, access control, technology standards, observability/metrics)
    • There was appropriate stakeholder engagement
  • Provide evidence that the tested code is what is being released to production (i.e. commit id/Git SHA) – a sketch follows this list
    • Testing evidence can be linked to a commit
    • Change to clearly include tested code
    • Identify differences/limitations/risks in the testing (eg. Differences in test data and environments compared to production)
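
A minimal sketch of the commit-matching evidence. The /version endpoint and TESTED_GIT_SHA variable are hypothetical conventions; many teams bake the SHA into the build at compile time instead.

```python
# A minimal release-time check that the tested commit is the deployed
# one. Endpoint and environment variable are hypothetical conventions.
import os

import requests

def test_deployed_sha_matches_tested_sha():
    tested_sha = os.environ["TESTED_GIT_SHA"]   # recorded by the CI test stage
    deployed = requests.get("https://app.example.com/version").json()
    assert deployed["git_sha"] == tested_sha, (
        f"deployed {deployed['git_sha']} but testing evidence is for {tested_sha}")
```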

Build speed optimisation

  • Parallelisation for slower/longer running tests (Playwright seems the current winner). This requires an appropriate test data strategy.
  • Continued human attention to the value provided by different tests.
  • Avoid writing lots of tests where the value of a feature is not yet proven. No test is free.

Techniques

High-volume randomised tests are a valuable but rarely used approach.
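
Property-based testing tools such as Hypothesis make the approach accessible. A minimal sketch – the slugify function is a hypothetical unit under test, and the properties (idempotence, restricted output alphabet) are the kind that hold across thousands of random inputs.

```python
# A minimal high-volume randomised test using Hypothesis.
# slugify is a hypothetical unit under test; the properties checked
# hold for any input, so volume replaces hand-picked examples.
import re

from hypothesis import given, settings, strategies as st

def slugify(s: str) -> str:
    s = re.sub(r"[^a-z0-9]+", "-", s.lower())
    return s.strip("-")

@settings(max_examples=5000)                 # well past example-based volume
@given(st.text())
def test_slugify_properties(s):
    out = slugify(s)
    assert out == slugify(out)               # idempotent
    assert re.fullmatch(r"[a-z0-9-]*", out)  # restricted alphabet
    assert not out.startswith("-") and not out.endswith("-")
```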

Everyone testing should understand combinatorics, boolean logic and how to traverse a graph. This allows us to review the foundational model of tests and have confidence in the rote parts of the test process.
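
For example, all-pairs generation is simple enough to sketch in a few lines. Real tools do this better, but every tester should be able to reason about what they produce.

```python
# A minimal greedy all-pairs sketch over hypothetical parameters.
# Dedicated tools exist; the point is the combinatorics are reviewable.
from itertools import combinations, product

params = {"browser": ["chrome", "firefox", "safari"],
          "os": ["windows", "macos", "linux"],
          "role": ["admin", "user"]}

names = list(params)
all_cases = [dict(zip(names, values)) for values in product(*params.values())]

def pairs_of(case):
    """All parameter-value pairs a single test case covers."""
    return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

uncovered = set().union(*(pairs_of(c) for c in all_cases))
suite = []
while uncovered:
    # Greedily pick the case covering the most still-uncovered pairs.
    best = max(all_cases, key=lambda c: len(pairs_of(c) & uncovered))
    suite.append(best)
    uncovered -= pairs_of(best)

print(f"{len(suite)} cases cover all pairs (vs {len(all_cases)} exhaustive)")
```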

Checking your log files can be extremely valuable, ideally in an automated way. Check for unexpected errors and exceptions. If a certain amount must be tolerated, monitor trends/error counts as part of running regular test suites to check nothing new is appearing.
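
A minimal sketch of such a check – the log path, patterns and tolerated baseline are all hypothetical. The key design choice is failing on anything new rather than on a fixed count.

```python
# A minimal automated log check after a suite run. Path, patterns and
# tolerated baseline are hypothetical; fail only on *new* errors.
import re
from collections import Counter

TOLERATED = Counter({"ERROR upstream-quotes timeout": 3})   # known, monitored noise

def scan(log_path: str) -> Counter:
    """Count error-like lines, bucketed by message prefix."""
    pattern = re.compile(r"(ERROR|Exception|Traceback).*")
    counts = Counter()
    with open(log_path) as f:
        for line in f:
            match = pattern.search(line)
            if match:
                counts[match.group(0)[:80]] += 1
    return counts

def test_no_new_log_errors():
    found = scan("logs/app.log")
    new = found - TOLERATED        # Counter subtraction keeps positives only
    assert not new, f"unexpected log errors: {dict(new)}"
```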

Coverage

  • Code coverage – %, branch, techniques. Trends are frequently more important than outright numbers.
  • BDD reporting – Transparency regarding product goals and their relationships to stakeholders are critical. Acceptance criteria for stories drive business-facing functional coverage and function as documentation.
  • Monitoring and Analytics
    • Business flows, OpenTelemetry

Data management

Consider carefully whether production data should be used in non-production environments.

PII approaches such as pseudonymisation/tokenisation ensure that data copied from production to non-production environments is not usable, but they can complicate support and bug fixing. You may therefore need tools for desensitising customer data for test/support purposes.

Consider heuristic checks for PII/password data in logs (sketched below).

See https://piiano.com/blog/practical-pseudonymization-by-tokenization/
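
For the heuristic log checks mentioned above, a minimal sketch. The patterns are illustrative only; real detectors (and your definition of PII) will be broader, and false positives need a triage path.

```python
# A minimal heuristic PII/secret scan over log output.
# Patterns are illustrative only, not a complete PII definition.
import re

PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "password_assignment": re.compile(r"password\s*[=:]\s*\S+", re.IGNORECASE),
}

def find_pii(lines):
    """Yield (line number, pattern label, line) for each suspect line."""
    for n, line in enumerate(lines, 1):
        for label, pattern in PII_PATTERNS.items():
            if pattern.search(line):
                yield n, label, line.strip()

def test_logs_contain_no_pii():
    with open("logs/app.log") as f:
        hits = list(find_pii(f))
    assert not hits, f"possible PII in logs: {hits[:5]}"
```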

Tools

OpenTelemetry/Jaeger/Grafana still seem a work in progress, but ideally there will continue to be opportunities to shift chunks of testing to monitoring, enabling efficiency (tests/checks are built once but can run anywhere – test or prod).

Security scans should be considered mandatory these days – For code, and the running/deployed system.

Performance – K6 looks great as an opportunity to share artefacts with other parts of the development and testing process. It makes accessible to modern development teams the ideas started in this book: https://www.amazon.com.au/Testing-Design-Automation-Frank-Cohen/dp/0131421891

Contract testing – There’s PACT/PACTflow, and schema-level checks in cloud providers. But care is required, and maintenance isn’t ideal.

Metrics

A few mandatory metrics, and some that are not commonly applied –

  • Code coverage
  • UI coverage (headlamp?)
  • Find/Fix rates for bugs
  • Incident rates/severities
  • Cycle time – At least measure the Incident → Problem → Change → Release time.
  • Have Product metrics and share them.

Governance/Audit

  • Design logs to be usable as sufficient evidence of change success
  • Per change, ensure audit trail on evidence for new features
  • Logging should be a part of automated test suites, seen as a first-order component
  • Immutable logs are likely required in regulated environments (Including regular audit/testing to make sure they continue to be immutable)

CMDB

CMDBs seem like one of those things that should help, but I’ve rarely seen them work. Make sure there are stated CMDB requirements/objectives if you plan to have one.

Capabilities

Content to come. This is about how you manage models of your products and align delivery, change and testing to those. The space is immature still, but the approach used by SerenityBDD should be taken more often. I just wish Gherkin would die.

Risk

  • Risk can be viewed as negative value
  • Think about preventable risks
    • Rules-based controls where possible; elimination strategies are best.
  • Manage impacts (knowable) over probability (usually unknowable)
    • See The Black Swan/Fooled by Randomness
  • External risk tools
    • Scenario planning – “What must hold true in order for our assumptions to be valid?” – See “Profiting from uncertainty”

https://hbr.org/2012/06/managing-risks-a-new-framework
