Easy measures to observe our software quality efficiency

Julien BARLET
Decathlon Digital
11 min read · Mar 6, 2023

Without measurement, there is no experimentation, and consequently no hypothesis can be confirmed or invalidated. Measurement, when it is well chosen and correctly calculated, allows us to make choices and stand by them, because their impact is known through the results. Improvement then rests on an objective basis.

At Decathlon Technology, many organizational and technological initiatives are implemented in order to bring value more quickly, with the best level of quality, to our sports users and teammates. To assess good and bad choices, we have for example committed to measuring our delivery performance through the DORA metrics: deployment frequency (DF), lead time for changes (LT), mean time to recovery (MTTR), and change failure rate (CFR).

The “Accelerate” metrics MTTR and CFR allow us to evaluate the stability of the product. The stability of a product or service is an essential criterion of quality, but it is not sufficient: a stable product can simply be disappointing for its users.

So what about the quality measures?

The quality level of a product or service corresponds to the average perception of its users. If the experience is efficient and meets users’ expectations, then the quality is there and judged as good. However, we still need to understand and identify user expectations, and we need to be able to capture what does and does not meet these expectations for the product.

If we consider that a team has identified all the important criteria defining the quality of the product it delivers, the simple monitoring of production feedback gives valuable insights into the level of product quality. Coupled with this, other metrics such as Net Promoter Score or app rating systems prove to be valuable.

Good, but to act on the quality level of the product, the team must build a powerful and efficient quality management system, one that both prevents defects and identifies them before the product is delivered to users. This quality management system is the sum of the methodological and technological means implemented throughout the development chain: a multidimensional approach to testing (dynamic, static, functional and non-functional) combined with the practices used (Three Amigos rituals, TDD, ATDD, BDD, code review, test review, pair programming, mob programming, exploratory testing, etc.), all constrained by various “quality gates” along the whole development chain.

So how can we assess the overall performance of a quality management system?

There are many measures to assess the performance of quality management system activities. For dynamic testing alone, one of the most well-known is code coverage by unit tests, which shows teams which areas of the code are exercised by unit tests and which are not. We can add mutation testing metrics to assess the efficiency of those tests. At a higher level, when we test REST APIs, we try to measure the coverage of the different endpoints exposed by a service, as well as of the different methods (GET/POST/PUT…). And for functional requirements, a coverage indicator calculated through a traceability matrix, which associates functional requirements with the tests covering them, can be very useful, especially for high-level tests such as end-to-end tests.
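
To illustrate the last of these indicators, here is a minimal sketch of a requirement-coverage calculation based on a traceability matrix; the requirement IDs and test names are purely hypothetical:

```python
# Illustrative sketch: hypothetical requirement IDs and test names,
# not an actual traceability matrix from one of our products.
traceability_matrix = {
    "REQ-001 Add item to cart": ["test_add_single_item", "test_add_out_of_stock_item"],
    "REQ-002 Apply promo code": ["test_apply_valid_promo_code"],
    "REQ-003 Checkout as guest": [],  # no end-to-end test covers this requirement yet
}

covered = sum(1 for tests in traceability_matrix.values() if tests)
coverage = covered / len(traceability_matrix) * 100
print(f"Functional requirement coverage: {coverage:.0f}%")  # -> 67%
```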

We can also measure vulnerabilities via SAST (Static Application Security Testing) and DAST (Dynamic Application Security Testing) tools, and the maintainability index of the code via static analysis tools. In short, these examples are only a sample of what can be measured in terms of software quality.

All these measurements allow us to assess locally, and act precisely on, one of the elements that make up our quality management system. However, the global assessment remains the sum of these measures, and that sum can become tedious to build.

At Decathlon Technology, we decided to set up two macro measures per product, easy to calculate, to evaluate the performance of the quality management system (QMS):

  • One to measure the ability to identify defects: Defect Detection Efficiency
  • One to measure the ability of a team to handle defects: Defect Removal Efficiency

The “Defect Detection Efficiency” to assess the ability to identify defects

The DDE of a test phase, expressed as a percentage, is the ratio between the number of defects found in that phase and the total number of defects.

DDE = (Defects found in this phase / Total number of defects) * 100
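
To make the formula concrete, here is a minimal sketch with purely hypothetical defect counts per test phase; the per-phase DDEs of the internal activities simply add up to the internal DDE:

```python
# Illustrative sketch: the defect counts per phase are hypothetical.
defects_per_phase = {
    "unit tests": 40,
    "integration tests": 25,
    "end-to-end tests": 20,
    "production": 15,  # defects that escaped the internal test activities
}

total = sum(defects_per_phase.values())  # 100 defects in total

for phase, found in defects_per_phase.items():
    print(f"DDE of {phase}: {found / total * 100:.0f}%")

internal = total - defects_per_phase["production"]
print(f"Internal DDE: {internal / total * 100:.0f}%")  # -> 85%
```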

In this example, we can quickly conclude that 85% of the defects have been identified within our walls, since this figure is simply the sum of the DDEs of each internal test activity.

If, in your QMS, you perform non-functional tests (performance, security), these test activities can also be integrated into your DDE.

What time period should you choose?

We would naturally think that the time period should be the interval between two releases. But then, what should the time period be for products or services that are deployed continuously, for example 50 times a day? The DDE calculation could become difficult.

In fact, the DDE calculation period can be completely arbitrary. A team can decide that the period is the duration of one sprint if it works in Scrum. It can also calculate its DDE every two weeks, every month, whatever, as long as the periods remain comparable over a longer time scale.

How did we implement the DDE at Decathlon Technology?

To implement the DDE perfectly, you must be able to:

  • Track all defects identified by all test activities: functional, non-functional, automated, manual.
  • Track all production feedback qualified as product defects.

Among the data required to calculate the DDE, what did we already have readily available?

Defects identified by manual testing activities (e.g. exploratory tests, acceptance tests on our User Stories). This information is in our ALM and is easily accessible through APIs.

Defects reported from production. Here again, after qualification of these defects as “product defects”, they are injected into the ALM and become easily accessible.

Unfortunately, it was on automated testing that we had to make concessions. The majority of our product teams have automated tests at several levels (unit tests, integration tests, end-to-end tests), but our pipelines do not yet allow us to easily retrieve this precious data.

So, in order to obtain this measure and avoid a tunnel effect, we chose to calculate the DDE with the data we had at our disposal. As a first step, we chose to discard everything identified by the automated tests and to simplify the test phases into two: one for defects identified before production and the other for defects identified in production.

Concerning the period, knowing that the release management of each product is different, we chose to calculate the DDE monthly.

Below are our first measurements

On the left, you have the list of all the products and services in the Ecommerce domain, and for each of them, you have the DDE per month (points on the purple curve).

Our DDE is calculated as follows:

DDE for a month ‘m’ = [Number of defects identified internally (excluding automated tests) during month ‘m’] / [Number of total defects identified (internal (excluding automated tests) + production) during month ‘m’] * 100
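
Here is a minimal sketch of this monthly calculation, assuming the defects have already been retrieved from the ALM into simple records; the field names are hypothetical, not our actual schema:

```python
from dataclasses import dataclass
from datetime import date

# Illustrative sketch: hypothetical record structure, with automated-test
# defects assumed to be excluded upstream.
@dataclass
class Defect:
    created: date
    found_in: str  # "internal" or "production"

def monthly_dde(defects: list[Defect], year: int, month: int) -> float | None:
    in_month = [d for d in defects if (d.created.year, d.created.month) == (year, month)]
    if not in_month:
        return None  # no defects identified this month
    internal = sum(1 for d in in_month if d.found_in == "internal")
    return internal / len(in_month) * 100

defects = [
    Defect(date(2023, 2, 3), "internal"),
    Defect(date(2023, 2, 10), "internal"),
    Defect(date(2023, 2, 21), "production"),
]
print(f"{monthly_dde(defects, 2023, 2):.0f}%")  # -> 67% (2 internal out of 3 defects)
```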

The green bars correspond to the number of defects identified outside production; the red bars correspond to the number of defects identified in production.

The purple dotted reference line indicates the alert zone of the DDE. The threshold value has been set arbitrarily.

Depending on the product, the DDE can be quite different and gives an indication of the overall performance of the quality management system.

The DDE should not be a measure used to blame a team; it should encourage analysis, reflection, and the implementation of quality engineering practices to improve the performance of the QMS.

Analyze the DDE and take action

Each product has its own DDE measures, which reflect its QMS. In addition to providing the health status of the QMS, the trend of the DDE curve and its variations serve as an opportunity for the team to analyze and improve its testing practices. The DDE removes subjectivity from the evaluation of the QMS and provides objective results.

Let’s take the real example below and make a first analysis.

This example illustrates a well-performing QMS, with a DDE of around 90% from January to November: roughly one bug reported in production per month, against an average of 30 defects detected by the team’s QMS, excluding automated tests.

In November, there was a sharp decrease in the DDE, down to 66%. This decrease can be explained by a clear increase in defects in production: 9 in November.

What are the reasons for this sharp increase in defects in production?

Although the number of defects detected internally was on the low side of the average, this month of November was special for the team: its Quality Engineer took a month of leave.

During this absence, all the Quality Engineer’s tasks were delegated to the team, such as:

  • Designing the acceptance tests for the user stories as part of the Definition of Ready (DoR)
  • Executing the acceptance tests as part of the Definition of Done (DoD)
  • Exploratory testing

In November, the design of acceptance tests and their execution were skipped; only exploratory test sessions were performed. This is the reason for the drop in the DDE.

The actions put in place afterwards were essentially:

  • Pair design of acceptance tests
  • Pair testing of these same tests

Carrying out these tasks in pairs allows the development team to demystify this type of work, to reinforce the quality culture, and thus, in the future, to be able to move towards a quality assistance organization model.

Below, thanks to these actions, you can observe the results on the DDE computed for December.

The “Defect Removal Efficiency” to assess the ability of a team to handle identified defects

Effective defect removal is a crucial aspect of product quality. A good defect elimination process promotes the release of products with fewer latent defects, which builds strong customer confidence. A good DRE keeps the defect backlog low and can also boost the morale of the team.

Like the DDE, the DRE is calculated over a given period of time, which can also be uncorrelated with the development cycle: for example one week, two weeks or one month. Over this period, we need the number of defects that have been identified (F), whatever the test activity, and the number of defects handled (fixed, closed without fix, etc.) (R).

The DRE for month “m” = R(m) / F(m) * 100
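
For instance, with purely illustrative figures: if a team handled 27 defects during a month in which 30 new defects were identified, its DRE for that month is 27 / 30 * 100 = 90%.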

How did we implement the DRE at Decathlon Technology?

The implementation of the DRE is possible if some basics of application lifecycle management are respected:

  • A defect must be an item in the ALM, with a creation date and a resolution date
  • An identified and resolved defect must be identifiable through a dedicated status

Once again, as with the DDE, rigor in defect tracking is essential. A perfect DRE assumes that, for each test activity (e.g. unit tests, integration, end-to-end, UAT), the defects detected are formalized in the ALM.

However, there are situations in which defects are not counted because they are never added to the ALM. For example, in a product team where several levels of tests are triggered by the CI/CD pipeline after a commit is pushed, if one of the regression test activities identifies a defect, the developer will fix it within the same pull request with another commit. The defect is never materialized as a ‘Defect’ item in the ALM and is therefore not counted in the DRE.

At Decathlon Technology, we decided to use the same period as for the DDE, i.e. one month.

The DRE calculation requirements are covered by our software life cycle management system, which is open and therefore searchable through its APIs.

Below are our first measurements

On the left, you have the list of all the products and services in the Ecommerce domain, and for each of them, you have the DRE per month (points on the purple curve).

Our DRE is calculated as follows:

DRE for a month ‘m’ = [Number of defects handled during month ‘m’] / [Number of defects created during month ‘m’] * 100
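
As with the DDE, here is a minimal sketch of this monthly calculation, assuming hypothetical defect records with a creation date and a resolution date (not our actual ALM schema). Note that, following the formula, a defect handled this month may have been created in an earlier one:

```python
from dataclasses import dataclass
from datetime import date

# Illustrative sketch: the defect fields are hypothetical.
@dataclass
class Defect:
    created: date
    resolved: date | None = None  # set once the defect is handled (fixed, closed without fix, ...)

def monthly_dre(defects: list[Defect], year: int, month: int) -> float | None:
    created = [d for d in defects if (d.created.year, d.created.month) == (year, month)]
    handled = [d for d in defects
               if d.resolved and (d.resolved.year, d.resolved.month) == (year, month)]
    if not created:
        return None  # no new defects this month
    return len(handled) / len(created) * 100

defects = [
    Defect(date(2023, 3, 2), resolved=date(2023, 3, 9)),
    Defect(date(2023, 3, 15), resolved=date(2023, 3, 20)),
    Defect(date(2023, 3, 28)),                             # still open
    Defect(date(2023, 2, 20), resolved=date(2023, 3, 5)),  # created earlier, handled in March
]
print(f"{monthly_dre(defects, 2023, 3):.0f}%")  # -> 100% (3 handled / 3 created in March)
```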

The green bars correspond to the number of defects handled and the red bars to the number of defects created during each month.

The purple dotted reference line indicates the alert zone of the DRE. The threshold value has been set arbitrarily.

Depending on the product and the team, the DRE quickly shows which teams are able to keep non-quality under control and which ones suffer under its weight.

Analyze the DRE and take action

In the example above, we observe an excellent tendency of the team to correct the defects it identifies. The team carries no quality debt on its product while still taking responsibility for new developments.

In September, we observed a drop, indicating that the team was no longer able to correct all the identified defects. The explanation is quite simple and common to all IT projects: unforeseen events involving complex changes with tight deadlines.

Even if this is part of the life of any project, the DRE allowed the team to quickly become aware of the problem and to take action to reduce the debt…

Conclusion

The DDE and DRE are macro measures that are simple for an IT team to understand, regardless of the team’s profile (product or technical). When properly measured, they enable the team to make informed decisions and to assess their impact.

As with all measurements, it is the analysis and the ongoing follow-up around them that are key. The role of the QA/QE is crucial here.

Thanks

  • My great colleagues and teammates for their advice and reviews: Vincent, Guillaume, Caroline, Benjamin …
  • The QEUnit community for their inspiration and sharing on Quality Engineering


Julien BARLET
Engineering Manager @Decathlon. Passionate about quality engineering.