A first Quality Gate at TheFork

How we put a continuous Quality Gate in place

Kong To
TheFork Engineering Blog

--

Following “The need of a Quality Gate at TheFork”, we are going to walk through how we set up the Quality Gate (QG) in our Continuous Integration (CI) pipeline.

As we were already using SonarCloud, we just needed to create a QG with specific conditions depending on the nature of each project. Since there is a disparity in quality level among the projects in our inventory, we decided to roll out the QG progressively. Teams are thereby pushed to pay more attention to the quality of what they are about to ship. In what follows, we will elaborate on how we planned and executed the rollout.

How to put the Quality Gate in place?

Again, a QG’s purpose is to enforce quality in the process. Once it is deployed, it will likely block Pull Requests (PRs) in the Continuous Integration (CI) pipeline, so teams won’t be able to merge their PRs. Hence, we need to pay attention to this point: we must anticipate the consequences as much as possible to smooth the activation of the quality gate and avoid frustrating teams.

A wall that blocks everything is the main danger when driving change.

If we want people to adopt any change, we don’t want to block them from continuing to deliver. Obviously, communication is super important. Teams must be aware of any upcoming change and be prepared for it. Good communication comes with a clear message and well-defined content: a clear definition of the QG, what is about to be set, the exit conditions, the planning, the impact and the risks. In other words, we must define a clear vision of what we want to achieve. For a smoother adoption, we must slice the work into multiple steps and roll them out progressively, with a precise yet flexible schedule.

More concretely, in order to put the QG in place, we initiated a technical project and defined several steps.

1- Assessing our current state

As time went by, we came to feel real pain because of the poor quality of our digital products. But it started as just a feeling, and we needed to confirm it was true. So we assessed our current state by looking into production issues and bugs detected during the testing phase, and by digging into the history of technical debt accumulated from past decisions. Once the assessment was done, we put all the pieces together and got a good picture of our actual technical debt. Among all the quality-related issues, we won’t cover everything that needs to be addressed here. We will look at one axis of improvement that prevents issues earlier: putting a QG on the CI.

In addition to assessing known technical issues (software design or architecture), the history of decisions and the tracked technical debt (decisions made with compromises), we built an inventory by gathering the metrics of all projects from GitHub, SonarCloud and our in-house quality scoring tool named Beacon. We later used this data-driven approach to define our rollout strategy and prioritise our actions.
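
As an illustration, here is a minimal sketch of how such an inventory could be pulled from the SonarCloud Web API; the organization key and token handling are assumptions, and GitHub and Beacon data would be merged alongside it:

```ts
// A minimal inventory sketch against the SonarCloud Web API (Node 18+,
// global fetch). The organization key and SONAR_TOKEN are placeholders.
const ORG = "my-org"; // hypothetical organization key
const AUTH =
  "Basic " + Buffer.from(`${process.env.SONAR_TOKEN}:`).toString("base64");

async function sonarGet(path: string): Promise<any> {
  const res = await fetch(`https://sonarcloud.io/${path}`, {
    headers: { Authorization: AUTH },
  });
  if (!res.ok) throw new Error(`${path}: HTTP ${res.status}`);
  return res.json();
}

async function buildInventory(): Promise<void> {
  // List the organization's projects (first page only, for brevity).
  const { components } = await sonarGet(
    `api/components/search_projects?organization=${ORG}&ps=100`
  );
  for (const project of components) {
    // Fetch a few key measures per project; a missing coverage measure
    // usually means the project never reported it at all.
    const { component } = await sonarGet(
      `api/measures/component?component=${project.key}&metricKeys=coverage,bugs,vulnerabilities`
    );
    const coverage = component.measures.find(
      (m: any) => m.metric === "coverage"
    )?.value;
    console.log(project.key, coverage ?? "no coverage metric");
  }
}

buildInventory().catch(console.error);
```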

The inventory shows four categories of projects:

  1. Projects that are in relatively good health
  2. Projects that are in the process of being decommissioned (we call them legacy)
  3. Projects that are quite active but have poor quality
  4. Projects that do not provide any metrics

We started by paying more attention to the last two categories.

2- Identifying roles and responsibilities required to tackle the project: using the RAPID model

RAPID model

With a director who owns the global quality matter, we defined a RAPID model in order to clarify the roles and responsibilities of everyone involved. From there, we could build a team.

RAPID stands for Recommend, Agree, Perform, Input, Decide

3- Building a quality team: who does what and when?

The core team is composed of a Software Quality Architect, a Principal Engineer and a Product Manager (PM). That team leads the topic by defining the goal and the plan, and pushes forward every single step of the project. When needed, we request inputs from delivery teams and/or validation from upper management. I myself act as the Software Quality Architect and define an adapted strategy in detail, the PM coordinates our activities, and the Principal Engineer is our most valuable advisor, as he is in touch with developers in the field every day.

4- Defining objectives and milestones

Rollout of the first quality gate (dates may move)

As we can see in the table above, we split the rollout into six steps. In the next section, we will walk through each of them.

Please see this as an example. It’s not meant to be a guideline or a playbook, so feel free to define your own if you have the same need.

5- Executing the plan

Let’s walk through the steps.

Step 0: Minimal metrics

By looking at the inventory, we noticed that many projects have no metrics at all. At this stage, we aim to have metrics, so we need a way to push projects (and thus teams) to provide them. How do we achieve that? I’ll give a bit of technical detail below; feel free to skip that part.

As we use GitHub and GitHub Actions, we first defined branch protection rules to make sure PRs go through the ‘SonarCloud Code Analysis’ check, which performs the quality analysis. Then we created several Sonar QGs and assigned them to projects. Finally, we added a coverage threshold condition on new changes. The threshold is set to 30% (an arbitrary value), so PRs are required to bring a new-code coverage above that threshold. Later, we will increase it incrementally towards a higher target.
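
For illustration, here is a minimal sketch of the SonarCloud side of that setup, using its Web API from a Node 18+ script; the gate name, organization key, project key and exact parameter names are assumptions (they vary slightly between API versions), and the branch protection itself is configured separately in GitHub:

```ts
// Sketch: create a minimal quality gate, add a new-code coverage condition
// and assign it to a project. ORG, the gate name and the project key are
// placeholders; SONAR_TOKEN is an API token.
const SONAR = "https://sonarcloud.io";
const ORG = "my-org"; // hypothetical organization key

async function sonarPost(path: string, params: Record<string, string>): Promise<any> {
  const res = await fetch(`${SONAR}/${path}?` + new URLSearchParams(params), {
    method: "POST",
    headers: {
      Authorization:
        "Basic " + Buffer.from(`${process.env.SONAR_TOKEN}:`).toString("base64"),
    },
  });
  if (!res.ok) throw new Error(`${path}: HTTP ${res.status}`);
  return res.status === 204 ? {} : res.json();
}

async function setupMinimalGate(projectKey: string): Promise<void> {
  // 1. Create the gate.
  const gate = await sonarPost("api/qualitygates/create", {
    organization: ORG,
    name: "minimal-metrics-gate", // hypothetical gate name
  });

  // 2. Require more than 30% coverage on new code: the gate fails
  //    when new_coverage is lower than ("LT") the error threshold.
  await sonarPost("api/qualitygates/create_condition", {
    organization: ORG,
    gateId: String(gate.id),
    metric: "new_coverage",
    op: "LT",
    error: "30",
  });

  // 3. Assign the gate to the project.
  await sonarPost("api/qualitygates/select", {
    organization: ORG,
    gateId: String(gate.id),
    projectKey,
  });
}

setupMinimalGate("my-org_my-service").catch(console.error); // hypothetical project key
```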

The QG is now active.

Some projects already had a quality gate associated with them. We did not delete those existing quality gates; instead, we kept them, as they exercised more constraints than the minimal requirement.

Technically, if a PR provides code coverage, that implies the analysis provides all the metrics that come along with it. Mission accomplished!

Step 1: Overall code coverage threshold 5%

This step complements the previous one: here we also check the overall code coverage of the project, in addition to the coverage of new changes. We set 5% as the initial threshold for overall coverage, so that we could increase it progressively later. In the end, we came to realise that we can’t tackle overall code coverage yet; the technical debt is too heavy to address.
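
The extra condition itself is a single API call; a minimal sketch, with the same placeholder organization and gate id as in the previous sketch:

```ts
// One-call sketch: add an overall-coverage condition to the existing gate.
// The "coverage" metric is project-wide, unlike "new_coverage".
const params = new URLSearchParams({
  organization: "my-org", // hypothetical organization key
  gateId: "42",           // hypothetical id of the gate created in step 0
  metric: "coverage",     // overall coverage of the whole project
  op: "LT",               // the gate fails when coverage drops below the error value
  error: "5",             // the initial 5% threshold
});

fetch("https://sonarcloud.io/api/qualitygates/create_condition?" + params, {
  method: "POST",
  headers: {
    Authorization:
      "Basic " + Buffer.from(`${process.env.SONAR_TOKEN}:`).toString("base64"),
  },
}).then((res) => console.log(res.ok ? "condition added" : `HTTP ${res.status}`));
```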

Step 2: Increment of new code coverage

We ran a workshop during the development guilds’ meetings to request inputs, and we set up a voting session. As a result, an ultimate threshold of 80% was voted. So we decided to increment the threshold by 7% every two weeks, only for the coverage of new changes; after seven iterations we reach the 80% target.
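
Since the increase happens on a fixed cadence, it lends itself to automation. Here is a hedged sketch of what a scheduled job could look like against the SonarCloud Web API; the gate id, organization key and parameter names are assumptions:

```ts
// Sketch of the biweekly bump, meant to run from a scheduled CI job.
// The 7% step and 80% cap follow the plan voted by the guilds.
const SONAR = "https://sonarcloud.io";
const ORG = "my-org"; // hypothetical organization key
const AUTH =
  "Basic " + Buffer.from(`${process.env.SONAR_TOKEN}:`).toString("base64");

async function call(
  path: string,
  params: Record<string, string>,
  method = "GET"
): Promise<any> {
  const res = await fetch(`${SONAR}/${path}?` + new URLSearchParams(params), {
    method,
    headers: { Authorization: AUTH },
  });
  if (!res.ok) throw new Error(`${path}: HTTP ${res.status}`);
  return res.status === 204 ? {} : res.json();
}

async function bumpNewCodeCoverage(gateId: string): Promise<void> {
  // Read the current conditions of the gate.
  const gate = await call("api/qualitygates/show", { organization: ORG, id: gateId });
  const cond = gate.conditions.find((c: any) => c.metric === "new_coverage");
  if (!cond) throw new Error("no new_coverage condition on this gate");

  // Increment by 7 points, capped at the voted 80% target.
  const next = Math.min(Number(cond.error) + 7, 80);
  await call(
    "api/qualitygates/update_condition",
    {
      organization: ORG,
      id: String(cond.id),
      metric: "new_coverage",
      op: "LT",
      error: String(next),
    },
    "POST"
  );
  console.log(`new_coverage threshold is now ${next}%`);
}

bumpNewCodeCoverage("42").catch(console.error); // hypothetical gate id
```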

Step 3: Reliability & security

We also added two more conditions to the quality gate. At this stage, we first set the reliability and security (vulnerability) conditions to rating “C”. Two months later, we will tighten them to “A”. This gives teams a bit more time to plan their work and to become aware of difficulties, if any, in their projects.
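
For reference, here is a sketch of how those two conditions could map onto Sonar’s rating metrics via the Web API; the organization key and gate id are again placeholders:

```ts
// Sketch: add the two new conditions at rating "C" first. Sonar encodes
// ratings as numbers (1 = A … 5 = E), so requiring at least a C means the
// gate fails when the rating value is greater than 3; two months later the
// error value moves to "1" (A).
const conditions = [
  { metric: "new_reliability_rating", error: "3" }, // reliability (bugs): C or better
  { metric: "new_security_rating", error: "3" },    // security (vulnerabilities): C or better
];

async function addRatingConditions(): Promise<void> {
  for (const { metric, error } of conditions) {
    const params = new URLSearchParams({
      organization: "my-org", // hypothetical organization key
      gateId: "42",           // hypothetical gate id
      metric,
      op: "GT",               // fail when the rating is numerically greater (worse)
      error,
    });
    const res = await fetch(
      "https://sonarcloud.io/api/qualitygates/create_condition?" + params,
      {
        method: "POST",
        headers: {
          Authorization:
            "Basic " + Buffer.from(`${process.env.SONAR_TOKEN}:`).toString("base64"),
        },
      }
    );
    if (!res.ok) throw new Error(`${metric}: HTTP ${res.status}`);
  }
}

addRatingConditions().catch(console.error);
```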

Long story short, I will stop the TheFork story here, as at the time of writing we are still working on steps 2 and 3.

The next steps will probably take more time to become reality:

- Step 4: Overall code coverage

- Step 5: Maintainability

- Step 6: Unification of rules and clean-up

- Step 7: Testing practices

— — —

What more can we do?

Wherever I have worked before, almost all projects left a lot to be desired. At TheFork, less than 20% of projects have poor code coverage, while the rest have coverage higher than 70%. Shall we agree that we can’t just look at code coverage? Sure enough. But as this article’s main concern is code coverage, I won’t digress.

If we want to go further, we may start a few more initiatives, such as promoting TDD or BDD to change the way we write tests, while others are already on track, such as observability, security champions, a design system and so on.

— — —

Takeaways

It has taken quite a lot of work and effort to get where we are today at TheFork, and there is still a lot to come. Not everything is great, and not everything is bad. Continuous improvement is the way to go; there is always room to get better. Everyone should feel accountable for “non-quality” matters and should take ownership, because quality should be owned by everyone who works on the product.

I hope you appreciated reading this article. To finish, below are some key points as takeaways.

  • Change and adoption (thus change management) is the hardest part
  • Adoption starts with an agreement; contributions from people are a sign of agreement and commitment
  • Blocking the delivery is a risk for both the business and adoption; otherwise all we get is frustration and complaints
  • Communication is crucial; it helps teams adopt changes and anticipate what is coming and what needs to be done
  • Roll out progressively and plan carefully, while being empathetic towards teams: they have their own concerns every day.

If you haven’t already, you may want to read the previous article, “The need of a Quality Gate at TheFork”, which explains why we enforce continuous quality and why we first tackle new-code coverage.

Many thanks!

Glossary

  • CI : Continuous Integration
  • PR : Pull Request
  • PM : Product Manager
  • QG : Quality Gate
