The need for a Quality Gate at TheFork

Sorting out technical debt and enforcing quality by first tackling new code coverage

Kong To
TheFork Engineering Blog

--

The Technical Debt

I joined TheFork about 9 months ago, and I have already learned many things. I'm more than happy to share my experience in this article. We will elaborate on the first level of quality gate: its purpose, how we have defined it and, more importantly, what benefits we get from it.

At TheFork, the engineering teams have built great things. But as a matter of fact, not every project is as lean and clean as we would wish. There are things that require attention to align with the product vision and strategy, which I quote below:

“We believe the best things happen around the table” and we build “a trusted ally helping me (restaurant owner) reach my success through business guidance and operational enablers”.

TheFork's purpose is to provide the right dining experience for the right occasion, around the table, making diners happy and pulling restaurants toward success. However, the way we achieve things matters: the product does matter. For instance, the process of designing and delivering value through our digital product must have a high level of built-in quality. That is why we have decided to set up a first quality gate, to smooth our delivery pipeline by removing some of our pain points.

— — —

Why a Quality Gate?

A Quality Gate (QG) is an important piece of the Software Development Life Cycle (SDLC). It provides early feedback, which reduces the cost of resolving defects. It is also a great tool to help improve code quality and simplify component design and software architecture.

Code analysis generates metrics about the code, which let us check whether the codebase complies with our standards; those standards must be defined first. An industrialised approach applies the same standards to all projects across the whole organisation. Only then can we talk about a quality gate.

The goal of a quality gate is to first monitor and then control the quality of the software we are about to ship to production. Monitoring requires tooling and metrics; controlling requires a process and tooling as well. If we want to ensure good quality (or enforce the quality of deliverables), we must monitor technical metrics and have a clear picture of the technical debt we carry. And yes, most projects (close to 99.99% of them) carry technical debt.
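
To make the control step concrete, here is a minimal sketch of a gate that evaluates analysis metrics against thresholds. The metric names and threshold values are illustrative assumptions, not our actual configuration or tooling.

```typescript
// Minimal quality gate sketch: metric names and thresholds are illustrative only.
interface CodeMetrics {
  lineCoverage: number;    // % of lines covered by automated tests
  codeSmells: number;      // maintainability issues reported by static analysis
  vulnerabilities: number; // security issues reported by static analysis
}

interface GateResult {
  passed: boolean;
  failures: string[];
}

// Example standard; each organisation defines and shares its own.
const THRESHOLDS = { minCoverage: 80, maxCodeSmells: 5, maxVulnerabilities: 0 };

function evaluateQualityGate(metrics: CodeMetrics): GateResult {
  const failures: string[] = [];
  if (metrics.lineCoverage < THRESHOLDS.minCoverage) {
    failures.push(`Coverage ${metrics.lineCoverage}% is below ${THRESHOLDS.minCoverage}%`);
  }
  if (metrics.codeSmells > THRESHOLDS.maxCodeSmells) {
    failures.push(`Too many code smells: ${metrics.codeSmells}`);
  }
  if (metrics.vulnerabilities > THRESHOLDS.maxVulnerabilities) {
    failures.push(`Vulnerabilities found: ${metrics.vulnerabilities}`);
  }
  return { passed: failures.length === 0, failures };
}
```

Applying the same kind of gate to every project is what turns a local check into an organisation-wide standard.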

Technical Debt

When we are in debt, we are in trouble: we must pay it back sooner or later, or we go bankrupt. The same goes for software, and for the technical part we call it technical debt. It covers everything related to the technical aspects of the design, build and run phases. Technical debt describes the result of actions (or the consequences of decisions) taken to ship a functionality or a piece of software that later needs to be refactored or revamped.

When we design a system (a sub-system, component, module or block of code), we are sometimes confronted with multiple options, so we need to choose. Almost every decision comes with benefits and drawbacks. In a perfect world (with ideal options only), there would be no drawbacks, so we would not have to talk about technical debt. Right? For example, do we build an in-house solution, or is it better to buy software that serves the purpose? A home-made solution requires time to build and effort to maintain, while buying an existing tool requires a separate budget line and effort to integrate it. Another case is when we have a deadline (I don't like deadlines, but they exist), so we take a shortcut in the design or build phase: quick and dirty code, skipping tests and/or documentation. A shortcut comes with a price that we need to pay later. That's the drawback. That's the technical debt.

Technical debt is not only about code, but also about product design, UI/UX, infrastructure, the CI/CD toolchain, system architecture, software architecture, test automation, manual testing… you get the idea.

We won't discuss every aspect of technical debt here, but rather focus on code-level problems. In other words, we want to enforce quality in our build phase, and thus strengthen our development practices. That is precisely what we at TheFork are aiming to address.

What are the impacts and risks?

Technical debt translates into more complexity: code that is harder to read and understand, harder to extend with new features and harder to maintain. As a result, implementing a new feature simply takes more time, because there can be so many obscure points of friction that we can hardly grasp.

As developers, we often say "we don't have time to refactor" because of business pressure and deadlines. In the end, the result is more debt. It is not in the company's best interest to be forced to introduce new technical debt: the more debt we build on top of, the harder the work becomes, so coding time and time-to-market grow longer, and simply paying more developers usually won't pay off.

Also, with heavy technical debt we are more prone to regressions, bugs and production issues. That results in a flaky user experience, exposes us to a higher risk of losing users and trust, creates a bad reputation and delays deliveries.

Let’s see some facts

Let's look at some measurements based on the DORA model. DORA suggests using four key metrics to indicate the performance of software development teams. At TheFork, we need further detail, so we split cycle time into coding, pickup, review and deploy time.

cycle time vs coding time

By gathering data from GitHub, we captured the following DevOps Research and Assessment (DORA) metrics on one of our projects, which more or less represents the average:

  • Cycle Time: Fair -> 140 hours
  • Coding Time: Needs Focus -> 135 hours
  • Pickup Time: Needs Focus -> 20 hours
  • Review Time: Needs Focus -> 45 hours
  • Deploy Time: Strong -> 36
  • Deploy Frequency: Elite -> Daily+

We can surely improve our productivity and reduce our time to market if we improve our coding, pickup and review times.
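
To illustrate how such a breakdown can be derived, the sketch below computes the four phase durations from pull request timestamps. The event shape and field names are hypothetical; in practice the data comes from the GitHub API and the deployment tooling.

```typescript
// Hypothetical PR timeline; field names are illustrative, not the GitHub API schema.
interface PullRequestTimeline {
  firstCommitAt: Date;
  openedAt: Date;
  firstReviewAt: Date;
  mergedAt: Date;
  deployedAt: Date;
}

// Elapsed time in hours between two events.
const hours = (from: Date, to: Date) => (to.getTime() - from.getTime()) / 3_600_000;

// Split cycle time into the coding, pickup, review and deploy phases used above.
function breakdown(pr: PullRequestTimeline) {
  return {
    codingTime: hours(pr.firstCommitAt, pr.openedAt),  // first commit -> PR opened
    pickupTime: hours(pr.openedAt, pr.firstReviewAt),  // PR opened -> first review
    reviewTime: hours(pr.firstReviewAt, pr.mergedAt),  // first review -> merge
    deployTime: hours(pr.mergedAt, pr.deployedAt),     // merge -> production deploy
    cycleTime: hours(pr.firstCommitAt, pr.deployedAt), // end to end
  };
}
```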

That being said, the quality of the code matters! We need to take care of it.

Increase in bugs

Production Issues 2022

Last year, we noticed a surge of production issues, and the number of fixes did not keep up with the number of reported bugs. That drew attention from leadership. We then carried out technical assessments to understand the context. The goal was to identify high-level problems and pain points, estimate the risks and costs, and define an action plan to be taken into account in the 2023 roadmap.

The key points of the assessment covered:

  • The architectural perspective: from a high-level point of view, the known problems and compliance with patterns
  • The code-level technical debt, based on complexity, readability and testing per component
  • The business impacts and product risks of doing nothing
  • Recommendations, with actions and prioritisation

Accumulation of bugs

With better knowledge of the current technical debt, we noticed that most of the problems' root causes are related to:

  • Legacy components
  • Technical decisions
  • Lack of testing, manual and automated
  • Code quality

So the Quality Gate is one way to tackle the code quality concern.

Defining the Quality Gate

1st quality gate with several checks

The schema above shows one part of the CI pipeline. At first sight, we see several test stages (unit, component and component integration), followed by code analysis (linting and static code analysis), and finally the quality check that verifies all exit conditions. We will focus only on the Quality Check block, as that is the quality gate itself.

In terms of checks, we ensure that:

  1. Code is properly linted (well formatted and compliant with our coding standards)
  2. All automated tests (unit, component and component integration) pass
  3. Code coverage is above the threshold (usually 80%)
  4. The number of code smells is low (close to zero)
  5. Reliability and vulnerability ratings meet our standard (usually an A rating, or zero issues)

If everything is good, we can promote the build to the next stage, which means the PR is potentially mergeable.
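
As an example of how check 3 can be enforced at the project level, here is a minimal coverage threshold configuration, assuming Jest is the test runner; the exact tool and numbers depend on each project:

```typescript
// jest.config.ts -- minimal sketch, assuming Jest as the test runner (example thresholds).
import type { Config } from 'jest';

const config: Config = {
  collectCoverage: true,
  coverageReporters: ['lcov', 'text-summary'],
  // The test run fails if global coverage drops below these thresholds.
  coverageThreshold: {
    global: {
      branches: 80,
      functions: 80,
      lines: 80,
      statements: 80,
    },
  },
};

export default config;
```

The remaining checks (code smells, reliability and vulnerability ratings) are typically reported by the static analysis tool and verified by the Quality Check block before the promotion decision.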

Depending on the organisation's workflow, we may want to perform extra checks before merging, such as code review, regression testing, user acceptance testing, performance testing or A/B testing. At TheFork, we do pretty much all of them, but not as widely as we should.

— — —

To summarise in a few words, a quality gate is one way to enforce quality in the process of building software. We have now defined our Quality Gate with a focus on code quality. The next step is to roll it out, and it won't be as simple as we might think: we need a clear plan and milestones to ensure smooth adoption.

Many thanks!

Glossary

  • CI: Continuous Integration
  • CD: Continuous Delivery
  • SDLC: Software Development Life Cycle
  • DORA: DevOps Research and Assessment
  • PR: Pull Request
  • QG: Quality Gate
  • UI/UX: User Interface / User Experience

--

Kong To
TheFork Engineering Blog

Architect, code crafter. Code quality matters. Technical writer @TheFork, a Tripadvisor company