RedLight & Testing

Our thought process behind creating Quality Assurance processes.

João Gonçalves
RedLight Blog

--

Hello everyone. Now that we’re all quarantined at home living the forced dream of remote work, I think it’s a good time to share with you some thoughts about our internal development process at RedLight with a special focus on Testing & QA.

TL;DR — I’ll be covering some of the key points around which we’ve built our own internal quality processes. Namely:

  1. Setting clear roles for everyone who takes part in a project;
  2. Keeping the process simple;
  3. Creating a process geared toward error mitigation (not just output for the sake of it);
  4. Ownership;
  5. Document & Prioritize;
  6. Project-wide transparency and accountability.

Over the years I’ve worked closely with testing and development teams, and I’ve come to understand why testing is, in general, an understated and underrated part of software development. Many people will tell you that automated testing is enough; others will argue that testing in production is fine. Personally, I think neither is enough or fine, and you should take Quality Assurance seriously if you want to create products that work seamlessly. Bugs are generally tolerated in a social network or a video game, but they are far less acceptable when you’re launching a satellite into space or monitoring vital data (like some projects we’ve worked on): in space or healthcare products, bugs are likely to have real, costly consequences. With that in mind, let’s dive into software development and quality assurance.

The current state of software development for the web or mobile is like a game of Jenga.

Most modern systems depend on other people’s systems. Once you start developing an app, to ease and speed up the process, you eventually use somebody else’s library. The library you just imported into your project uses a few other libraries created by someone else, and those libraries depend on code from yet other developers. You see where I’m going, right?

The system you’re creating sits at the top of your personal architectural pyramid, and most of the time you don’t really know what lies beneath the first few layers. This means that even if your app works correctly today, a lack of maintenance in any third-party piece can make it stop working altogether, and this can happen without warning.

These pieces of software aren’t always properly maintained, and many owners and contributors won’t even bother to let you know that they have stopped putting effort into them, resulting in bugs over time and sometimes severe security problems. So it’s fair to say that once you deploy something into production, the system you created has started on a path of gradual degradation, accumulating errors as time goes by: it is slowly rotting, technologically speaking. What works today might not work tomorrow.

To mitigate the errors you can inherit, you can either let them happen and fix them afterwards, which won’t make your client happy, or you can test and perform quality assurance on your system, early and properly.

And yet, testing is a process that’s bound to fall short, in the sense that it will only ever cover your known unknowns: the things you know might not be working and that you have the tools to test. Everything else is the unknown.

At RedLight, testing has many phases: Unit Testing (automated), Integration Testing (automated), Functional Testing, System Testing, Stress Testing (automated), Performance Testing (automated), Acceptance Testing, Regression Testing, Security Testing (automated) and Beta Testing.

In our development process, Unit, Integration, Stress, Performance and Security Testing are left to the development team. Functional Tests, Acceptance Tests (also performed by the client, but that’s a whole other blog post) and Regression Tests are the responsibility of the QA team.

Of course, any team should create a Quality Assurance role that is appropriate for their process and the responsibilities will vary from company to company.

Regardless of how many responsibilities you place on your QA team, it’s important to remember that software engineering is a team activity, not something led solely by the dev team, as many think. Projects are maintained by several people who contribute to the end result in different ways.

That said, here are the big takeaways from my time as the QA lead at RedLight, and how you can implement them in your company.

1. Set clear boundaries for each role.

Each company has its own process, but it’s extremely important for everyone to understand where their responsibilities end and the next person’s begin.

In our internal process, the big boundary is the code review: an engineer submits his changes for peer review and, once all reviews are positive or have been acted upon, the changes are merged and deployed. Only then does the QA team come in.

Even before the QA team gets its turn to try to break the system, it’s important to know whether the engineer has fulfilled all of his responsibilities, such as unit and integration testing. At RedLight we always make sure that each merge request comes with automated tests. This way we know that the new merge request does not break any existing tests and that the changes it introduces have high test coverage.
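To make the idea concrete, here is a minimal sketch of such an automated gate. It assumes a CI job that reports, for each merge request, whether new tests were added, which tests failed and the resulting line coverage. The `MergeRequest` type, its fields and the 80% threshold are hypothetical illustrations, not RedLight’s actual tooling.

```python
from dataclasses import dataclass

# Hypothetical coverage threshold; each team should pick its own.
MIN_COVERAGE = 0.80


@dataclass
class MergeRequest:
    """Hypothetical summary of what a CI run reports for a merge request."""
    adds_tests: bool          # does the change include new automated tests?
    failed_tests: list[str]   # names of tests (old or new) that failed
    coverage: float           # line coverage after the change, 0.0 to 1.0


def passes_test_gate(mr: MergeRequest) -> tuple[bool, list[str]]:
    """Return (ok, reasons), listing every reason the merge request is blocked."""
    reasons = []
    if not mr.adds_tests:
        reasons.append("no automated tests included")
    if mr.failed_tests:
        # A new change must not break anything that used to pass.
        reasons.append(f"breaks existing tests: {', '.join(mr.failed_tests)}")
    if mr.coverage < MIN_COVERAGE:
        reasons.append(f"coverage {mr.coverage:.0%} below {MIN_COVERAGE:.0%}")
    return (not reasons, reasons)
```

For example, `passes_test_gate(MergeRequest(True, ["test_login"], 0.72))` returns `(False, ["breaks existing tests: test_login", "coverage 72% below 80%"])`. Collecting every reason, rather than stopping at the first, gives the engineer the full picture in one pass.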

Whatever the specifics of how your team works, it’s always important to set clear expectations and responsibilities for each team member and each phase of the process. Don’t let anything happen in a gray area.

2. Keep the process simple and easy to understand.

This is probably the broadest advice I can give about a company’s process.

No one wants a 100-page manual on how to do their job, but the process still needs to be clear to everyone on the team, from project kick-off to delivery. And however you set up your process, remember: faster isn’t always better.

For example, at RedLight, code reviews are performed by at least 2 engineers who were not involved in developing the task related to the change. This has an impact on velocity, but it’s much cheaper (and safer) to examine someone’s code before it gets merged. If the merge request shows something wrong, like an inconsistent implementation, a mismatch with the project requirements or buggy code, it gets sent back to the developer to be rewritten. If the merge request looks OK and gets unanimous approval from both reviewers, it moves on to be merged and tested. This method instills good teamwork practices and works as a virtual platform for knowledge sharing (about both the product you’re creating and engineering do’s and don’ts).
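The review rule can be sketched as a small decision function. This is only an illustration under assumptions: a single review round, reviewer names as plain strings, and hypothetical outcome labels (`"merge"`, `"send_back"`, `"needs_reviewers"`). It encodes the two constraints described above: reviewers must not have worked on the task, and approval must be unanimous with at least two independent approvals.

```python
from dataclasses import dataclass

REQUIRED_APPROVALS = 2  # "at least 2 engineers", per the process described


@dataclass
class Review:
    reviewer: str
    approved: bool


def review_outcome(task_authors: set[str], reviews: list[Review]) -> str:
    """Decide what happens to a merge request after a round of code review.

    Returns "merge", "send_back" or "needs_reviewers" (hypothetical labels).
    """
    # Reviews by people who worked on the task don't count as independent.
    independent = [r for r in reviews if r.reviewer not in task_authors]
    # Approval must be unanimous: any negative review sends the change back.
    if any(not r.approved for r in independent):
        return "send_back"
    # Count distinct approvers so one person approving twice counts once.
    approvers = {r.reviewer for r in independent if r.approved}
    if len(approvers) >= REQUIRED_APPROVALS:
        return "merge"
    return "needs_reviewers"
```

Note that an author approving their own change has no effect: with `task_authors={"ana"}`, the reviews `[Review("ana", True), Review("bruno", True)]` still yield `"needs_reviewers"`.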

After that, the QA engineer comes into play, knowing exactly what to test and what to do if something is wrong with the new changes. If the QA engineer finds bugs related to the task being tested, he sends the task back with a “QA failed” report for the engineer to act on. The task then starts the process all over again and must pass two unanimous code reviews before it is merged and tested again. If everything’s OK with the task, it keeps moving.
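The lifecycle above can be viewed as a small state machine. The following sketch assumes four hypothetical stages and event names of our own choosing. Its key property is that the transition table offers no shortcut: a task that fails QA returns to development and must go through code review again before it reaches QA.

```python
from enum import Enum, auto


class Stage(Enum):
    DEVELOPMENT = auto()
    CODE_REVIEW = auto()
    QA = auto()    # merged and deployed, now being tested by QA
    DONE = auto()


# Allowed transitions, keyed by (current stage, event); names are hypothetical.
TRANSITIONS = {
    (Stage.DEVELOPMENT, "submit"): Stage.CODE_REVIEW,
    (Stage.CODE_REVIEW, "approved"): Stage.QA,               # unanimous approval, then merge + deploy
    (Stage.CODE_REVIEW, "changes_requested"): Stage.DEVELOPMENT,
    (Stage.QA, "qa_passed"): Stage.DONE,
    (Stage.QA, "qa_failed"): Stage.DEVELOPMENT,              # back to the developer with a report
}


def advance(stage: Stage, event: str) -> Stage:
    """Return the next stage, rejecting any transition the process doesn't allow."""
    try:
        return TRANSITIONS[(stage, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in stage {stage.name}") from None
```

Replaying the events `submit`, `approved`, `qa_failed`, `submit`, `approved`, `qa_passed` ends in `Stage.DONE`. Calling `advance(Stage.DEVELOPMENT, "approved")` raises `ValueError`, because a failed task cannot skip its second round of review.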

3. When things can go wrong, they’ll eventually go wrong.

Many times during the development process, be it engineering or QA, there will be pressure to do things fast. “Move fast and break things” is a motto that doesn’t suit most companies, yet many still follow it.

Our experience tells us that if anything suffers under the pressure to launch a new version, it’s testing. Usually, when there’s a rush to meet a deadline, merge requests arrive without automated tests, QA only goes through the most basic cases and all the lights turn green for launch.

But when things are done this way, all the preconditions for a disaster are in motion.

Tight deadlines are ever-present, but not taking proper care with the tasks you’re dealing with is a separate problem. Cutting corners now will make you invest a lot more time and psychological effort later, patching a fix directly into Production.

No one likes to be woken up at 4am because PagerDuty is notifying them of a critical bug in Production. So it’s better to invest some time in proper testing, code review and quality assurance to ensure a good night’s sleep, potentially saving lots of money in business value and development effort.

The whole reason for building Testing + QA into your process is that it helps you or the client keep the product stable. The less you do it, the more unstable the product is likely to be. And it doesn’t help much if you let scope creep get the better of your process while you race against your timeline.

4. “It worked fine on my computer. 🤷‍♂️”

Everyone has heard this at least once. It’s the most common case of accountability failing: someone on your team doesn’t want to be seen as the faulty member and shifts the blame onto anything (or everything) else.

There’s nothing wrong with pushing through commits that turn out to introduce defects, but there’s a big difference between owning them and not.

Note that every time a developer builds something, he’s doing it in a controlled environment where the outcomes are usually predictable or deterministic. Most of the problems the development team doesn’t find will surface in uncontrolled environments, with components that interact with the system differently.

Once a deploy is made, the QA engineer is not just testing the diff or the code change; he’s testing a system made up of users, infrastructure and third-party libs, all of them moving parts. Many of these pieces interact in unpredictable ways that you wouldn’t think of while developing your task. And if your Staging environment can act oddly, imagine what happens when your code meets Production, where users act unexpectedly.

5. A Wild Bug Appears. What Now?

The QA team and all of the testing exist for the sole purpose of keeping the number of bugs that reach production to a minimum.

But it isn’t sane for any team member to expect a bug-free Production. Yes, it’s very easy to think that you should’ve delayed the launch of a product to solve every error, or that the next version should be on hold until all defects are fixed, but that’s not how the world works. If there’s any universal truth about software, it’s this: it has bugs.

So, at some point, instead of letting a feeling of collective failure sink in, you should stick to the process: document the defect, evaluate its criticality, prioritize it and keep moving forward.

As someone who has been in the industry for a while, I’ve heard people share disastrous, step-by-step accounts of project mismanagement, and a growing pile of bugs tends to have a much bigger impact on those projects than it should.

First, obsessing over specific errors is not healthy at all. You can see developers, project managers and clients falling into this trap, shifting allocations, roadmaps and effort towards something that might not be at the top of the priority list.

Secondly, as a software development company, you need to know how to manage the perceived quality of a product. For example, we know that our clients have a higher tolerance for errors they don’t see when interacting with a product. On the other hand, clients have a much lower tolerance for errors that affect the product visually or that affect user data. So you might want to educate your client on his Perception of Quality so he can prioritize things correctly.
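One way to combine “document, evaluate criticality, prioritize” with perceived quality is a simple scoring rule. Here is a minimal sketch. The `Defect` fields, the 1 to 4 severity scale and the weights are all hypothetical. They only encode the observation above that errors touching user data or visible to users deserve a higher rank than silent ones of similar severity.

```python
from dataclasses import dataclass


@dataclass
class Defect:
    title: str
    severity: int            # hypothetical scale: 1 (cosmetic) to 4 (critical)
    visible_to_users: bool   # does it show up in the UI?
    affects_user_data: bool  # can it lose or corrupt user data?


def priority_score(d: Defect) -> int:
    """Hypothetical score: severity dominates, boosted by perceived quality."""
    score = d.severity * 10
    # Clients tolerate errors that touch their data the least.
    if d.affects_user_data:
        score += 15
    # Visible errors hurt perceived quality more than hidden ones.
    if d.visible_to_users:
        score += 5
    return score


def triage(defects: list[Defect]) -> list[Defect]:
    """Return documented defects ordered from highest to lowest priority."""
    return sorted(defects, key=priority_score, reverse=True)
```

With these weights, a moderately severe bug that loses profile edits (score 40) outranks a severe but invisible slowdown in a nightly job (30), which in turn outranks a misaligned button (15). The exact numbers matter less than agreeing on them with the client and the PM up front.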

As an engineer, you’ll always feel the hidden hand pushing for specific things to be solved ahead of schedule, but to stay sane, it’s important for the client and the PM to agree that errors and failures help us get better at what we do and shouldn’t get in the way of following the overall plan.

Wrapping up:

  1. You will never be 100% sure that you’ve sorted out all of the bugs. Yet, you can be 100% sure that somewhere in the software there’s a hidden bug waiting for the right time to show up.
  2. Once you have bugs documented, prioritize them and start working from there.
  3. Stay calm.

6. Transparency and accountability.

Software development is a team job. Giving project-wide visibility to all the stakeholders (teams, PM and clients) involved in getting something pushed to Production will eventually lead to better results and increased collaboration.

With proper collaboration and transparency (like what we see in our merge requests), your company will be better prepared for when that wild bug appears, since engineers can read each other’s code and make improvements on the spot.

At the project level, the Project Manager is the one who answers for the product’s final quality. But throughout the process, accountability may belong to a specific developer or QA engineer.

Just to give you a glimpse: at RedLight, responsibility for a properly implemented task moves from the developer to the code reviewers, then to the QA engineer, and finally rests with the Project Manager and the QA lead. At the end of the day, all of these members share some of the responsibility, and good results tend to follow when there’s proper collaboration and team members feel ownership of the product’s quality.

Regardless of your company’s methodology, the key to keeping things flowing smoothly is to track accountability with project-wide transparency. It gives everyone a sense of importance and responsibility towards their colleagues and usually leads to better results and an engaged, happy team.

Conclusion

The whole world of testing and QA is about mitigating present problems and future risks that are still unknown. Yet we take risks every time we push something to a new environment. And even though we, as a QA team, are in the business of lowering risk, there’s no guarantee that our company will launch a risk-free product. So we shouldn’t let fear stop us from placing another piece on top of our Jenga tower.

The best way to keep your product stable and your Sentry log short is to make sure that every step of the process has clear ownership, transparency and accountability.

João Gonçalves is currently a Project Manager at RedLight, having worked as the head of the QA department for 4 years.

Everything I know, I owe to the Internet and books. Follow me at @jpmgoncalves.