Photo: Abstract Blues and Greys by Daniele Levis Pelusi (https://unsplash.com/@yogidan2012)

Quality Engineering: Adapt and Apply.

Dan Snell
Published in Slalom Build
7 min read · Feb 9, 2021


Slalom Build recently partnered with a client to build a next-generation platform on AWS, and the complexity of the client’s domain and technology stack required a flexible and adaptive approach to Quality Engineering. We understood from the start that there would be much to learn and that our quality practices and test automation strategy would have to adapt. And fast. We’d have to address gaps that developed as the project progressed, revisit previous decisions, and continuously assess the value we were bringing to the overall project. Here is our journey from initial approach through change, evolution, and adaptation.

Defining an Approach

At Slalom Build we believe in whole-team ownership of quality, so our QE members are fully integrated into the delivery teams from the start. Kicking off the project, the team agrees upon a “definition of done” for all steps and components of the build (“stories” to all you Agile experts). That, by necessity, includes all testing and automation, which makes the quality engineer’s role during story refinement especially important. While software engineers tend to focus on how they will implement a story, the QE is often asking the question “Soooooo, what happens if…?” Those three dots are absolutely key: answering them requires the QE to develop a broader view of how various changes can impact the overall system. For this specific engagement we knew that test automation was going to be critical to the overall success of the project, which led us to the difficult decision of forgoing a traditional test case management tool. The level of automation we anticipated would have made it very hard to keep written test cases in sync with the automation itself. Of course, that didn’t mean quality engineers weren’t thinking through the needed validations for each story. On the contrary, this was a key activity, done in close collaboration with the software engineers.

Additionally, we designed an initial automation approach that included unit tests, functional automation at the microservice level, and an integration test framework that focused on core transaction flows through the system. We recognized that the nature of the system we were building would require a high level of test automation. We also recognized that the nature of what we were learning about requirements and the domain space on the fly would cause the platform to evolve significantly through the project. Close collaboration with the software engineers would be key in achieving the level of flexibility and adaptability required. As a bonus, the output of these efforts often resulted in automation or scripts that could be executed in Postman.
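To make the functional layer concrete, here is a minimal sketch of what a pre-merge functional test against a single microservice might look like, written with pytest and requests. The “orders” service, its endpoints, and the payloads are hypothetical and not drawn from the actual platform.

```python
# A hypothetical pre-merge functional test for a single microservice.
# The "orders" service, endpoints, and payloads are illustrative only.
import os

import requests

# Base URL injected by the environment (local stack, CI, etc.).
BASE_URL = os.environ.get("ORDERS_SERVICE_URL", "http://localhost:8080")


def test_create_order_returns_201_and_an_id():
    """Happy path: creating an order returns a new resource with an id."""
    payload = {"customerId": "cust-123", "items": [{"sku": "ABC", "qty": 2}]}
    response = requests.post(f"{BASE_URL}/orders", json=payload, timeout=5)
    assert response.status_code == 201
    assert "orderId" in response.json()


def test_create_order_rejects_an_empty_item_list():
    """The "Soooooo, what happens if...?" case: an order with no items."""
    payload = {"customerId": "cust-123", "items": []}
    response = requests.post(f"{BASE_URL}/orders", json=payload, timeout=5)
    assert response.status_code == 400
```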

Getting Started

And so, on to Sprint 0: the perfect time to set up tooling and make sure that some of the plumbing was assembled. We used an internal starter kit that let us quickly stand up services and tests. In addition to a skeleton service and tests, a key piece of plumbing was the CI/CD pipeline; the goal was to exit Sprint 0 with the ability to build code, deploy services, and run tests. From the start, automation was used as a gating function and run as part of the pipeline, which helped keep it in sync with the code it was testing and reinforced the discipline of keeping tests relevant and healthy. This initial set of services built the testing muscle memory that is critical to the delivery of each story. With these elements in place, the team moved into the initial delivery phases.
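As a rough illustration of automation acting as a pipeline gate, the sketch below shows one way tests could be tagged and selected for that gate using pytest markers. The `gate` marker and `GATE_ONLY` variable are assumptions for the example, not the actual pipeline configuration.

```python
# conftest.py -- a sketch of marking fast "gate" tests so the Sprint 0 pipeline
# can run just those on every build (e.g. `pytest -m gate`, or via the
# GATE_ONLY switch below). Marker and variable names are assumptions.
import os

import pytest


def pytest_configure(config):
    # Register the custom marker so pytest doesn't warn about it.
    config.addinivalue_line("markers", "gate: fast checks run as a pipeline gate")


def pytest_collection_modifyitems(config, items):
    # When the pipeline sets GATE_ONLY=1, skip everything that isn't a gate
    # test, so the same suite serves both local runs and the CI gate.
    if os.environ.get("GATE_ONLY") != "1":
        return
    skip_non_gate = pytest.mark.skip(reason="pipeline gate run: gate tests only")
    for item in items:
        if "gate" not in item.keywords:
            item.add_marker(skip_non_gate)
```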

Evolution

During the first several sprints the teams were zooming along building the core microservices. The initial functionality was specific to the service itself and didn’t have any interactions beyond the boundaries of the service, and the corresponding functional automation was also fairly straightforward.

It all began to get more complicated around Sprint 4, when two key things happened. The first was the realization that we needed to get our integration test framework off the ground. The second was our first interaction between microservices, which meant determining how to account for these interactions in functional automation tests that were designed to run pre-merge and scoped to a single microservice. Now we had to deal with a service making a call out to another microservice, which raised the question “Do we want to try to spin up multiple services to support the testing of another service? Really?”

At this point, it was time to take that all-important step back and look at the bigger picture. Once you’re a few sprints into a project and velocity looks good, it can be easy to get lost in the details of specific stories and stick with the strategy you’ve already developed, even when it becomes apparent there are issues. Not this time. We realized that we had to rethink our approach and make some significant changes.

All About the Journey

To deal with interactions across microservices, we developed some test scaffolding tools that would allow us to isolate each microservice pre-merge. This enabled us to focus on developing functional automation specific to the microservice, while building out a robust automation set. Running these tests in the branch gave immediate feedback to the engineers regarding the merge-readiness of that branch.
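The actual scaffolding tooling isn’t shown in this post, but as a sketch of the idea, here is a throwaway HTTP stub that stands in for a downstream dependency so a service can be exercised in isolation pre-merge. The “inventory” service and its route are hypothetical.

```python
# A rough sketch of pre-merge test scaffolding: a local HTTP stub that
# impersonates a downstream microservice so the service under test can be
# exercised in isolation. The "inventory" service and route are hypothetical.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class InventoryStub(BaseHTTPRequestHandler):
    """Pretends to be a (hypothetical) inventory service."""

    def do_GET(self):
        # Always report the item as in stock; individual tests could swap in
        # other canned responses to exercise failure paths.
        body = json.dumps({"sku": self.path.rsplit("/", 1)[-1], "inStock": True})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

    def log_message(self, fmt, *args):
        # Keep test output quiet.
        pass


def start_stub(port: int = 0) -> tuple[HTTPServer, str]:
    """Start the stub on a background thread and return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", port), InventoryStub)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_address[1]}"


# In a pytest fixture, the stub's base URL would be handed to the service
# under test (for example via an environment variable) before it starts.
```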

With robust coverage for each microservice, we still needed to validate that the various services played nicely together when integrated. Since we knew these tests would grow progressively slower to execute as more components were completed, we were laser-focused on limiting the number of tests running in the framework to one per transaction. To express this commitment, we called these tests journey tests instead of integration tests. What’s more, we approached building them as one would a bridge: extending them bit by bit to create a stable and robust superstructure. As happens on projects, we found some additional, unaccounted-for areas to build tests for, and our test suites were extended even further.
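To illustrate the “one test per transaction” idea, here is a sketch of what a single journey test might look like. The order-and-payment flow, the endpoints, and the environment variable are all hypothetical.

```python
# A sketch of a "journey" test: one test per core transaction, following a
# single flow end to end across deployed services. The order/payment flow
# and endpoints here are hypothetical.
import os

import requests

API = os.environ.get("PLATFORM_API_URL", "https://dev.example.com")


def test_order_to_fulfillment_journey():
    """One journey: place an order, pay for it, and see it progress."""
    order = requests.post(
        f"{API}/orders",
        json={"customerId": "cust-123", "items": [{"sku": "ABC", "qty": 1}]},
        timeout=10,
    ).json()

    requests.post(
        f"{API}/payments",
        json={"orderId": order["orderId"], "amount": order["total"]},
        timeout=10,
    ).raise_for_status()

    status = requests.get(f"{API}/orders/{order['orderId']}", timeout=10).json()
    assert status["state"] in {"PAID", "FULFILLED"}
```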

The scaffolding, together with the journey tests, gave us a robust baseline of test automation. As the platform continued to evolve, we found it necessary to revisit earlier decisions. In effect, just as we had implemented journey tests, we ourselves were on a journey, following an adaptive, evolving approach.

The Road Continues…

As the project continued, we uncovered two areas that required immediate attention. First, we determined there was a need for a simple post-deploy check before running any other tests: a smoke test run immediately after a deployment to confirm that the service was up and available. Because these tests were non-destructive, we were able to run them in all environments, including production, to confirm a successful deployment.
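A minimal sketch of such a post-deploy smoke test, assuming a read-only health endpoint and a base URL injected by the deploy pipeline (both assumptions for the example):

```python
# A minimal, non-destructive post-deploy smoke check. The health endpoint and
# environment variable are assumptions, not the platform's actual API.
import os

import requests


def test_service_is_up_after_deploy():
    """Read-only check, safe to run in every environment, including production."""
    base_url = os.environ["SERVICE_BASE_URL"]  # injected by the deploy pipeline
    response = requests.get(f"{base_url}/health", timeout=5)
    assert response.status_code == 200
```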

Next, contract tests had initially been taken out of the project scope when we decided we didn’t want to take on the extra overhead of specialized contract testing tools like Pact.io. That decision needed revisiting based on the state of the project. We created a subset of our functional automation, focused on the system contracts, that the team could use as an alert mechanism. Now an engineer could easily see if there was a broken test and begin a conversation with the impacted service owners. We were also able to see during code reviews whether contract tests had been changed and ensure that the right conversations had taken place. Another interesting part of this approach was the expectation that if a team began to consume an interface from another service, that team would add its contract structure to the provider’s repo. This reduced the risk that changes would encroach on other teams’ progress and gave us a handy way of documenting which services were consumers of others.
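The sketch below shows the flavor of such a contract-focused functional test, using plain jsonschema validation rather than a dedicated tool like Pact. The consuming team, the schema, and the endpoint are all hypothetical.

```python
# A lightweight contract test living in the provider's repo: the consumer team
# contributes the response shape it relies on, and a functional test asserts
# the provider still honours it. Schema and endpoint are illustrative.
import os

import requests
from jsonschema import validate

# Contributed by the (hypothetical) fulfillment team, which consumes /orders/{id}.
FULFILLMENT_ORDER_CONTRACT = {
    "type": "object",
    "required": ["orderId", "state", "items"],
    "properties": {
        "orderId": {"type": "string"},
        "state": {"type": "string"},
        "items": {"type": "array"},
    },
}


def test_order_response_honours_fulfillment_contract():
    base_url = os.environ.get("ORDERS_SERVICE_URL", "http://localhost:8080")
    order = requests.get(f"{base_url}/orders/test-order-id", timeout=5).json()
    # Raises jsonschema.ValidationError (failing the test) if a field the
    # consumer depends on is renamed or removed.
    validate(instance=order, schema=FULFILLMENT_ORDER_CONTRACT)
```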

On an Agile project we continuously assess scope. Features come in and features go out. Typically we lean towards narrowing and focusing as much as possible, but in this case we weighed the cost of not having contract tests against the effort of adding them. In the end, increased quality and improved communication won out.

Asking Questions and Challenging Assumptions

Throughout the process, it’s important to continually review previously completed work and the decisions that drove it. This was especially valuable for some of the initial automation done at the microservice level. Many of the early tests had not been particularly robust, and had either been superseded by other tests or were no longer fully relevant to the complete functionality of the services. As we replaced tests, or outright deleted any that no longer provided value, the efficacy of our suite increased. We also continued to look for ways to improve our understanding of the robustness and stability of tests, building out reports that aggregated the results of our journey tests across services and over time. This helped us identify trends and find intermittent errors.
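As a rough sketch of the aggregation behind that kind of reporting, the function below takes per-run journey-test results (the record format is an assumption) and surfaces tests that both pass and fail over time, i.e. likely intermittent failures.

```python
# A sketch of aggregating journey-test results across runs to flag flaky tests.
# The result-record format is an assumption for the example.
from collections import defaultdict
from typing import Iterable


def flaky_tests(results: Iterable[dict], min_runs: int = 5) -> list[tuple[str, float]]:
    """Return (test name, pass rate) for tests that both pass and fail over time."""
    runs = defaultdict(lambda: {"passed": 0, "total": 0})
    for r in results:  # e.g. {"test": "journey/order_to_fulfillment", "passed": True}
        runs[r["test"]]["total"] += 1
        runs[r["test"]]["passed"] += 1 if r["passed"] else 0

    flaky = []
    for name, counts in runs.items():
        rate = counts["passed"] / counts["total"]
        if counts["total"] >= min_runs and 0 < rate < 1:
            flaky.append((name, rate))
    # Lowest pass rate first: the noisiest tests float to the top of the report.
    return sorted(flaky, key=lambda pair: pair[1])
```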

Every project provides plenty of learning opportunities, and in this case the most important lesson was to keep challenging ourselves on the correctness of our approach. Even with urgency from the client to deliver, we worked to avoid becoming complacent or clinging to our initial approaches and decisions. This ability to question and adapt allowed us to improve the depth of our testing, support multiple high-velocity development teams, and increase the quality of the platform we were building for our client. Testing truly is a journey with different stops and sights along the way, made more interesting and effective if we stop to ask questions and challenge assumptions.
