The Worst Test Suite—Testing Anti-Patterns Experienced in Real Life

Boris Cherkasky
Published in ITNEXT
Dec 4, 2023


artwork by @antoinegiret on unsplash.com

I don’t think an introduction is needed here. I’m going to tell you what the worst test suite I’ve ever encountered looked like, and what it was like to work around it.

I warn you: this article will make your stomach turn, your skin crawl, and your engineering skills cry.

The anti-patterns I’ll cover will be ranked with 🤮s,
from a puke score of 🤮 (bad) to a puke score of 🤮🤮🤮 (completely unforgivable).
All of the anti-patterns are terrible, but some of them I have never seen anywhere else, and I thought ranking them would emphasize just how obscure they were.

you, while reading this article

The Suite

The whole suite was about 10 test files big. Nine of them were “reasonable”; the tenth was a nightmare: roughly 3,000 lines of code comprising a total of ~80 test cases. About 95% of the test suite was this one file.

In addition to the tests, all the utilities the tests used were scattered around this file.

Mixing Test Types

Puke score: 🤮

The file included many tests of different types: unit, component, e2e. For some tests it was extremely difficult to figure out what the type even was. There were unit tests that checked a specific small piece of functionality by running the major part of the system, while in other cases major end-to-end tests were executed by calling a sequence of internal capabilities of the system.
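
To make this concrete, here’s a minimal sketch of the kind of mismatch I mean. Everything below is invented for illustration (pytest-style Python, regardless of what the real stack was):

```python
class System:
    """Stand-in for a heavyweight system under test (hypothetical)."""
    def __init__(self):
        self.db = {}       # pretend this is expensive wiring
        self.queue = []

    def add(self, a, b):   # a tiny pure function buried inside the system
        return a + b

def test_addition():
    # a "unit" test that boots the whole system to check one small function
    system = System()
    assert system.add(2, 3) == 5

def test_order_flow():
    # an "end-to-end" test that never goes through a real entry point,
    # but pokes internal capabilities in sequence instead
    system = System()
    system.queue.append("order-1")        # internal capability #1
    system.db["order-1"] = "processed"    # internal capability #2
    assert system.db.get("order-1") == "processed"
```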

Why Are The Tests Numbered?

Puke score: 🤮🤮🤮

All test names were numbered! All tests were of the form TestXX_what-the-test-checks.

It was a major red flag whose magnitude I didn’t fully grasp when I onboarded onto the project. After about two months of work it became apparent that the tests were numbered because they failed in any other execution order! The numbers were there to force ordering. The tests were co-dependent.

It goes without saying that I learned this the hard way, when I moved some tests around to make that huge file a bit cleaner.

If I wanted to add a test, it was critical that it be ordered correctly relative to the other tests. This created an outrageous situation where I had to name a test “correctly” for it to run in the correct order. So if I wanted a test to run between Test34_XXX and Test35_YYY, I had to name it Test341_ZZZ for the lexicographical order to hold 🤯
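
Here’s a hedged reconstruction of the pattern in pytest-style Python (all names invented; pytest happens to run tests in file definition order, which stands in for whatever ordering mechanism the real framework used):

```python
state = {}  # module-level state shared across tests: the root of the co-dependence

def test34_create_user():
    state["user"] = "boris"              # later tests silently depend on this

def test35_user_can_log_in():
    assert state.get("user") == "boris"  # passes only if test34 ran first

def test36_delete_user():
    state.pop("user")                    # and nothing after this point may
                                         # assume the user still exists
```

Move any of these around, or run one in isolation with pytest -k test35, and things start failing. That’s exactly the trap I stepped into.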

Anonymous Tests

Puke score: 🤮

One more thing about test names: some of them were anonymous, i.e., they said nothing about what they actually cover. Tests like test19_requirement_59_passes or the all-time favorite test87_process_works.

There were tests whose purpose I learned only when I introduced changes that made them fail, which forced me to do investigative work to figure out what they actually tested.

Assertions? They are mere recommendations

Puke score: 🤮🤮🤮

Some tests didn’t end with an assertion. You are rightfully asking: what is a test without an assertion even doing?

In those tests there was a comment at the top instructing the “user” to go and do something manually. Usually it was something like “go to the log file and check that there’s a log message of the format X — Y — Z”.

It goes without saying that this didn’t state where the log file was, or what to do if there was more than one (due to log rotation configuration). Also, in some cases the instructions were obsolete, and the log message had changed since the “test” and its instructions were written.

Those tests obviously always passed. No one told me about them during the project handoff, and I discovered them when I added a piece of functionality and, by sheer chance, noticed a test whose name said it tested the obsolete opposite of what I had just implemented. It passed anyway (since it had no assertion).
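
Here’s a hypothetical reconstruction of such a “test”, alongside what the manual instruction could have been as a real assertion (names invented; the second version assumes the system logs through Python’s logging module and uses pytest’s caplog fixture):

```python
import logging

logger = logging.getLogger("process")

def run_process():
    logger.info("X - Y - Z")  # stand-in for the real system doing its work

def test87_process_works():
    # "Go to the log file and check there's a message of the format X - Y - Z"
    run_process()
    # no assertion: pytest reports PASSED no matter what run_process() did

def test_process_emits_expected_log(caplog):
    with caplog.at_level(logging.INFO):
        run_process()
    assert "X - Y - Z" in caplog.text  # the manual instruction, automated
```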

I deleted all those tests, and never looked back.

Complex and Obscure Input

Puke score: 🤮

The system’s test input was quite complex, and most tests were based on a single input file. What it included, other than “just enough input for the tests to pass”, was quite obscure. No one really remembered how this test file had been created.

To get each test into the relevant state, utility methods that manipulated the input file were scattered around those 3,000 lines of code. None of them explained what they did, and they were usually called something like prepare_for_test78.
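
A hedged sketch of what those helpers looked like, next to an intention-revealing alternative (everything here is invented; the real input wasn’t a simple dictionary, but the pattern was the same):

```python
# the one shared input everything depended on
SHARED_INPUT = {"records": [{"id": i, "status": "NEW"} for i in range(50)]}

def prepare_for_test78(data):
    # opaque: what scenario does this set up, and why these exact mutations?
    data["records"] = data["records"][:3]
    data["records"][0]["status"] = "PENDING"
    return data

# versus a builder whose name states the scenario it creates
def input_with_one_pending_record(total=3):
    records = [{"id": i, "status": "NEW"} for i in range(total)]
    records[0]["status"] = "PENDING"
    return {"records": records}
```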

Anytime we needed to change the input we cried a bit 🥲

Dragged Shared State

Puke score: 🤮🤮

Not only were the tests co-dependent, the system under test itself dragged some internal state between tests.

Instead of recreating the system before each test, the original authors added a set of methods that enabled the test driver to nullify the internal state.
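
In pytest terms the difference looks roughly like this (a sketch with invented names, not the actual system):

```python
import pytest

class System:
    """Hypothetical stand-in for the system under test."""
    def __init__(self):
        self.cache = {}

    def reset_cache(self):  # a "nullifier" bolted on just for the tests
        self.cache.clear()

# what the suite did: one long-lived instance shared by every test,
# with each test having to remember to call the nullifiers
SYSTEM = System()

# what isolation looks like instead: a fixture that rebuilds the system
# from scratch for every test, so there is no state to drag
@pytest.fixture
def system():
    return System()

def test_cache_starts_empty(system):
    assert system.cache == {}
```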

Unclear Entry Point

Puke score: 🤮🤮

There were a few tests in which internal functions were called in a specific order, and it was literally unclear what the entry point of the test was. It turned out that all those functions, scattered across different classes, affected some internal shared state, which was eventually checked for its expected final state.
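
A hypothetical reconstruction (invented names) of what that looked like:

```python
_state = {"stage": None, "result": None}  # shared state, mutated from all over

def _ingest():     # internal step, lives in one class in the real code
    _state["stage"] = "ingested"

def _transform():  # internal step, lives in another class
    _state["stage"] = "transformed"

def _publish():    # internal step, lives in a third
    _state["result"] = "ok"

def test_pipeline():
    _ingest()      # which of these is "the system running"? unclear
    _transform()
    _publish()
    assert _state["result"] == "ok"  # only the final shared state is checked
```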

Final Thoughts

Yes, this is a true story.
Yes, it’s a bit exaggerated, but only a bit.

I don’t know how this test suite got into the rotten state it was in when it was handed off to me, but I assume only good intentions.

The project was in the POC stage for a long time and wasn’t anyone’s focus or priority until a late stage. That can explain it, but I don’t know.

Luckily enough, the system under test was a relatively simple one. Other than the complex input, the logic itself wasn’t too difficult to understand and reason about. This made it possible to work under these conditions.
Had the system been a bit more complex, it would probably have been even worse.
That being said, maybe the fact that the system was simple enough is what enabled the test suite to rot that far?

We did our best to make things better, but mostly we tried not to make things worse. Anything newly developed was held to higher standards, and we decided to add as little as possible to the “one file of doom”.

This experience was an amazing school for me, though: I learned what not to do, and what the impact of doing it is. It’s like getting a really bad case of pneumonia to learn that you shouldn’t run outside in a T-shirt during a freezing winter.

This post is based on a Twitter thread (published in Hebrew) that was written on the spur of the moment, when I remembered this traumatic experience.

As always, thoughts and comments are welcome on Twitter at @cherkaskyb

