Solving flaky tests related to dates — My experience

New month, new error! The tests containing dates are broken again…

Vlad Ogir
5 min read · Jan 19, 2024

Many of us have been faced with the “dates” problem. It's a new year or a new month and suddenly all tests fall apart!

- Vlad: Hey Engineer, our pipeline is down.
- Engineer: Hmm, it worked yesterday, very strange.
- Vlad: Ah, wait, it’s the 1st of the month…

A while back I worked on an application that had a module which computed a lot of data on the fly. This computed data was then packaged in different ways, for the end user to consume.

On the first day of each month, our pipeline went red! So, we had to stop releases to production and everyone had to work to fix the broken pipeline.

A lot of the time, we’d do a quick patch (thinking that we’d solved the problem) but the core issue was not fixed and would appear again (in a different place). So, we ended up spending a lot of time chasing our tails!

Eventually, we realised that we needed to sit down, figure out exactly what was going on and solve the problem systematically. In the end, it took us three attempts to fix it. In this article, I will describe each of them.

Key points:

  1. The dataset had date and amount fields.
  2. The module that wrapped around the dataset was responsible for seeding test data (we had some factories in place to make this happen).
  3. We ran computations against the different date ranges.
  4. The production dataset had millions of records and each record had many columns, beyond date and amount. So, we had lots of test scenarios to cover.

[Figure: example of how the modules connected]

First Attempt

We had a feature that needed to grab values for a date range going back months or years. (For example, our users would want to know data from 8 months ago.) In our code, we would grab month 8 and expect a specific number of records and a total amount adding up to X.

However, the problem was that the months were hardcoded so we had to manually adjust both the test data and assertions. This was very time-consuming (and ultimately costly to the company). So, we decided then that it would be best to make this data dynamic.

There were several scenarios we needed to consider:

  1. What happens when a user asks for data for the current month, which has not concluded? (Since every day is new, the data we generate would be harder to reason about.) Fortunately, this proved to be irrelevant.
  2. What happens on days that don’t exist in every month? We agreed not to seed data beyond the 28th of any month. (This way we wouldn’t have to deal with leap years.)

To solve this issue, we updated our seeders to seed a predefined number of records for each month. We ended up with a seeder similar to the one below. (Our code ran through an array of data and generated the relevant records.)

[
    1 => [['day' => 12, 'amount' => 12], ['day' => 17, 'amount' => 6]],
    2 => [...],
    ...
    27 => [...],
]

Each key represented how many months we needed to go back, and its value held the data we should seed for that month. (For example, if we are in February, 1 represents January.)
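A minimal PHP sketch of how a seed plan like this could be expanded into dated rows. The function name buildSeedRows, the row shape and the second plan entry are hypothetical and are not the original seeder. "Today" is passed in so that the result is reproducible.

```php
<?php
declare(strict_types=1);

/**
 * Expand a "months back" seed plan into dated rows.
 * Keys are month offsets (1 = last month); values are lists of
 * ['day' => int, 'amount' => int]. Hypothetical helper, for illustration only.
 */
function buildSeedRows(array $seedPlan, DateTimeImmutable $today): array
{
    // Anchor on the 1st so that stepping back by whole months can never overflow.
    $firstOfMonth = $today->modify('first day of this month');
    $rows = [];

    foreach ($seedPlan as $monthsBack => $records) {
        $month = $firstOfMonth->modify("-{$monthsBack} months");

        foreach ($records as $record) {
            // Days are assumed to be 28 or lower, so they exist in every month.
            $date = $month->setDate(
                (int) $month->format('Y'),
                (int) $month->format('n'),
                $record['day']
            );
            $rows[] = ['date' => $date->format('Y-m-d'), 'amount' => $record['amount']];
        }
    }

    return $rows;
}

$plan = [
    1 => [['day' => 12, 'amount' => 12], ['day' => 17, 'amount' => 6]],
    2 => [['day' => 3, 'amount' => 9]], // hypothetical entry
];

$today = new DateTimeImmutable('2024-02-10', new DateTimeZone('UTC'));
$rows = buildSeedRows($plan, $today);
// $rows[0] is ['date' => '2024-01-12', 'amount' => 12]
// $rows[2] is ['date' => '2023-12-03', 'amount' => 9]
```

Injecting the date instead of calling "now" inside the helper is a design choice of this sketch: it lets a test pin the clock to any month.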

This solved the problem (at least for now)!

Second Attempt

New month, new problem! After implementing our first fix, the tests expected results from array entries 2 to 14, but the actual data was now within entries 3 to 15! This happened because we had moved into a new month and the tests couldn’t cope with the offsets shifting.

To solve this new problem, we converted the current month to its integer representation (in this case it was 2, for February). This way, any index greater than the current month represented a different year.

# Create an array of data per year
- "skip_months" = get the current month. (Feb = 2)
- within an array of seed data skip the first "skip_months"
- slice the remainder of the data into groups of 12
- the "skip_months" will represent data for the current year

# Now when I want to retrieve data seeded for April 2022
- get the current year (2024) and subtract 2022 = index in the array with data
- "month" = convert April to an integer representation
- now we can retrieve data for April
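The steps above could be sketched in PHP roughly as follows. This is an illustration and not the original helper: the function name groupSeedPlanByYear and the generated plan are hypothetical, and it assumes that key 1 means the previous month (as in the seed array), so an offset smaller than the current month number still falls in the current year.

```php
<?php
declare(strict_types=1);

/**
 * Regroup a "months back" seed plan as [yearsBack => [monthNumber => records]].
 * Hypothetical helper; assumes offset 1 is the month before the current one.
 */
function groupSeedPlanByYear(array $seedPlan, int $currentMonth): array
{
    $byYear = [];

    foreach ($seedPlan as $monthsBack => $records) {
        // Offsets smaller than the current month number stay in the current year.
        $yearsBack = $monthsBack < $currentMonth
            ? 0
            : intdiv($monthsBack - $currentMonth, 12) + 1;

        // Wrap the month into 1..12 (PHP's % can return negative values).
        $month = ((($currentMonth - 1 - $monthsBack) % 12) + 12) % 12 + 1;

        $byYear[$yearsBack][$month] = $records;
    }

    return $byYear;
}

// Hypothetical plan: one record per month for 27 months, amount equals the offset.
$plan = [];
for ($i = 1; $i <= 27; $i++) {
    $plan[$i] = [['day' => 12, 'amount' => $i]];
}

// "Today" is assumed to be in February 2024, so the current month is 2.
$byYear = groupSeedPlanByYear($plan, 2);

// Data seeded for April 2022: years back = 2024 - 2022, month = 4.
$april2022 = $byYear[2024 - 2022][4];
// April 2022 is 22 months before February 2024, so the amount is 22.
```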

With data seeded, we then computed the expected totals for the months or years. So, once again we thought we’d fixed this problem, but we were wrong!

Third Attempt (Third time lucky!)

New month, new error! This time, an end-user informed us that the computed totals didn’t add up for February. (By now, our users were fed up and our engineers were considering other careers!) So, we dug into the data for February, looked at how we computed the results and noticed that our queries were computing incorrect date ranges!

In PHP, adding a month increments the month field and then normalises the result. So, when adding a month to 2017-01-31 you get the non-existent 2017-02-31, which rolls over to 2017-03-03. This problem is not unique to PHP, but many libraries in other languages have addressed it. (For example, JavaScript’s Moment.js package resolves this problem by default.)
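The overflow can be reproduced with PHP’s built-in DateTimeImmutable. This is a small illustration, not the code from the application; the variable names and the March example are made up, and UTC is chosen only to keep the output independent of server settings.

```php
<?php
declare(strict_types=1);

$utc = new DateTimeZone('UTC');

// Adding a month to the 31st of January lands in March, not February.
$start = new DateTimeImmutable('2017-01-31', $utc);
$overflowed = $start->modify('+1 month');
echo $overflowed->format('Y-m-d'), PHP_EOL; // 2017-03-03

// The same happens going backwards: "one month before 31 March"
// is still in March, so a "previous month" range would be wrong.
$endOfMarch = new DateTimeImmutable('2017-03-31', $utc);
$previous = $endOfMarch->modify('-1 month');
echo $previous->format('Y-m-d'), PHP_EOL; // 2017-03-03
```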

So, the month overflow was the issue. To fix this, we had to make sure that overflow didn’t happen when adding or subtracting months. (At the time we were using a package called Carbon for date handling. This package allowed us to enable or disable overflow globally. Additionally, we could do it on a case-by-case basis. For example, the addMonthsNoOverflow function wouldn’t overflow dates.)
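For readers without Carbon, here is a rough plain-PHP sketch of the same idea: shift the month from the 1st, then clamp the day to the last day of the target month. The function addMonthsClamped is a hypothetical stand-in written for this article. It is not Carbon’s implementation, only an approximation of the behaviour described above.

```php
<?php
declare(strict_types=1);

/**
 * Add (or subtract) whole months without overflowing into the next month.
 * Hypothetical helper; the day is clamped to the target month's last day.
 */
function addMonthsClamped(DateTimeImmutable $date, int $months): DateTimeImmutable
{
    $day = (int) $date->format('j');

    // Shift from the 1st, where month arithmetic can never overflow.
    $target = $date
        ->modify('first day of this month')
        ->modify(sprintf('%+d months', $months));

    // 't' is the number of days in the target month.
    $lastDay = (int) $target->format('t');

    return $target->setDate(
        (int) $target->format('Y'),
        (int) $target->format('n'),
        min($day, $lastDay)
    );
}

$utc = new DateTimeZone('UTC');

$feb2017 = addMonthsClamped(new DateTimeImmutable('2017-01-31', $utc), 1);
echo $feb2017->format('Y-m-d'), PHP_EOL; // 2017-02-28

$feb2016 = addMonthsClamped(new DateTimeImmutable('2016-01-31', $utc), 1);
echo $feb2016->format('Y-m-d'), PHP_EOL; // 2016-02-29 (leap year)

$back = addMonthsClamped(new DateTimeImmutable('2017-03-31', $utc), -1);
echo $back->format('Y-m-d'), PHP_EOL; // 2017-02-28
```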

Conclusion

Dealing with dates is not always a straightforward task. There are various factors to consider relating to how we seed test data and language-specific limitations.

We could have tried seeding data within each test to address each specific scenario (and this might have been easier in some ways), but I believed that creating a single dataset for tests was the right choice (given the size of the data and the edge cases). This made participation in testing a lot more accessible within a cross-functional team. Also, a single dataset was easier to reason about.

If you are facing the same challenge, my advice is to:

  • Consider having a fixed dataset to run tests against (for example, on production we have a single dataset, don’t we?).
  • Consider having helper functions in place to help manage the dynamic data.
  • Consider language-specific nuances, such as how month arithmetic overflows.

I’d love to hear your thoughts! Please share your questions and insights in the comments below.

Vlad Ogir

Staff software engineer with passion for software delivery, architecture and design.