A new enhancement has the potential to significantly expand the reach of Given-When-Then tools, and make them much more inclusive. I’m very excited to see this attempt to improve developer/tester collaboration, and I think anyone using GWT should pay close attention to this experiment.

Given-When-Then, without any doubt, won the contest for the most popular BDD example structure. The supporting tools are now well established. Because people are familiar with those utilities, they often try to use them beyond the originally intended scope. Unfortunately, the results are often bad. That’s only logical, and in many ways a repeat of the learning curve the community went through with xUnit tools.

Twenty years ago, the developer experience and usability of unit test tools were much better than anything else out there. Many teams tried to apply xUnit tools for component and integration tests. This led to a mess of slow and clunky test suites that were difficult to maintain. Michael Feathers wrote about this problem in his famous 2005 post A Set of Unit Testing Rules. He concluded that “Tests that do these things aren’t bad. Often they are worth writing, and they can be written in a unit test harness. However, it is important to be able to separate them from true unit tests so that we can keep a set of tests that we can run fast whenever we make our changes.”

For the past 10 years, I’ve seen many teams experience a similar problem with Given-When-Then tools. The developer productivity of such tools is much better than alternatives, so people often wanted to get the same benefits with other types of testing. Many teams tried to use Given-When-Then for purposes that have nothing to do with BDD. Some tried to support exploratory testing through quick context set-ups and transient test cases. Some wanted to just automate a bunch of regression tests to improve coverage. Some wrote characterisation tests before modifying a legacy system. The result is often a mess of slow, clunky test suites that are horribly difficult to maintain, not unlike the issue with unit tests two decades ago. Until now, the solution was pretty much what Michael Feathers recommended: keep the true BDD examples separate from additional tests. (In Fifty Quick Ideas to Improve your Tests, we called this “Split just-in-case tests from key examples”).

Separating “true BDD” from “fake BDD” cases helps with maintaining the core example scenarios easily, but it doesn’t fundamentally help with any of the additional roles. In many situations, the attitude towards using GWT tools for exploratory or regression testing was “You’re allowed to play with our toys, as long as you do not make a mess”. This often meant that the additional scenarios were poorly maintained. They often just got deleted instead of updated when things changed. The reason for that was simple: until now, applying GWT scenarios for more traditional testing roles involved a lot of copy-pasting and redundant information, or hacking the tooling to optimise test writing at the expense of getting clear results. SpecFlow’s new plugin, called ExternalData, is about to change all that.

Bringing external data into scenarios

The ExternalData plugin for SpecFlow can read test data into scenarios from an external file. Instead of copying the same currency data to all the scenarios, we can just ask the tool to run the scenario with different parameters:

@property:currency=Supported_Currencies
Scenario: same-currency payments do not trigger forex
  Given a payment of 100 <currency> 
  And a related account in <currency>
  When the payment is received
  Then the forex rate should be 1.00

Notice the @property tag in the header. This is the additional bit of syntax, telling SpecFlow that the values for the configured parameter will not be in the scenario itself, but in an external configuration file. We can use the same property on hundreds of related scenarios, and they will all be configured in the same way.
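To make the mechanics concrete, here is a minimal Python sketch of what such an expansion amounts to. It is not how SpecFlow or the plugin is implemented; the expand function, the in-memory EXTERNAL_DATA dictionary and the three currency codes are all assumptions made up for illustration. It only shows the idea of one scenario body being run once per external value.

```python
import re

# Hypothetical external data set; in a real project this would live in a
# separate file. The currency codes are illustrative, not from the article.
EXTERNAL_DATA = {"Supported_Currencies": ["EUR", "USD", "GBP"]}

SCENARIO = """\
@property:currency=Supported_Currencies
Scenario: same-currency payments do not trigger forex
  Given a payment of 100 <currency>
  And a related account in <currency>
  When the payment is received
  Then the forex rate should be 1.00
"""

# Matches a tag of the form @property:<placeholder>=<data set name>.
TAG = re.compile(r"^@property:(\w+)=(\w+)$")


def expand(scenario_text, data):
    """Return one list of concrete steps per external value named by the tag."""
    lines = scenario_text.strip().splitlines()
    match = TAG.match(lines[0].strip())
    if match is None:
        raise ValueError("first line must be an @property tag")
    placeholder, data_set = match.groups()
    # Skip the tag line and the Scenario title; the rest are steps.
    steps = [line.strip() for line in lines[2:]]
    return [
        [step.replace(f"<{placeholder}>", value) for step in steps]
        for value in data[data_set]
    ]


for run in expand(SCENARIO, EXTERNAL_DATA):
    print(run[0])
```

The scenario text never changes when a currency is added; only the data set does.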

Another way to use this plugin is to showcase some important examples (those key for understanding the flow), then just mix in additional external examples into the same scenario. To do this, add both the @property tag and a section with Examples into an outline:

@property:<email>=Emails.Valid
Scenario: register with valid email
  Given a visitor with <email>
  When the visitor completes registration
  Then a welcome message will be sent to <email>

Examples:
 | email                | description        |
 | test@test.com        | basic email format |
 | test+label@gmail.com | gmail labels       | 

To make this even more interesting, ExternalData supports the BugMagnet database format out of the box, so you can just mix in all the weird edge cases for emails, names, and many other data formats to check them quickly.
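The mixing of key examples with external ones can be sketched roughly as follows, in Python. This is an illustration under assumptions, not the plugin's behaviour: the nested JSON layout, the sample addresses, the resolve and merged_examples helpers, and the choice to skip external values that are already listed inline are all hypothetical.

```python
import json

# Hypothetical external data, shaped as nested categories; the real file
# layout and the sample values are assumptions.
EXTERNAL_JSON = """
{
  "Emails": {
    "Valid": ["user@example.com", "first.last@example.co.uk", "test@test.com"]
  }
}
"""

# The key examples, as they appear in the Examples table of the scenario.
INLINE_EXAMPLES = [
    {"email": "test@test.com", "description": "basic email format"},
    {"email": "test+label@gmail.com", "description": "gmail labels"},
]


def resolve(data, dotted_path):
    """Walk a nested dict following a dotted path such as 'Emails.Valid'."""
    node = data
    for key in dotted_path.split("."):
        node = node[key]
    return node


def merged_examples(inline_rows, data, placeholder, dotted_path):
    """Key examples first, then external values not already listed inline."""
    seen = {row[placeholder] for row in inline_rows}
    extra = [
        {placeholder: value, "description": f"external: {dotted_path}"}
        for value in resolve(data, dotted_path)
        if value not in seen
    ]
    return inline_rows + extra


rows = merged_examples(
    INLINE_EXAMPLES, json.loads(EXTERNAL_JSON), "email", "Emails.Valid"
)
for row in rows:
    print(row["email"], "|", row["description"])
```

Keeping the inline rows first means the examples that explain the flow stay at the top of the results, with the bulk edge cases after them.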

Why this matters so much

A while ago I worked with a large hedge fund where the business users insisted that all the key acceptance tests run for each individual supported currency. Most of the expectations were the same for all supported currencies, and most of the feature code was the same. From a collaborative analysis perspective, the distinction between currencies was not so important. However, there were a few cases when bugs escaped through testing because people didn’t check some unusual currency pair combination. From a business risk perspective, it was sensible to just re-run all the tests multiple times. Most of the testing infrastructure was already built to support specification by example, so it was logical to try reusing it. Because the tooling did not provide a systematic way to loop through scenarios, we had to hack together a pre-processor and provide data through environment variables. This made the test results difficult to understand.

As a consultant visiting teams, I’ve seen a similar situation many times. People hacked together different ways of pre-processing or aggregating the scenarios. In essence, the choice was always between messy tests or messy results. The tools just did not expose any way to customise flows deep enough. ExternalData makes this possible in a systematic way. It does not require a trade-off between scenario maintenance and readable results.

Until now, the scenario and the related data had to be in the same file — which is great for true BDD examples, as we can use the test cases to clarify the scenario. But from a purely regression testing perspective, being able to swap test data and keep the scenario makes it much easier to manage test coverage. Here are some of the flows that can work better with data outside the scenarios:

Support exploratory testing: by being able to quickly play just with the data and re-run a bunch of related scenarios, testers can easily experiment with different cases, explore the implementation and discover important boundaries. They can do this without copying and pasting scenario structure. Note that this was, in a smaller sense, possible with scenario outlines already, but it only applied to a single scenario. With external data, testers can modify dozens of related scenarios just by tweaking a single file. Testers can now experiment without ever modifying the Given-When-Then files, but still reuse the whole automation infrastructure. They can also preserve the data configuration for future test runs without creating a poor-cousin example file, which won’t be maintained. When the scenario structure gets updated for primary examples, testers can use it immediately with their data configuration files.

Improve test coverage with additional examples: Loading external data in a systematic way makes it easy to split key from non-key test cases, without the need to duplicate scenarios or introduce a separate scenario file hierarchy. Teams no longer need to create messy long tables that make Given-When-Then scenarios difficult to read or understand. Because the test runner itself now supports looping through sample data, the results are consistent and integrated, so it’s easy to track progress and discover regression problems. By being able to swap a configuration file for a test run without modifying the scenarios, we can run the same suite for different environments using different data. This makes it possible to use a quick development test run with only minimal examples and a more complete but slower execution on the testing system. It also means that people can easily use different identifiers or data for different environments. For example, we can run an extensive test using different test payment methods on a pre-production environment, then run a small post-deployment test with an actual payment for a nominal amount in production, using the same scenarios.
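Swapping the data per environment could look something like this Python sketch. Everything named here is an assumption for illustration: the GWT_DATA_ENV variable, the environment names and the payment method identifiers are invented, and the data is inlined as CSV text only to keep the example self-contained.

```python
import csv
import io
import os

# Hypothetical data sets, inlined here so the sketch is self-contained; in
# practice each would be a separate file selected at run time.
DATA_SETS = {
    "dev": "payment_method\ntest-card\n",
    "preprod": "payment_method\ntest-card\ntest-wallet\ntest-bank-transfer\n",
    "prod": "payment_method\nreal-card-nominal-amount\n",
}


def load_examples(environment):
    """Return the example rows for one environment as a list of dicts."""
    try:
        raw = DATA_SETS[environment]
    except KeyError:
        raise ValueError(f"no data set configured for {environment!r}") from None
    return list(csv.DictReader(io.StringIO(raw)))


# GWT_DATA_ENV is a made-up variable name; default to the smallest, fastest set.
environment = os.environ.get("GWT_DATA_ENV", "dev")
for row in load_examples(environment):
    print(f"[{environment}] pay with {row['payment_method']}")
```

The scenarios stay identical across all three runs; only the size and content of the data set changes.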

Help teams without testing experts discover common problems: By making it easy to load data from BugMagnet, and perhaps similar databases in the future, this plugin makes it easy for teams that do not have testing specialists to at least do a bit more exploration than usual. Of course, synthetic test data won’t replace an actual human who knows how to test, but having something always remind you of typical problematic data, and making it easy to try out those cases, will definitely help to reduce common problems. If this approach becomes popular, I expect to see other public/open-source databases of common test cases, which can then easily be loaded into Given-When-Then tools.

Consume technical test data without making a mess: Hiding complex technical data is a common issue with Given-When-Then scenarios. Common examples are XML messages, JSON API call data and complex tabular database structures. With true BDD scenarios, the proper solution to these problems is often to include only the key attributes into the examples, and then generate the rest in the step implementations. This works well for a limited set of data, but it does not allow testers to explore changing the non-key attributes, and it tightly couples synthetic data generators with the execution. A standardised systematic interface for external data makes it easy to decouple the two, which is an important trick for managing test suites with complex data. (In Fifty Quick Ideas to Improve your Tests, we called this “Split data generators from tests”.)
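One possible shape for that decoupling, sketched in Python: a generator fills in a complete message from defaults, an external file may tweak non-key attributes, and the key attributes named in the scenario always win. The field names, the default values and the build_payment helper are hypothetical, and the precedence rule is a design choice of this sketch rather than anything the plugin prescribes.

```python
import json

# Hypothetical default payment message; field names are illustrative.
DEFAULT_PAYMENT = {
    "amount": "100.00",
    "currency": "EUR",
    "reference": "INV-0001",
    "channel": "api",
}


def build_payment(key_attributes, external_overrides=None):
    """Layer data: defaults, then external non-key tweaks, then key attributes.

    Key attributes come from the scenario and are applied last, so an
    external data file cannot silently change what the scenario is about.
    """
    message = dict(DEFAULT_PAYMENT)  # copy, so the defaults stay untouched
    message.update(external_overrides or {})
    message.update(key_attributes)
    return message


# The scenario names only what matters for the example...
from_scenario = {"currency": "USD"}
# ...while a tester varies non-key attributes from an external file.
from_external_file = {"reference": "", "currency": "GBP"}

payment = build_payment(from_scenario, from_external_file)
print(json.dumps(payment, sort_keys=True))
```

Here the empty reference from the external file is kept, while its currency is overridden by the one the scenario specifies.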

Share standard test cases with other teams: Large organisations often have teams that work on related parts of a system, so some edge case that causes problems for one module is very likely to cause problems somewhere else. For example, when working on end-of-day business reports, examples with leap year, leap second and daylight saving changes are usually good to run on anything involving date inputs. Ensuring that every team consistently checks for this is a challenge. External data loaders make it possible to create a single version-controlled configuration file, which teams can just pull and integrate into their build. This makes it easy to ensure that every team consistently checks for important cases.

Support approval testing: BDD doesn’t work well for situations where the expected outcome is difficult to describe upfront. Sometimes it’s only possible to create a low-fidelity expectation, and sometimes people just don’t know what to expect. For situations like that, approval testing is much better than specification by example. Separating data from the scenario makes it possible to support approval testing flows with Given-When-Then tools. A test run can capture the actual outcome in a format that can be easily approved as a baseline and fed back into the next test execution.
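The capture-approve-compare loop behind approval testing can be sketched in a few lines of Python. This is a generic illustration, not a feature of SpecFlow or ExternalData; the verify and approve helpers, the file naming scheme and the sample report text are all assumptions.

```python
import tempfile
from pathlib import Path


def verify(name, actual, baseline_dir):
    """Compare an actual outcome with the approved baseline for `name`.

    Returns True on a match. Otherwise writes `<name>.received.txt` next to
    the baseline so a person can inspect it and decide whether to approve it.
    """
    approved = baseline_dir / f"{name}.approved.txt"
    received = baseline_dir / f"{name}.received.txt"
    if approved.exists() and approved.read_text(encoding="utf-8") == actual:
        received.unlink(missing_ok=True)  # clean up any stale received file
        return True
    received.write_text(actual, encoding="utf-8")
    return False


def approve(name, baseline_dir):
    """Promote the last received outcome to the approved baseline."""
    received = baseline_dir / f"{name}.received.txt"
    received.replace(baseline_dir / f"{name}.approved.txt")


with tempfile.TemporaryDirectory() as tmp:
    baseline_dir = Path(tmp)
    report = "total: 3 payments\nforex rate: 1.00\n"
    # First run: no baseline exists yet, so the outcome is only captured.
    first_run = verify("end_of_day_report", report, baseline_dir)
    # A person reviews the captured outcome and accepts it as the baseline.
    approve("end_of_day_report", baseline_dir)
    # Next run: the same outcome now matches the approved baseline.
    second_run = verify("end_of_day_report", report, baseline_dir)
    # A changed outcome fails and is captured for review again.
    third_run = verify("end_of_day_report", report + "extra\n", baseline_dir)
    print(first_run, second_run, third_run)
```

The approved file plays the role of the external data: it lives outside the scenario and is fed back into the next execution.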

It will be very interesting to see how this experiment evolves, and whether other tools start supporting a similar syntax. The SpecFlow team is actively looking for feedback on the experiment, so if any of this sounds interesting, check it out at specflow.org.