Embracing the flake with contract tests // Idiom Attic

“Re-run and cross your fingers” feels like the worst advice to give to a new member of your team when a test run fails on the build server, but too often it feels like the only option. As software engineers we have clung to the philosophy that flaky tests are a bad thing that should be fixed or eliminated. In a post on the Google Testing Blog titled Flaky Tests at Google and How We Mitigate Them, John Micco says:

We define a “flaky” test result as a test that exhibits both a passing and a failing result with the same code. There are many root causes why tests return flaky results, including concurrency, relying on non-deterministic or undefined behaviors, flaky third party code, infrastructure problems, etc.

Given a choice between a test that consistently either passes or fails (indicating a problem in the code) or a flaky test, I’d obviously take the former. That should always be the case for pure unit tests, those entirely free of outside dependencies. Don’t have them fail based on the time of day, the state of your database, or the current weather.

For many teams there is value in a smaller suite of third-party integration tests, or contract tests, that ensure that your code works with their code, e.g. that if you call their API in a certain way that they will understand your request and give you a certain response back. They validate that if you hold up your end of the bargain of passing in a certain input that they will provide the corresponding expected output. These dependencies can be mocked or replaced with a dummy service for some purposes, but I do get useful feedback in actually calling into the test environment of the service you integrate with to ensure that:

The service still understand the requests that you’re sending them and
The service is still returning the responses you expect

There have been multiple times when I’ve seen contract tests highlight changes being made by a third party service that were about to get pushed to production, helpfully without letting us know, and we had to inform them that they were about to break our usage of their service. Rollouts have been delayed and problems caught before breakage occurred in production, thankfully, in these instances.

That doesn’t reduce the maintenance cost of this kind of test though, which are still flakier than a buttermilk biscuit. They’re still going to turn red sporadically, annoying all the developers forced to track down the cause. To the lessen the load on everyone’s time and patience, lower the amount of effort needed to answer the question “Did we break something or did they break something?” Towards that end I offer these recommendations:

Isolate your contract tests from your unit tests, in separate folders, namespaces, or whatever other tools you have at hand.
Try to build out your code where you can easily unit test everything up to the point of sending bits over the wire. Take snapshots of your requests and assert that the content or structure doesn’t change, using separate mapping components that emit the format that the third party expects (e.g., XML, JSON, in-memory objects).
Make it easy to identify which test relates to which third party dependency and have tests that exercise each individually. You don’t want to be on a wild goose chase looking at your email provider when your payment provider is the one giving you grief. This could be in the test naming, or ideally you can break them out into separate projects in your build server so that you can easily re-run the individually.
Automate a daily scheduled run of your contract tests. When one of these tests fail on a branch where you’re making seemingly unrelated changes, it’s easy to lose a lot of time figuring out if your changes broke functionality or if it’s something in the third party’s test environment. If you can compare against recent runs against your main branch, then knowing if this is a problem related to your branch becomes much easier.

This is a decent amount of work, so I’d recommend doing this only when the third party service is a key element of the software for which you’re responsible. Don’t do this for the SMS service that 0.1% of your customers use that never changes and isn’t part of any critical workflows. If a key process breaks when the dependency changes without telling you, then contract tests can save you headaches and help you locate the source of problems faster.

(image credit: Ruth)