
Five Factor Testing

Sarah Mei, Tandem Alum

When I took my first real dev job in the late 90s, it was not common for developers to write their own automated tests. Instead, large companies depended on teams of testers, who tested manually, or were experts in complex (and expensive) automation software. Small companies were more likely to depend on code review, months of “integration” after the “development,”…or most commonly: pure hope.

But times have changed. Today, on most teams, writing automated tests is a normal part of the software developer’s job. Changes to a codebase usually aren’t considered complete until there are at least some automated tests exercising them — usually written by the developer who changed the code. This has freed up the dedicated testers, in companies that have them, to focus on more valuable activities such as exploratory testing.

Like most shifts, this one was imperceptible while it was happening, and obvious in hindsight. In the late 90s, when having any developer-written tests was weird, it was hard to imagine that twenty years later, having no developer-written tests would be weird. But here we are.

Welcome to the Future

Flying cars! Personal jetpacks! Hoverboards! Writing tests has gone from “novel activity” to “just another thing we do to support our code changes,” akin to meetings or email or keeping up with Slack!

Sigh.

But like meetings and email and Slack, sometimes we write tests just to do it, whether or not it’s actually useful. After that’s been going on for a while, you’ll hear things like this:

  • “The tests are so [slow | flaky | unpredictable].”
  • “We don’t have the test coverage we need, but we don’t have time to update it.”
  • “Writing tests just doubles the amount of time a story takes me — for no useful reason.”
  • “That story is finished. I just have to write the tests.”

To fix these issues and end up with tests (and a testing process) that actually work for us, we need to reconnect with the underlying needs that originally drove us to write tests. Surprisingly, they aren’t really written down anywhere. Maybe it’s just assumed we know what they are, but I listed them out enough times for people to think I should just write a blog post and send them that. So here you go.

The Five Factors

There are five practical reasons that we write tests. Whether we realize it or not, our personal testing philosophy is based on how we judge the relative importance of these reasons. Many people think about factors 1 and 2 — they are the standard reasons for writing tests, and are often talked about. But the arguments we have about testing, both within our teams and endlessly on the internet, often stem from unarticulated differences in how we think about factors 3, 4, and 5.

First we’ll examine each factor in isolation, and then, moving into some concrete examples, we’ll consider how they combine when you’re deciding how to test your code.

Good tests can…

1. Verify the code is working correctly
2. Prevent future regressions
3. Document the code’s behavior
4. Provide design guidance
5. Support refactoring

Let’s look at each of these in more detail.

1. Verify the code is working correctly

In the most immediate sense, most of us write tests to have confidence that the code we’re adding or changing works the way we think it does. In college, back in ye olden dayes, I wrote little shell scripts to exercise my coding assignments. I never turned the scripts in, because at that point, verification was the only goal of my tests. After all, my computer science professors only cared whether the assignment’s output was what they wanted.

2. Prevent future regressions

Immediate verification is sufficient for small coding assignments, but most of us work in larger, more complicated codebases, in which other people are also working.

In this situation, the automated tests you write for your code become part of a “suite,” or collection, of tests that all verify different parts of the system. Making a change, and then running the suite and seeing all the tests pass, gives you confidence that your change didn’t break anything anywhere else in the application. This prevents “regressions,” a fancy word meaning “things that used to work, but don’t anymore.”

Our tests become part of the suite, so that in the future, other developers can have confidence that they didn’t accidentally break our stuff.
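To make that concrete: when a bug is fixed, it’s common to pin the fix in place with a test that fails if the old behavior ever comes back. Here’s a minimal sketch in RSpec; the Invoice class and the bug are invented for illustration.

describe Invoice do
  describe("#total") do
    # Pins down a (hypothetical) past bug: empty invoices used to raise an
    # error instead of totaling to zero. If it regresses, the suite catches it.
    it("returns zero when there are no line items") do
      expect(Invoice.new(line_items: []).total).to eq(0)
    end
  end
end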

3. Document the code’s behavior

“Programs must be written for people to read, and only incidentally for machines to execute.” – Hal Abelson

Code is communication — primarily to other developers; secondarily to a computer. Since your automated tests are code, they’re also communication, and you can take that further by explicitly designing them to be external documentation for the code they test.

There are, of course, many ways to document your intent when writing code:

  • long form, such as on a wiki or in a README
  • comments in your code
  • names of program elements like variables, functions, and classes

Tests are often overlooked as a form of documentation, but they can be more useful than any of the above to a fellow developer. For one, they’re executable — so they don’t go out of date. In addition, though, they’re usually the easiest way to demonstrate (rather than explain in prose) things like how you’re expecting the code to be used, what happens when it encounters an edge case, and why that weird-looking bit is like that.
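For a sense of what a documentation-oriented test can look like, here’s a sketch (the Slug class is invented for illustration). Read top to bottom, the example descriptions form a short narrative of the code’s behavior, edge cases and oddities included.

describe Slug do
  it("downcases and hyphenates a title") do
    expect(Slug.for("Five Factor Testing")).to eq("five-factor-testing")
  end

  it("strips punctuation so URLs stay clean") do
    expect(Slug.for("It depends!")).to eq("it-depends")
  end

  # This fallback looks weird; it exists because legacy posts have no titles.
  it("returns 'untitled' for a blank title") do
    expect(Slug.for("")).to eq("untitled")
  end
end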

4. Provide design guidance

Unquestionably, the most controversial claim of testing advocates is that testing leads to better software design. Most of the explanations I’ve seen for this are grounded in software design theory, which can be difficult to translate into what to do when you sit down at your editor. Other sources don’t even really try to explain. Instead, they ask you to take it on faith: “write tests, and over time, your code will be better than if you didn’t write tests!”

This lines up with my experience, but I’m not asking you for faith. This idea, which I first heard from a colleague when I was at Pivotal Labs, was the lightbulb moment that helped me start to grasp why it works:

Designing the interface for a piece of code is a delicate tightrope walk between specificity (solving the problem you have right now) and generality (solving a more general class of problems, with an eye to reusing the code elsewhere). Specific code is usually simpler in the now, but harder to evolve later. Generalized code usually adds some complexity now that is not strictly required to solve the current problem, in return for being easier to evolve later.

Learning to pick the right place on that spectrum for the piece of code you’re working on is a messy, squishy, difficult-to-acquire skill. However, your tests can actually help you, in a very concrete way.

Let’s say you’re adding a method to a class. Presumably you’re doing that because you want to call that method somewhere else. Here’s your new method, which enqueues a background job to send an email to the user.

class User
  # ... other stuff ...
  def send_password_reset_email
    email = UserEmails.password_reset.new(primary_email, full_name)
    BackgroundJobs.enqueue(email, :send)
  end
  # ... more stuff ...
end

The code where you’re using it is the new method’s primary client. Here’s that primary client — a controller method that runs when the user requests a password reset via the API.

class PasswordResetController
  # ... other stuff ...
  def create
    Auditor.record_reset_request(current_user)
    current_user.send_password_reset_email # this line is new
  end
  # ... more stuff ...
end

When you write a test that calls that method, you’re giving it a secondary client, in which the method is used in a different context. Here’s the unit test for your new method.

describe User do
  # ... other tests ...
  describe("#send_password_reset_email") do
    it("enqueues a job") do
      expect(BackgroundJobs).to receive(:enqueue)
      User.new.send_password_reset_email
    end
  end
  # ... more tests ...
end

Using code in two contexts by writing a test for it means you build in a tiny bit of generality beyond what you specifically need in your primary client. It may be almost imperceptible, but importantly, it’s not speculative — in other words, you don’t risk going too far and building in generality that obscures meaning and that you probably won’t use.

Over time, this technique loosens your code’s ties to the specific problems it solves, and makes it more possible to evolve the codebase in the direction your team needs it to go.

5. Support refactoring

The only constant in software is change, so we often want to write code that will be straightforward to evolve once new requirements come in. Refactoring is the process of cleaning up and changing the organization of code, without changing its external functionality. When you’re refactoring, you need tests to ensure you’re not breaking anything by moving code around.

A codebase that needs to absorb changes over the long term must have a test suite that supports refactoring, or the rate of development (even as developers are added) will inexorably decrease. So you need automated tests at different levels of your codebase (so you can refactor beneath different interfaces) that you can use to assert that functionality hasn’t changed.
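Returning to the password-reset example from the previous section, here’s a sketch of what an outcome-asserting test one level up might look like (this assumes rspec-rails-style controller helpers, with the current_user setup elided). Because it never mentions send_password_reset_email by name, it stays green while you rename or move that method.

describe PasswordResetController do
  describe("#create") do
    # Asserts only on the outcome (a job was enqueued), not on which User
    # method produced it, so User's internals are free to change.
    it("ends up with a reset email job in the queue") do
      expect(BackgroundJobs).to receive(:enqueue)
      post :create
    end
  end
end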

How To Use This List

Ok! We’ve got our list of factors. Now it’s just a matter of maximizing them all, right?

Well…no. That’s impossible. It generally won’t even be useful to “pick a favorite” and always optimize for that factor. Which factors are more important will vary between sections of your codebase, and even in the same section over time. This isn’t a to-do list; it’s a framework for discussing test strategy. When you’re looking at a test in a pull request or during a code review, think about which of the factors it supports, and which it doesn’t. Then you can discuss the test in terms of those factors — are they the right ones? Would it make sense to optimize more for documentation here, rather than future refactoring?

Our traditional ways of discussing tests are largely based on morality and shame, and are not very useful. For example, “you need to write an integration test because it’s an industry best practice” is a morality argument. The subtext is that everyone else does it, so it must have inherent value, and you’re a terrible developer if you don’t want to.

No test has inherent value. A test is only valuable to your project insofar as it supports one or more of the five factors.

And keep in mind that an individual test or even a suite, overall, cannot fully support all five factors. They are necessarily somewhat at odds. Here are a few examples of how the factors combine to show what I mean.

Ex 1: Unit Tests and Refactoring

“The answer to every question in software is ‘it depends.'” – Sandi Metz

Comprehensive unit tests for a class are great for 3. developer documentation, but make it hard to 5. refactor that interface. To do that, you need a set of tests written one level up (where your class is used) that assert on outcomes, so you can make sure the functionality doesn’t change, even as you rename methods and move pieces around. 2. Regression-oriented unit tests are often less comprehensive, and thus easier to work with when refactoring, but they may not document the code as thoroughly.

But like Sandi says — it depends. If your class is part of a public API and will change only rarely, comprehensive 3. developer documentation with a narrative structure (i.e., meant to be read through) may actually be your primary objective. Since the interface won’t change much, the fact that such tests complicate refactoring is not very important.

On the other hand, if it’s an internal class, you may opt instead for less-comprehensive 2. regression-oriented unit tests that hit the happy path plus likely errors, and worry less about narrative organization. These support 5. refactoring better than more documentation-oriented tests, and can still serve as 4. design guidance and lightweight 3. developer documentation, even as the emphasis is placed elsewhere.
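Here’s a sketch of that lighter-weight style for a hypothetical internal class (PasswordResetToken and its interface are invented): one happy-path example, one likely error, and no attempt at a narrative tour of the API.

describe PasswordResetToken do
  it("round-trips a user id") do
    token = PasswordResetToken.generate(user_id: 42)
    expect(PasswordResetToken.verify(token)).to eq(42)
  end

  it("rejects a tampered token") do
    expect(PasswordResetToken.verify("garbage")).to be_nil
  end
end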

You’re never picking one factor. You’re always striking a balance between them. And it can be hard to identify precisely the balance you’re striking, particularly if you have some experience writing tests. It’s worth doing, though, because you’ll find tests that are unconsciously optimizing for a factor that’s important somewhere else, but not right here. You can often simplify (or even eliminate) those tests as a result.

This discussion about unit tests is a perfect example. I’ve seen systems with versioned public APIs that had (appropriate) documentation-oriented tests. But without thinking about it, they applied that test strategy to similar internal structures, forcing devs to write comprehensive unit tests for everything. This made refactoring really difficult — even internally, where it was supposedly allowed. Once they thought about this in terms of factors, though, they were able to see that they needed to tilt their strategy for the internal structures back towards refactoring support.

Ex 2: Integration Tests, Regressions, and Documentation

As the running time of a test suite gets longer, its utility for 2. preventing regressions goes down, because developers are less likely to run a long test suite. My personal threshold is somewhere around ten minutes — more than that, and I start looking for ways to speed up the suite.

Top-level integration tests, which are great for 1. proving your code works while you’re developing a feature, and 3. documenting how your app works afterwards, run quite slowly. They often make up the bulk of time when running a test suite. Once the feature has been written, and the need has shifted from 1. proving it works to 2. preventing regressions, you can often rewrite slow integration tests in a faster form. For example, an integration test for a web application that uses JavaScript functionality can often be rewritten as a combination of individual backend endpoint tests, and JavaScript unit tests. You have to be careful to be sure you’re getting the same coverage, but if test suite running time is important, it’s doable.
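As a sketch, a browser-driven test of the password-reset flow might be replaced by a backend request test like this (the route name and the rspec-rails-style request helpers are assumptions; the client-side behavior would move to JavaScript unit tests).

describe("POST /password_resets") do
  # Covers the same backend behavior as the slow browser-driven test,
  # without actually driving a browser.
  it("audits the request and enqueues the reset email") do
    expect(Auditor).to receive(:record_reset_request)
    expect(BackgroundJobs).to receive(:enqueue)
    post "/password_resets"
  end
end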

Even if you have exactly the same coverage, there’s still a downside to doing this conversion: if you don’t have top-level integration tests, it can be hard to figure out exactly how a feature is supposed to work just by looking at the tests. Your test suite’s utility for 3. documenting functionality has gone down, because now instead of just looking in one place, you often need to piece it together from several places.

At this point, you could make the deliberate decision to start documenting that top-level functionality in another form — screencasts, on a wiki, etc. — if it feels like the value of lowering the suite-wide run time outweighs the value of documenting functionality via integration tests.

This Is Complicated o_0

Yup. Testing, itself, is complicated, because tests are techno-social constructs that support both the code and the team that works on it. As your team’s needs change over time — because of business changes, personnel changes, or (most commonly) both — the types of tests you need change alongside them. Treat your tests as living documents, rather than the fossilized remnants of past sprints. Consider their actual utility to you, right now, rather than whether a book, or a thought leader, or even your boss says you “must” have them.
