Treating Tests as First Class Code — Kevin Taylor
On a recent project, I was pair programming with a talented and experienced programmer. Talented and Experienced was relatively new to the project and this was my first opportunity to pair with him. When I was in the driver’s seat, I noticed that Talented and Experienced was fidgeting and his obvious discomfort was growing as I continued typing. In the test that we were working on, I spotted a variable misleadingly named “tax.” I opened Eclipse’s rename dialog box and typed in “expectedTaxRate.” Talented and Experienced exploded, “It is just a test! Let’s spend our time on the real code.”
A few weeks later, I attended an OpenSpace discussion session entitled “Are Tests Software?” Didn’t all agile programmers consider unit tests an integral part of their software? In fact, I would argue that on an agile development team with comprehensive unit tests in place, tests must be treated with more care than the functional code they protect. When a team has a robust and flexible test harness around its functional code base, the team is liberated to refactor that code with confidence, knowing the tests will complain if the functional code breaks.
But, what gives confidence to developers when they refactor or modify the tests themselves? Since writing unit tests against unit tests would lead nowhere, instead, great care must be taken when tests are refactored or updated to handle new or changing functionality. In this article, we’ll review current views of what software quality consists of and how these characteristics are reflected in unit tests. Then, we’ll discuss specific things you can do to improve the quality of your unit tests.
Software Quality
Software quality is an elusive concept. Is software quality a measurement of how closely software fulfills its specification? Is it how well the software meets end user needs? Or, should software quality be defined as how well it is designed and written, i.e. how readable and maintainable the source code is?
Software quality becomes even more difficult to define when we consider the imbalance in skill and experience amongst different software teams and the varying external pressures different teams cope with. What is considered acceptable code in one shop may be considered defect-ridden spaghetti code in another. The first shop may be primarily concerned with quickly delivering adequately correct Web applications and may have a high tolerance for defects. The other shop may be working on safety-sensitive systems, so would certainly be much less tolerant of defects. However, they may still have very low quality standards regarding code design and maintainability.
Unit Testing Tips - Summary
- setUp() is for setting up
- Use one TestCase per fixture scenario
- Write reusable fixture logic
- Keep assertions simple
- Use meaningful identifiers
- Pair program
- Test your tests
- Run > 100 tests per second
- A test must never affect another test’s outcome
- Ensure test failures are easy to debug
- Use stubs to temporarily help test-drive new code
- Use fakes liberally to isolate tests
- Use mocks sparingly to assert interactions
ISO 9126 Standard
ISO 9126 is an attempt to standardize the definition of software quality. According to ISO 9126, software can be evaluated against each of six quality characteristics.
Functionality
The functionality of software refers to what software does rather than how it does it. For unit tests, this reflects how accurately and completely the tests measure the correctness of the functional code. Do the unit tests test all the scenarios and execution paths? How thoroughly do they assert the expected behavior of the functional code?
Reliability
The reliability of software is associated with its capability to maintain its specified level of performance under specified conditions. Maturity, fault tolerance, and recoverability are the three elements of reliability. Unit test reliability is primarily a measure of its maturity, i.e. the presence or absence of logic and runtime errors.
Usability
The usability of software consists of three sub-characteristics. The first, understandability, is concerned with how easy it is to determine the purpose of software and whether the software is applicable to our needs. In other words, should we use it? The second, learnability, is concerned with how easy it is to learn the software. Finally, operability is a measure of how easy it is to actually use the software: What is the level of effort required? Usability of unit tests is of paramount importance to developers. Being able to quickly find the appropriate unit test, understand what scenario is being tested, and modify the unit test is at the heart of test quality.
Efficiency
The efficiency of software can be evaluated by considering how fast it is (CPU time and I/O throughput rates) and how resource intensive it is (memory, CPUs, socket connections). Unit test efficiency manifests itself by how fast the test runs and how isolated it is from external resources such as file systems and network connections.
Portability
The portability of software refers to how well it operates in different environments, such as different operating systems. A unit test’s portability is primarily concerned with how well it runs on all targeted operating systems and how well it runs in different execution environments, such as via an IDE or a continuous integration tool such as CruiseControl or AntHill.
Maintainability
Software maintainability characterizes the design and clarity of the software’s source code. Maintainability directly affects developers who must analyze and modify software. Indirectly, maintainability affects the owners and users of the software by influencing the costs involved in enhancing the software. Since unit tests are used by developers in source code format, maintainability of unit tests is logically equivalent to the usability characteristic of unit tests that we already looked at.
Treat Your Tests Well
According to Kent Beck, Ron Jeffries coined the phrase “Clean code that works” to describe the goal of test-driven development (TDD). Jeffries’ concise description of quality code is not only applicable to your functional code base. Let’s see how it can help guide us toward higher quality test code. Turning your unit tests into clean code that works requires keeping them simple and intentional. But, according to Jeffries, having clean code is only half the solution: code must also work. For unit tests, this means they must be correct, sufficiently complete, fast enough, independent, and isolated.
Simple
Simple unit tests keep fixture setup as simple as possible by only setting up a single set of closely related fixtures per test case. Use the setUp method of the test case and never use conditionals to get clever with the set up. Keep it simple. When you feel the need to add a conditional statement while setting up your fixtures, instead create a new test case with its own setUp method and fixtures.
Use an ObjectMother or Builder to remove complex, duplicate fixture code from your tests. This will make the code easier to understand and reduce the chance of errors.
Simple unit tests keep assertions and expectations as simple as possible while proving that the functional code is correct. For example, if asserting a value is null, use Assert.assertNull(value) instead of Assert.assertEquals(null, value).
Simple unit tests don’t contain equivalent duplication. Equivalent duplication is duplication that is not coincidental. For instance, if a test is expecting 10 line items on an order and also a quantity of 10 widgets, the 10 is coincidental. It should be represented by two different identifiers. However, if 5:00 P.M. is asserted as the expected order cutoff time in multiple tests, 5:00 P.M. is equivalent duplication and should be represented by a single identifier.
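The distinction can be sketched with a handful of named constants. This is only an illustration: the class and constant names are hypothetical, and java.time.LocalTime assumes Java 8 or later, which is newer than the JUnit 3 style used in the listings in this article.

```java
import java.time.LocalTime;

// Hypothetical shared constants for order tests; every name here is illustrative.
final class OrderTestConstants {
    // Equivalent duplication: each test that asserts the cutoff refers to the
    // same business rule, so the value lives behind a single identifier.
    static final LocalTime ORDER_CUTOFF = LocalTime.of(17, 0);

    // Coincidental duplication: both values happen to be 10, but they are
    // unrelated facts that may change independently, so each gets its own name.
    static final int EXPECTED_LINE_ITEM_COUNT = 10;
    static final int EXPECTED_WIDGET_QUANTITY = 10;

    private OrderTestConstants() {}
}
```

If the cutoff later moves to 6:00 P.M., only ORDER_CUTOFF changes. If the widget quantity changes, the line item count is untouched.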
A common source of duplication in unit tests is overuse of JUnit’s built-in assert methods. Use the extract method refactoring to pull reusable assertion logic into custom methods. You should have plenty of custom asserts in your unit tests, such as assertCollectionEquals, assertDateBefore, assertBeforeOrderCutoff(Date), etc.
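As a rough sketch of what such a custom assert might look like, here is one possible assertDateBefore. The helper class name, the parameter order, and the failure message are assumptions. To keep the sketch free of a JUnit dependency it throws java.lang.AssertionError directly, whereas inside a real TestCase you would more likely delegate to JUnit's fail(String) or assertTrue(String, boolean).

```java
import java.util.Date;

// Hypothetical home for custom assertions; in a JUnit 3 code base these would
// often be static methods on a shared TestCase subclass instead.
final class OrderAsserts {
    private OrderAsserts() {}

    // Passes only if 'actual' is strictly before 'limit'; otherwise fails with
    // a message that names both dates so the failure is easy to diagnose.
    static void assertDateBefore(Date limit, Date actual) {
        if (!actual.before(limit)) {
            throw new AssertionError(
                "expected a date before <" + limit + "> but was <" + actual + ">");
        }
    }
}
```

A test then reads as a single intention-revealing line, for example OrderAsserts.assertDateBefore(cutoff, order.getPlacedAt()), where order.getPlacedAt() stands in for whatever date the functional code exposes.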
Intentional
Along with being simple, clean unit tests must communicate to the reader what is important for her to know about the tests. Intentional unit tests are those that have scenarios that are easy to identify, have fixtures and assertions that are easy to understand, and clearly document the expected behavior of the functional software.
To make unit tests intentional, use the same techniques we have all become accustomed to applying to functional code. Use identifiers that are clear and meaningful. Use refactorings, such as extract method, to document business logic. Use comments sparingly. Use your team’s coding standards and the commonly accepted idioms of the language that you are working in.
The code below shows a test doing too much.
import junit.framework.TestCase;
public class HelloWorldTest extends TestCase {
    protected void setUp() throws Exception {
        super.setUp();
    }
    // Scenario 1: no person
    public void test_sayIt() {
        Person person = null;
        HelloWorld helloWorld = new HelloWorld(person);
        assertTrue("Hello!".equals(helloWorld.sayIt()));
        assertTrue(person == helloWorld.getPerson());
    }
    // Scenario 2: person with a name only
    public void test_sayIt_withName() {
        Person person = new Person();
        person.setName("Kevin");
        HelloWorld helloWorld = new HelloWorld(person);
        assertTrue("Hello! Kevin is 0".equals(helloWorld.sayIt()));
        assertTrue(person == helloWorld.getPerson());
    }
    // Scenario 3: person with a name and an age
    public void test_sayIt_withNameAndAge() {
        Person person = new Person();
        person.setName("Kevin");
        person.setAge(30);
        HelloWorld helloWorld = new HelloWorld(person);
        assertTrue("Hello! Kevin is 30".equals(helloWorld.sayIt()));
        assertTrue(person == helloWorld.getPerson());
    }
}
Listing 1.
Listing 1 has a TestCase with all the fixture set up code within the test methods. This TestCase contains three different scenarios. Each scenario is set up and asserted in a different test method. You’ll notice there is a fair bit of duplication between test methods. Also, the test methods are difficult to follow because there is so much fixture code to wade through (not really, in this trivial example, but use your imagination).
Listing 1 is also using assertTrue() for every assertion. This further decreases readability.
import junit.framework.TestCase;
// Scenario: HelloWorld constructed without a Person
public class HelloWorldTest_withNullPerson extends TestCase {
    private Person person;
    private HelloWorld helloWorld;
    protected void setUp() throws Exception {
        super.setUp();
        person = null;
        helloWorld = new HelloWorld(person);
    }
    public void test_sayIt() {
        assertEquals("Hello!", helloWorld.sayIt());
    }
    public void test_person() {
        assertNull(helloWorld.getPerson());
    }
}
import junit.framework.TestCase;
// Scenario: Person has a name but no age
public class HelloWorldTest_withNameOnly extends TestCase {
    private Person person;
    private HelloWorld helloWorld;
    protected void setUp() throws Exception {
        super.setUp();
        person = new Person();
        person.setName("Kevin");
        helloWorld = new HelloWorld(person);
    }
    public void test_sayIt() {
        // Same expected greeting as the name-only test in Listing 1
        assertEquals("Hello! Kevin is 0", helloWorld.sayIt());
    }
    public void test_person() {
        assertSame(person, helloWorld.getPerson());
    }
}
import junit.framework.TestCase;
// Scenario: Person has both a name and an age
public class HelloWorldTest_withNameAndAge extends TestCase {
    private Person person;
    private HelloWorld helloWorld;
    protected void setUp() throws Exception {
        super.setUp();
        person = new Person();
        person.setName("Kevin");
        person.setAge(30);
        helloWorld = new HelloWorld(person);
    }
    public void test_sayIt() {
        // Same expected greeting as the name-and-age test in Listing 1
        assertEquals("Hello! Kevin is 30", helloWorld.sayIt());
    }
    public void test_person() {
        assertSame(person, helloWorld.getPerson());
    }
}
Listing 2 - now showing three test cases.
Listing 2 contains cleaned up versions of the tests. Since each method represented a different scenario, I moved each scenario to its own TestCase and moved the fixture code to setUp().
To assert HelloWorld behaviors, I dumped all the assertTrue() methods and replaced them with more specific assertions, including assertNull() and assertEquals().
There is room for improvement in Listing 2, though. Notice the duplication between the setUp() methods. (Again, this may not be obvious in this trivial example, but imagine that Person took 10 lines of code to set up.) This can be improved by extracting the setup of Person into a reusable Builder or ObjectMother. I will leave this as an exercise.
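One possible starting point for that exercise is a small test data builder. Everything here is a sketch under assumptions: the Person class is a minimal stand-in (only setName and setAge appear in the listings, and the getters are invented so the sketch can be checked), and PersonBuilder with its method names and default values is hypothetical.

```java
// Minimal stand-in for the article's Person class; getters are assumed.
class Person {
    private String name;
    private int age;
    void setName(String name) { this.name = name; }
    void setAge(int age) { this.age = age; }
    String getName() { return name; }
    int getAge() { return age; }
}

// Hypothetical builder that hides fixture details behind sensible defaults,
// so each test states only the values that matter to its scenario.
class PersonBuilder {
    private String name = "Kevin";
    private int age = 0;

    static PersonBuilder aPerson() { return new PersonBuilder(); }

    PersonBuilder named(String name) { this.name = name; return this; }

    PersonBuilder aged(int age) { this.age = age; return this; }

    // Creates a fresh Person on every call, so tests never share instances.
    Person build() {
        Person person = new Person();
        person.setName(name);
        person.setAge(age);
        return person;
    }
}
```

With a builder like this, the setUp() of the name-and-age test case could shrink to person = PersonBuilder.aPerson().aged(30).build(); and the name-only case to person = PersonBuilder.aPerson().build();.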
Correct
Of obvious importance, unit tests must be correct. This is not always easy to achieve. Before TDD, a developer had to devote his attention to ensuring that his functional code was correct. Now he has unit tests to give him positive feedback that his functional code is correct (or not correct!). No such luck with unit tests: Good, old-fashioned logic must be relied upon.
Don’t rely only on your own über-programming skills. Whenever possible use pair programming. It provides an effective safety net when working on unit tests. Likewise, always remember to wear two hats when programming: a coding hat and a refactoring hat. Code when you have a red bar. Refactor when you have a green bar. Don’t mix the two.
Sufficiently Complete
In addition to being correct, unit tests must also be sufficiently complete. Very few code bases have 100% test coverage, and each team must determine its own target coverage level. Use code coverage tools such as Emma and Coverlipse to measure how much of your code base is covered by tests and to find the gaps in coverage. A mutation testing tool such as Jester can then help you evaluate the semantic quality of your unit tests (how well the assertions are written).
Fast Enough
Finally, in addition to being correct and sufficiently complete, unit tests should run fast enough to be convenient. Fast tests encourage developers to run the entire test suite frequently throughout the day. Fast enough is subjective, but a good rule of thumb is that 100 unit tests should run in less than one second (much faster, if possible). In a current project of mine, our team has 2800 unit tests that run in 15 seconds.
That is almost all there is to turning your unit tests into “clean code that works.” Two additional qualities are specific to unit test code, though: independence and isolation.
Independent
Firstly, unit tests must run independently of other tests, so that one test’s side effects cannot affect the outcome of another test. Such interference usually occurs when fixtures are not properly torn down between tests. Bad fixtures typically take the form of external resources that retain some state or static class variables that are not reset between tests.
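The static-variable case can be sketched as follows. AuditLog and the test class are hypothetical. To keep the sketch runnable without JUnit on the classpath, the test class does not extend TestCase. In a real JUnit 3 test it would, and the framework would call setUp() and tearDown() around each test method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical functional class whose static state outlives any single test.
class AuditLog {
    private static final List<String> ENTRIES = new ArrayList<String>();
    static void record(String entry) { ENTRIES.add(entry); }
    static int size() { return ENTRIES.size(); }
    static void reset() { ENTRIES.clear(); }
}

// Shape of a JUnit 3 style test case, written without the framework.
class AuditLogTestSketch {
    // Resetting in both places means a test neither inherits nor leaks state.
    void setUp() { AuditLog.reset(); }
    void tearDown() { AuditLog.reset(); }

    void test_startsEmpty() {
        check(0, AuditLog.size());
    }

    void test_recordAddsOneEntry() {
        AuditLog.record("order placed");
        check(1, AuditLog.size());
    }

    // Stand-in for JUnit's assertEquals(int, int).
    private static void check(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected <" + expected + "> but was <" + actual + ">");
        }
    }
}
```

Without the reset, test_startsEmpty would pass or fail depending on whether test_recordAddsOneEntry happened to run first, which is exactly the kind of order dependence that makes a suite untrustworthy.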
Isolated
Lastly, unit tests should exercise a specific cohesive unit of your functional code base. Isolating the unit you are testing has two advantages. Most importantly, isolating the code you are testing makes it easy to figure out why a test is failing. If you are debugging into multiple levels of an object graph or call stack trying to figure out why a test is failing, you need to further isolate the unit of code being tested from its collaborating objects. This is where stubs, fakes, or mocks come in handy. (Beware over-mocking, though, which can make tests brittle, i.e. cause internal refactorings to break tests.)
In addition to making a test failure easier to track down, isolated units take less fixture set up. This makes the tests easier to read and digest, as you don’t have to understand as much set up logic to use the tests.
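A minimal sketch of this kind of isolation uses a hand-written test double (a stub or a fake, depending on whose terminology you follow). All names are hypothetical: Clock stands for any collaborator that is slow, external, or non-deterministic, and OrderCutoffPolicy is an invented unit under test built around the 5:00 P.M. cutoff example used earlier.

```java
// Hypothetical collaborator: the functional code asks it for the current hour.
interface Clock {
    int currentHourOfDay(); // 0 to 23
}

// Hypothetical unit under test; it receives its collaborator through the
// constructor so that a test can substitute a double.
class OrderCutoffPolicy {
    private static final int CUTOFF_HOUR = 17; // 5:00 P.M.
    private final Clock clock;

    OrderCutoffPolicy(Clock clock) { this.clock = clock; }

    boolean isBeforeCutoff() {
        return clock.currentHourOfDay() < CUTOFF_HOUR;
    }
}

// Test double: always reports the hour it was constructed with, so the test
// controls "now" and needs no system clock and no further fixture.
class FixedClock implements Clock {
    private final int hour;

    FixedClock(int hour) { this.hour = hour; }

    public int currentHourOfDay() { return hour; }
}
```

A test can then assert that new OrderCutoffPolicy(new FixedClock(16)).isBeforeCutoff() is true and that the same call with FixedClock(17) is false. If either assertion fails, the only code that can be at fault is the single comparison inside OrderCutoffPolicy.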
Conclusion
As teams move along the continuum from no test coverage to comprehensive test coverage, the value of their test suites increases. How valuable the tests ultimately become depends on two factors: how well the tests document the behavior of the system, and how much flexibility the tests provide for the team when refactoring existing logic. To maximize the value of your team’s test suites, treat the tests with the same care and consideration that you give to functional code.