Unit Tests: Searching under the Lamp Post

published: 2012-02-05

I'm a big fan of unit tests (surprise). Whenever possible I practice TDD. I like how that approach coerces me into making smaller classes, fewer dependencies, and cleaner abstractions. And of course I love it when my unit tests catch a regression before I even start the application.

BUT I find it quite disturbing how many bugs make it past my own and other people's unit tests into my code, sometimes even into production code, despite pretty good code coverage. This makes me think that the focus on unit testing is like the drunkard searching for his keys under the lamp post: not because that's where the keys got lost, but because that's where the light is.

Let's check this theory. What are the reasons for focusing on unit tests?

Unit tests are easy to set up, because you have to deal with only a few dependencies.
Unit tests are fast, because they are concerned with only a few classes and don't hit the network, the disk or any other slow stuff.
If a unit test fails, it's easy to find the bug causing the failure, because each test exercises only a few lines of code.
It's easy to reach high coverage, because the number of parameters to vary is limited.

Note that "They are the most effective at finding bugs" isn't among the reasons, and neither is "They find the worst bugs, the ones that cause the whole application to crash".

Actually, in my experience one of the most effective tests by those two criteria is a simple smoke test that tries to access the application after it has been deployed on an application server.
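To make that concrete, here is a minimal sketch of such a smoke test in Python, using only the standard library. The function name `smoke_test`, the timeout, and the rule "HTTP 200 means alive" are my assumptions for illustration; a real check would target whatever URL and status your deployment actually exposes.

```python
import urllib.request


def smoke_test(url: str, timeout: float = 5.0) -> bool:
    """Return True if the deployed application answers `url` with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError and HTTPError are subclasses of OSError, so this covers
        # refused connections, timeouts, DNS failures and HTTP error statuses.
        return False
```

Run right after deployment, something like `smoke_test("http://localhost:8080/myapp/")` (a hypothetical URL) touches every layer that has to work for the application to start at all, which is exactly what no unit test does.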

So we definitely need tests beyond unit tests. Obviously people do write user acceptance tests, or whatever you call tests that exercise the complete application. But these tests suffer from the opposite problems: they tend to be difficult to write and slow to execute, and when they fail, finding the underlying bug can be difficult.

Tests are a scarce resource. Writing tests costs time, so one should strive to write as few and as simple tests as possible to reach a certain level of quality. Please note that I'm not saying writing tests is slower than not writing tests. I'm saying that adding a test that doesn't improve the quality of a test suite is a waste of time, and so is adding a test that adds only a little quality when you could write one in the same time that adds a lot. (Actually, tests never 'add' quality, but they can ensure it is present and stays present.)

Yet we have only very limited tool support to guide our decisions about when, how many and what kind of tests to write. The most important indicator is test coverage. It shines some light on classes, methods or branches that aren't covered at all. But it doesn't tell you whether a class is tested in the context of its collaborators, or in the context of the complete application. If you really mean it, you might even look at the number of changes in your version control system to identify areas of high volatility that need some extra testing. But again: nothing tells you what kind of test to write.
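The volatility idea is easy to try out. The following is a rough sketch that assumes your version control system is Git (the post doesn't say which one) and simply counts how often each file shows up in the history. The function names are made up for this example.

```python
import subprocess
from collections import Counter


def git_changed_files() -> str:
    """Return the changed paths of every commit in the current repository."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def change_counts(log_output: str) -> Counter:
    """Count how often each path appears; blank lines separate commits."""
    paths = (line.strip() for line in log_output.splitlines())
    return Counter(path for path in paths if path)
```

Calling `change_counts(git_changed_files()).most_common(10)` inside a repository lists the ten most frequently changed files. It tells you where to look, but, as said, not what kind of test to write there.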

And since unit tests are the easiest to write and maintain, people write unit tests.

If you look into other industries, you often see a hierarchy of structures. Let's take construction: you have your basic building material, maybe concrete or glass, which provides stability on a small scale; it basically connects neighboring points. For larger structures you embed maybe a steel structure that connects larger areas (but has lots of holes that need to be covered by your main building material). And on an even larger scale you have big steel beams reaching even further than the small steel structures, but with even bigger holes.

The same applies to your social network on Twitter, Facebook or in the other life. You have a few very strong connections to your close family and friends. Then you have coworkers and members of some organization you belong to. These are people you don't know as well, but there are lots of them, and they may provide abilities far different from those of your close circle of friends. And then there are the people you met once at a conference or on a vacation.

Many of these things are thought to behave according to a power law, or to be especially efficient when they do. So maybe our tests should adhere to a power law as well. Maybe it would make sense to have a distribution of tests, where

n = ac^-k

With c being the number of classes (or methods or lines of code or layers or modules ...) a test covers, and n being the number of tests with that kind of coverage; a and k would be constants. This would mean one needs very few tests touching almost the complete system, a bunch of tests touching large sets of classes, and lots of tests testing just a single class, or maybe even just two lines of code.

This of course is just an idea, completely unscientific and not based on any kind of measurement. But I'm curious about your thoughts. Does this make sense? Do you disagree? Actually, if you are living near Braunschweig and looking for an idea for scientific work, let me know; it wouldn't be the first diploma thesis supported by LINEAS. I'd be interested in cooperation.

