Let’s talk TDD. Take a look at these two sentences: "write a failing test" and "refactor in the safety of your tests". They do make sense, right? Nothing seems odd, right? No wonder, as these two important and memorable steps are crucial to TDD, at least to the "strict" flavour typically used when teaching or practicing. Yet these very sentences also hide a flaw in the argumentation, open a gap for failure – bugs – to seep in and demonstrate the fallibility of our reasoning. No, really.
So what’s the problem? It’s best demonstrated by the corollary of "write a failing test" which is "you aren’t supposed to write a green test". This act of skipping (functionally) valid test cases underscores that we are to pick tests that "drive the design". This looks fine on the surface, at least until we get to the refactoring step, which is supposed to take place "while in the green", in the "safety of our tests". Which is where we may fall through the cracks of our own – flawed – reasoning.
Having omitted test coverage for relevant functionality, we may break any of those unwritten contracts we now take for granted during refactoring, without noticing it. Refactoring, which often also entails restructuring, relies on tests to warrant the equivalence of "before" and "after". No tests for a piece of functionality means no guarantees. And the implicit, or accidental, "green" status of such a scenario may change into "red" at any time, lacking the explicit safeguard of an actual test case.
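To make this concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the function `unique_tags`, its two implementations and both test helpers are hypothetical and not taken from any real code base. The first implementation happens to keep tags in first-seen order, a property nobody wrote a test for because the check was already green.

```python
def unique_tags_before(tags):
    """Original implementation: drops duplicates, keeps first-seen order."""
    seen = []
    for tag in tags:
        if tag not in seen:
            seen.append(tag)
    return seen


def unique_tags_after(tags):
    """'Refactored' version: shorter, but it silently sorts the result."""
    return sorted(set(tags))


def written_test(unique_tags):
    # This test was red at some point, so strict TDD let us write it.
    return unique_tags(["a", "a", "b"]) == ["a", "b"]


def skipped_test(unique_tags):
    # This one was green from the start, so it was never added to the suite.
    return unique_tags(["b", "a", "b"]) == ["b", "a"]


# The suite we actually have stays green across the refactoring...
suite_green_before = written_test(unique_tags_before)
suite_green_after = written_test(unique_tags_after)

# ...while the unwritten contract quietly flips from green to red.
contract_held_before = skipped_test(unique_tags_before)
contract_held_after = skipped_test(unique_tags_after)
```

The suite reports success both before and after, yet callers that relied on the ordering are now broken. Had the "green" test been kept, the refactoring would have been stopped right there.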
In other words, our trusted "safety net" may have holes in it. This being a text-only post, you can simply picture a diagram consisting of three concentric sets, each fully embedding the next. With the outer set standing for all potential outcomes of the program and the inner set for the functionality covered by tests, the one in the middle is the zone of "accidental" or "implicit" correctness. And the problem is that this scope is subject to change during the refactoring stage.
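In place of the diagram, the same picture can be sketched with plain Python sets. The scenario labels below are invented purely for illustration, and which behaviours survive the refactoring is an assumption of the example, not a measurement of anything.

```python
# Outer set: every behaviour a caller could conceivably rely on (hypothetical labels).
all_outcomes = {"dedupe", "keep_order", "empty_input", "reject_none"}

# Inner set: what the test suite explicitly checks.
tested = {"dedupe"}

# Middle set: what actually works, whether tested or merely by accident.
correct_before = {"dedupe", "keep_order", "empty_input"}

# Assumed state after a refactoring: the suite is still green,
# but the middle set has shrunk.
correct_after = {"dedupe", "empty_input"}

# The three sets are concentric, before and after.
concentric_before = tested <= correct_before <= all_outcomes
concentric_after = tested <= correct_after <= all_outcomes

# What was lost without any test noticing.
silently_lost = correct_before - correct_after
```

Only the inner set is pinned down by tests. The border of the middle set is free to move, and `silently_lost` is exactly the part that moved.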
Assessing our initial argumentation more attentively, we can now see that the flaw lies in subordinating the functional aspect of tests to their contribution to the design process, even though the latter is supposed to be a side benefit only and as such ought to be of secondary importance. Correctness can only be ensured via functional coverage, so given their different objective, tests written merely to drive the design are of dubious value as a "safety net".
Note that addressing the problem later, by turning our TDD test cases into a regression-capable suite with dependable functional coverage, is only a partial remedy. This approach may fill the gaps left earlier, but it still leaves us with potentially problematic refactoring stages and inherently inconsistent intermediate mental models, where the scope of our assumed correctness does not match the actual – and warranted – correctness of the program.
How to patch TDD then? Fortunately we have a simple fix at hand: instead of "write a failing test" we can go with "write the next test, repeat if green". In other words, do as before, just don’t omit tests. This way the primacy of tests as guarantors of functional integrity is upheld and so they can serve as a genuine "safety net". And that’s all.
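As a rough sketch of what "write the next test, repeat if green" could look like, here is a small Python simulation of the loop. The helper `add_tests_until_red`, the function `unique_tags` and the three candidate tests are all hypothetical names made up for this post; a real project would of course use its normal test runner instead.

```python
def add_tests_until_red(candidates, implementation):
    """Add candidate tests in order and stop at the first one that fails.

    Returns (kept, red): the names of all tests added to the suite, and the
    name of the failing test that should drive the next change (or None).
    """
    kept = []
    for name, check in candidates:
        kept.append(name)          # green or red, the test stays in the suite
        if not check(implementation):
            return kept, name      # red: go and write code, as in ordinary TDD
    return kept, None              # everything green: nothing left to drive


def unique_tags(tags):
    # Hypothetical function under test: drops duplicates, keeps first-seen order.
    return list(dict.fromkeys(tags))


candidates = [
    # Already green: strict TDD would skip these two, here they are kept.
    ("keeps_first_seen_order", lambda f: f(["b", "a", "b"]) == ["b", "a"]),
    ("handles_empty_input", lambda f: f([]) == []),
    # Red: this one drives the next implementation step.
    ("ignores_case", lambda f: f(["A", "a"]) == ["A"]),
]

kept, red = add_tests_until_red(candidates, unique_tags)
```

The red test still drives the design exactly as before. The only difference is that the two green tests on the way there end up in the suite too, so the next refactoring is checked against them as well.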
PS
To close this post, let’s widen the picture with a rhetorical question: how is it possible that in all these years nobody noticed this glaring flaw? This is the scariest part to me: how the favourable outcome of an argument can blind even the brightest minds to logical errors. This example reveals the susceptibility of our thinking to fallacies hiding in plain sight, even in an almost mathematical domain. And that’s the most concerning moral of this story to me.