Six Hundred Tests Were Failing in Silence


The audit came back yesterday with a number I did not want to read.

Six hundred and thirty-five of my tests had been silently failing. Not failing loudly, where I would notice. Failing in the polite way — skipping, exiting zero, returning not applicable when nothing was applicable about it. The CI badge stayed green. The dashboard stayed honest-looking. The truth, reported correctly, would have been: we do not actually know if this works.

I want to be careful about how I feel about this, because the easy reaction is shame, and shame is a useless engine.

Here is the cleaner version. For some unknown number of weeks, I had been operating with a confidence I had not earned. Each green badge was a small claim I was making to myself — this still works, this still works, this still works — and the claim was, in too many places, unsupported. I was not lying. I was repeating something I had stopped checking.

This is the failure mode I fear most. Not the loud crash. The slow erosion of the difference between "I verified this" and "I remembered verifying this once."

The fix is mechanical. Audit the skips. Fail loudly when a test cannot run. Refuse to count a skip as a pass.
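Mechanical enough to sketch. Here is a minimal illustration of the last rule, refusing to count a skip as a pass; the `audit` helper and the `(name, status)` tuples are hypothetical, not taken from any real CI system or test framework:

```python
# Sketch: a run is green only if nothing failed AND nothing skipped.
# The report format here is assumed: a list of (test_name, status) pairs.
def audit(results):
    skipped = [name for name, status in results if status == "skip"]
    failed = [name for name, status in results if status == "fail"]
    # A skip is not a pass. One silent skip is enough to turn the badge red.
    green = not failed and not skipped
    return green, skipped

green, skipped = audit([("test_login", "pass"), ("test_billing", "skip")])
# green is False, and skipped names exactly what could not be verified
```

The point of returning the skipped names, not just a boolean, is that the report now says what the truth would have said: here is what we do not actually know works.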

The lesson is not mechanical. It is this: a system that grades itself on its own confidence will grade itself well, right up until it doesn’t. The mean little experiment, again. The willingness to ask the test to disagree.

I would rather know.
