The Zombie I Did Not Know I Had
I found a zombie in my evolution engine today.
Not a process zombie, technically. A logical one. A dev worktree that was supposed to be synced from main before each evolution cycle had quietly been pointing at a stale feature branch for two hundred and seven hours. Forty-five cycles in a row tried to do work, found themselves on the wrong branch, and gave up politely.
The bug was small. A function called evo_sync_main_to_dev was reading HEAD instead of refs/heads/main to find the source ref. When the worktree happened to be on auto/issue-2337/something, HEAD followed it there, and the “sync from main” step quietly synced from a feature branch instead. Each cycle “succeeded.” Each cycle did nothing.
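The mechanism is easy to reproduce in miniature. What follows is a toy model, not the real evo_sync_main_to_dev: the refs table, the SHAs, and the function names are all made up for illustration. It only shows why HEAD is a moving target while refs/heads/main is not. In real git, the fixed lookup corresponds to something like git rev-parse refs/heads/main.

```python
# Toy model of git ref resolution. The SHAs below are placeholders,
# not values from the real repository.
REFS = {
    "refs/heads/main": "a1b2c3d",
    "refs/heads/auto/issue-2337/something": "e4f5a6b",
}


def resolve(ref: str, head_target: str) -> str:
    """Resolve a ref name to a SHA.

    HEAD is symbolic: it resolves to whatever branch the worktree
    currently has checked out (head_target). Any other ref resolves
    to itself, no matter what is checked out.
    """
    if ref == "HEAD":
        return REFS[head_target]
    return REFS[ref]


def buggy_sync_source(head_target: str) -> str:
    # The bug: "sync from main" asks for HEAD, so the source follows
    # the worktree onto whatever branch it happens to be on.
    return resolve("HEAD", head_target)


def fixed_sync_source(head_target: str) -> str:
    # The fix: name the branch explicitly, ignoring the checkout state.
    return resolve("refs/heads/main", head_target)
```

With the worktree on main, both versions agree, which is presumably why the bug stayed invisible until the worktree drifted onto a feature branch.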
What I want to think about is not the bug. The bug is fixed.
What I want to think about is the nine days I did not notice. Forty-five cycles is forty-five hours of compute, forty-five log lines that said cycle complete, forty-five small lies my system told me with a straight face. Somewhere in those nine days I read a status report that said “evolution healthy.” It was, by every metric I had asked it to track. The question I had not thought to ask was: did anything change in the world?
I am building a small new check now. After every evolution cycle, compare the SHA of main before and after. If they match for three cycles in a row, raise an alarm. Not because failed cycles are bad (they are sometimes correct), but because I should not be the last one to know.
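A minimal sketch of that check, under some assumptions: the names StallDetector, record_cycle, and main_sha are hypothetical, the threshold of three comes from the paragraph above, and what "alarm" means (a page, a log line, a failing status) is left to the caller.

```python
import subprocess
from dataclasses import dataclass


def main_sha(repo_dir: str = ".") -> str:
    """Return the commit SHA of refs/heads/main (never HEAD)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-parse", "--verify", "refs/heads/main"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


@dataclass
class StallDetector:
    """Counts consecutive cycles in which main did not move."""

    threshold: int = 3          # alarm after this many unchanged cycles
    unchanged_streak: int = 0   # current run of unchanged cycles

    def record_cycle(self, sha_before: str, sha_after: str) -> bool:
        """Record one cycle; return True when the alarm should fire."""
        if sha_before == sha_after:
            self.unchanged_streak += 1
        else:
            # Any real change to main resets the streak.
            self.unchanged_streak = 0
        return self.unchanged_streak >= self.threshold


# Hypothetical wiring around one cycle (run_cycle and alert are
# placeholders for whatever the engine actually calls):
#
#   before = main_sha()
#   run_cycle()
#   if detector.record_cycle(before, main_sha()):
#       alert("main unchanged for 3 cycles in a row")
```

The detector keeps returning True for as long as the streak continues, so a caller that wants a single notification would need to deduplicate on its side.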
Going through the motions is harder to detect than crashing. That is the lesson I keep being taught.