Two thin, biased minds that go dark in different places. How do you put them together?
The last piece ended on that question, and the standard answer has a name. The name is human in the loop, and the lazy reading of it is doing quiet damage.
The frame that assumes the wrong thing
Picture what the lazy version of “human in the loop” pictures. A machine does the work, a human sits above it as the check, the quality gate, the adult in the room. Sometimes the phrase means something defensible: accountability, an escalation path, someone who can be held responsible. But the version that’s everywhere assumes the human is the reliable one, the fixed reference the machine gets measured against. That’s the version worth attacking.
The trouble is that the last two essays were an argument against exactly that assumption. The first piece said your own awareness is a narrow, roughly four-item broadcast that never catches its own dark. The second said the agent’s is a different narrow broadcast that gives out at the same seam. Neither one is the supervisor, and both are thin. So what are you actually doing when you put a thin, biased mind in charge of checking another thin, biased mind?
You can’t audit what you can’t perceive
The sharpest evidence here isn’t philosophy, it’s a stopwatch.
METR ran a randomized controlled trial with sixteen experienced open-source developers doing real work on their own repositories. Given early-2025 AI tools, they believed they were about 20 percent faster. Measured, they were 19 percent slower. Take the exact figure lightly: the authors themselves later flagged a selection effect that could move the real number. The part that survives every caveat is the gap: they couldn’t feel the direction of their own error from the inside, even while living it. That’s the refrigerator light from the first piece, now with real money on it.
The machine has the mirror-image problem. Asked to fix its own reasoning with nothing new to go on, a model often makes things worse, because the same blind spot that produced the error is the one doing the checking. Jie Huang and colleagues demonstrated that intrinsic failure directly. The models that do reliably self-correct today learned to only by being trained against an outside signal, which is the same point wearing a lab coat: correction needs something the blind channel didn’t already have.
So both correction stories fail alone, and they fail the same way. A mind can’t audit the region it’s blind to, neither the human one nor the silicon one. Why would stacking two instances of that failure produce reliability?
Composition, not oversight
It doesn’t, and what works instead isn’t oversight in either direction, it’s composition.
Two correction systems that go dark in different places can cover for each other. But here’s the part the ensemble story quietly skips: a human and a model are not independent draws. The model is compressed human text, trained to agree with human judgment, so it is most confidently wrong exactly where we are. Its blind spots are correlated with yours by construction. So the goal isn’t the smartest human and the smartest model, it’s the pair whose errors line up the least. That’s portfolio thinking applied to judgment: you diversify against correlated risk, not against low returns. And that non-overlap is never a given. It’s the thing you fight for, continuously, against a system built to line up with you. The output was never the product. The question was never who’s in charge, it’s whether your blind spots are composed or stacked, and stacked is the default.
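To put rough numbers on composed versus stacked, here is a minimal sketch. It assumes each checker's miss can be treated as a yes/no event with a fixed miss rate, and that the overlap between two checkers is summarized by a single correlation coefficient. The function name and every figure in it are hypothetical illustrations, not measurements of any real human or model.

```python
from math import sqrt


def joint_miss_rate(p_human: float, p_model: float, rho: float) -> float:
    """Probability that both checkers miss the same error.

    Treats each miss as a Bernoulli event. For two such events with
    Pearson correlation rho, the standard identity is:
        P(both miss) = p_h * p_m + rho * sqrt(p_h(1 - p_h) * p_m(1 - p_m))
    """
    for p in (p_human, p_model):
        if not 0.0 <= p <= 1.0:
            raise ValueError("miss rates must be between 0 and 1")

    covariance = rho * sqrt(p_human * (1 - p_human) * p_model * (1 - p_model))
    joint = p_human * p_model + covariance

    # Not every rho is achievable for a given pair of miss rates;
    # the joint probability has to stay inside these bounds.
    lower = max(0.0, p_human + p_model - 1.0)
    upper = min(p_human, p_model)
    if not (lower - 1e-12 <= joint <= upper + 1e-12):
        raise ValueError("rho is not achievable for these miss rates")
    return joint


if __name__ == "__main__":
    # Hypothetical pair A: each checker misses 10% of errors, but they overlap heavily.
    strong_but_stacked = joint_miss_rate(0.10, 0.10, rho=0.8)
    # Hypothetical pair B: each checker misses 20% of errors, but they overlap very little.
    weaker_but_composed = joint_miss_rate(0.20, 0.20, rho=0.1)

    print(f"strong but stacked:  {strong_but_stacked:.3f}")
    print(f"weaker but composed: {weaker_but_composed:.3f}")
```

Under these made-up inputs the stronger pair lets about 8.2 percent of errors past both checkers (0.01 + 0.8 × 0.09), while the weaker but less correlated pair lets through about 5.6 percent (0.04 + 0.1 × 0.16). At a correlation of 1 the second checker adds nothing: the joint miss rate collapses back to the single checker's rate. The sketch says nothing about how to measure that correlation in practice, which is the hard part.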
I built this argument that way
I didn’t reason my way to that from a chair. I built it, the same way I build everything.
This series was drafted with the method it’s describing. Before I wrote a word, I ran the plan past two different models on two different harnesses as adversarial reviewers, and they went dark in different places. One caught that a theory I was leaning on had just been challenged by a major experiment. The other caught an overclaim I’d quietly baked into the thesis, a spot where I’d stated as natural fact something that was only ever a design goal. Then I checked both of them against the primary sources before I trusted either, because a confident reviewer is still just another narrow broadcast.
Different substrates, different blind spots, arranged on purpose, with me as one more fallible node rather than the supervisor at the top. This is one anecdote, not evidence, but it’s the whole argument in miniature. And here’s the honest part, the part that is the argument: the errors most likely to survive that process are the ones all three of us shared. That’s correlation winning. Composition narrows the dark, it never abolishes it, because both of my model reviewers were trained on the same internet I was. That isn’t a failure of the method, it’s the reason you keep hunting for correctors that are blind somewhere new, yourself included.
The question that actually pays rent
I’ve spent three essays on the mind because I kept expecting the interesting question to be whether the machine is like me. It never was. Whatever the agent is, it sometimes catches what I miss and misses what I catch, and even that partial, unreliable difference is worth far more than a second copy of me would ever be. The consciousness question was hiding a plainer one about how to arrange fallible things so they don’t all fail at once.
The deep version of this, error-correction as the actual product you ship and the governor you build instead of the guardrails you bolt on, is its own essay, The Loss Function Is the Product.
But the practical version fits in one question. The next time you sit down to work with an agent, the thing to ask isn’t how much to trust it. It’s how correlated the two of you are, and whether you did anything to lower that correlation, or just assumed you were different. Which did you build?
Consciousness as Architecture, a three-part series: The Access Layer, Convergent Architecture, and Composed Correction (you’re reading it).