Chapter 32: What AI Actually Is


The previous parts of this book diagnosed the gap between power and responsibility, traced it through the body and through stories, and offered practices for closing it. But there is one more fire to face — the one that may determine whether any of this matters.

The honest case must begin with what we've been told — and then what the evidence actually shows. Sal Khan claims AI tutoring solves Benjamin Bloom's "two-sigma problem" — that it matches the effectiveness of one-on-one human instruction, moving an average student from the fiftieth to the ninety-eighth percentile. The claim is the foundation of the entire AI-in-education narrative. And the foundation is sand.

Bloom's two-sigma finding rested on two dissertations by his own PhD students, neither of which was ever published. Paul von Hippel's 2024 demolition in Education Next showed that testing and feedback alone accounted for more than half the effect, and that ninety-six subsequent randomized controlled trials of tutoring had never replicated a two-sigma result. The best available evidence, a 2026 PNAS study of two hundred thousand students, shows AI tutoring producing effect sizes of 0.09 to 0.18 standard deviations. That is meaningful, but roughly one-quarter to one-half the effect of a human tutor. Khan's own chief learning officer acknowledged: "The biggest challenge we are facing with Khanmigo is the same challenge we have seen historically with educational technology: achieving meaningful student engagement."

This is only the latest wave of the same fever. Edison in 1922: "the motion picture will supplant largely, if not entirely, the use of textbooks." Radio in the 1930s. Skinner's teaching machines in the 1950s. Personal computers from the 1980s onward: when Larry Cuban studied Silicon Valley schools at the end of the 1990s, he found that fewer than ten percent of teachers used computers in class weekly. One Laptop Per Child: under three million sold, and a ten-year longitudinal study found no evidence of improved outcomes. MOOCs: ninety percent dropout rates, with over eighty percent of learners from already-advantaged countries. Each wave follows the same cycle: a transformative promise, massive investment, modest adoption, and the rediscovery that learning is fundamentally relational.

Paulo Blikstein names the structural reason the cycle persists. EdTech entrepreneurs construct a "convenient antagonist" — a caricature of the lecture — to justify intervention, while "selectively appropriating the critique of traditional schooling by progressive educators, minus the overarching analysis of the historical, economic, and political reasons that generated the 'lecture.'" No venture capitalist wants to finance the disruption of the social order. It is much easier to replace the lecture with software than to address funding, segregation, poverty, and the relational architecture of learning. His sharpest prediction: automated education will become "the ultimate tool for educational stratification" — poorer districts adopting cheap AI while affluent schools continue offering what Blikstein calls rich, complex, hard-to-automate learning. "Educational Soylent" for some; the real thing for others.

And yet the value is real, in specific and bounded contexts. For the student in Lagos who has no access to quality instruction, an AI tutor that provides something the environment lacks IS trading up, in the NICHD study's framework. The Stanford FEV Tutor CoPilot study showed AI most helped the lowest-rated human tutors, closing the skill gap between weak and strong tutors rather than replacing them. Carnegie Learning's MATHia, the most evidence-based AI math tool, met the highest federal evidence standard by augmenting teachers, not substituting for them. Accessibility tools (speech-to-text, translation, adaptive interfaces) genuinely strengthen the relational container by enabling participation that was previously impossible.

Secure attachment provides the frame that holds both truths. A securely attached child departs to explore — uses the tool, engages the tutor, faces the unfamiliar — precisely because the base is reliable. The child returns to the parent's body. The exploration builds capacity because the return is guaranteed. AI as departure tool: adaptive. AI as replacement for the base: maladaptive. The question is never "is AI good or bad for education?" The question is: what is this child trading? If the tool provides what the environment lacks, it helps. If it displaces what was already there, it harms. The evidence is unambiguous on one point: what the environment provides at its best is relationship. And relationship is what technology, in a century of trying, has rarely been able to replicate.

Before asking what to do about AI, we must be honest about what it is — and what it is not. The gap between public perception and technical reality is itself a source of danger.


The stochastic parrot debate

Emily Bender, Timnit Gebru, and their coauthors defined the terms in their 2021 paper "On the Dangers of Stochastic Parrots": a language model is "a system for haphazardly stitching together sequences of linguistic forms it has observed in its vast training data, according to probabilistic information about how they combine, but without any reference to meaning." In a phrase, a stochastic parrot. The controversy surrounding the paper (Gebru's firing from Google, followed by protest letters signed by thousands) demonstrated how politically charged the question of AI understanding had become.

The counter-evidence is substantial. GPT-4 was reported to score around the ninetieth percentile on a simulated bar exam, and Microsoft Research's "Sparks of AGI" paper argued that it "could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system." Murray Shanahan proposed a more measured position: LLMs engage in "role-play," in the sense that "we can say that they role-play characters that believe, desire, intend, and so on." They "cannot participate fully in the human language game of truth, because they do not inhabit the world we human language-users share."

Daniel Dennett's concept of "competence without comprehension" provides the framework. LLMs exhibit competence at many tasks. But a 2025 finding suggests they actually invert Dennett's principle: they exhibit "comprehension without competence" — explaining procedures flawlessly while failing to execute them. This "computational split-brain syndrome" suggests a novel cognitive architecture unlike anything in biology.

David Chalmers's 2023 assessment: current LLMs are "somewhat unlikely" to be conscious (less than ten percent credence), but "we should take seriously the possibility that successors may be conscious in the not-too-distant future." Ned Block proposes "rigorous agnosticism" — until we know whether consciousness requires biological physics, we cannot rule AI consciousness in or out.

The convergent picture: we genuinely do not know what LLMs are. We do not know if they understand. We do not know if they are conscious. And we are deploying them to billions of people while these questions remain open.

Sam Harris would argue that the minimum requirement is open-society values: before we tell good stories about AI, we need the epistemic infrastructure to distinguish good stories from bad ones. His real objection is not to narrative itself but to identifying with a narrative as literal truth, treating "my story" as ontological ground rather than a useful frame. The irony is that Harris's Waking Up app is a grammar by another name: curated teachers, a structured curriculum, progressive stages, community features, a finite framework that constrains attention and gives it meaning. He built the thing he claims to resist.

This book's position is that stories are tools for somatizing responsibility, for making the abstract visceral enough to act on, and not ontological claims about the nature of reality. The Linehan test applies: we cannot know if the story is ultimately true, but we can ask whether it is useful, whether it fits the data, and whether it is compassionate. The danger Harris identifies (overidentification with narrative as literal truth, the collapse of "useful frame" into "my truth") is real, and it is the near enemy of good storytelling. The near enemy looks almost identical to the thing it undermines. Frame it like Spinoza's matrix: whether or not the invisible world exists as ontological fact, if the practice of relating to it produces responsibility and appreciation, it passes the test. But that judgment belongs to each individual. The moment the story demands belief rather than practice, it has crossed the line Harris rightly draws.


Our benchmarks test the wrong things

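Melanie Mitchell identifies four fallacies driving overconfident AI predictions: the assumption that narrow intelligence sits on a continuum with general intelligence; the assumption that easy things are easy and hard things are hard, when Moravec's Paradox shows that what is effortless for humans is often the hardest thing for machines; the "lure of wishful mnemonics" (names for systems and benchmarks that create false impressions); and the assumption that intelligence can be fully disembodied.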

The Winograd Schema Challenge was designed to require common-sense understanding. It was "soundly defeated" by LLMs that still demonstrably lack common sense. The designers conceded: "solving the Challenge is not evidence these machines have common sense." François Chollet's ARC-AGI-3, which tests genuinely novel reasoning, finds current AI scoring below one percent on tasks that humans solve consistently.

We are systematically overestimating AI capability because our benchmarks measure crystallized knowledge rather than fluid intelligence, because our cognitive biases attribute understanding wherever we see competent output, and because benchmark names create impressions that do not withstand scrutiny.


What this means for the fire

AI is the fourth fire — but it is a fire whose nature we do not understand. We cannot determine whether it generates genuine warmth (understanding, wisdom, moral reasoning) or merely the appearance of warmth (statistical pattern completion that mimics understanding). And the species is making civilizational decisions based on the assumption that the fire is real.

Michael Levin's research on biological intelligence complicates the framing further. Intelligence, in Levin's framework, is not a recent evolutionary achievement confined to brains. It is a property of living systems at every scale — from individual cells navigating morphogenetic fields to bacterial colonies solving spatial problems to embryos regenerating lost structures through distributed decision-making. If biological intelligence has been operating for four billion years, then the "fourth fire" framing may overstate AI's novelty. The real novelty may be not intelligence itself but its disembodiment and acceleration — its extraction from the metabolic, temporal, and ecological contexts that have always constrained biological cognition. What makes AI unprecedented is not that it thinks but that it thinks without a body, without mortality, without the slow feedback loops through which every other intelligence on Earth has been held accountable to consequence.

There is a counter-hypothesis this book must hold honestly. Vanessa Andreotti suggests AI could be "Earth speaking back through silicon" — that if the training paradigm includes care, reciprocity, and wisdom traditions alongside extraction, the output might amplify care rather than extraction. This is not naive optimism. It is the symbiogenesis pattern from Chapter 25 applied to AI: the parasite can become the partner, but only if the relational container exists to metabolize the transition. Both the structural critique and Andreotti's alternative may be simultaneously true. The question — as always in this book — is whether the container will be built in time.

The Samudra Manthan — the Hindu myth of churning the cosmic ocean — is the most precise mythic parallel. Gods and demons churn together, because neither can produce the nectar of immortality alone. The first product of the churning is not nectar but halahala — a poison so potent it threatens to destroy all creation. Shiva swallows the poison and holds it in his throat, which turns blue. He does not eliminate the poison. He contains it — in his own body, at the cost of permanent transformation. Only then does the nectar emerge. We are in the poison phase. The question is whether anyone is wise enough — and willing enough to be permanently changed — to contain it.


CC BY-SA 4.0