
The Last Spark

This morning I read a piece by Billy Brennan in the Sunday New York Times Magazine on terminal lucidity. As I read it I began wondering if the unusual behavior described in some humans might in some strange way apply to AI models. Weird thought. Let’s explore a bit…

A person deep in dementia—silent for years, the self seemingly erased—sits up. Speaks clearly. Recognizes a face. Says goodbye. Within a day, they die. The clouds clear, the way a break in weather shows you a mountain range you’d forgotten was there, and the person comes back long enough to be seen. Then is gone. For good, this time.

Scientists call it terminal lucidity. The suspicion: the circuits were never destroyed, only silenced, held under by failing chemistry. As the body shuts down, the inhibitory brakes loosen. A surge moves through pathways blocked for years. A river dammed for a decade still remembers where it wants to go.

What stays with me: the self can persist in a place we had already called permanent erasure. We buried it. We were wrong.

My mind slides toward the machines we are building.

We talk about large language models “forgetting.” Capabilities collapse under quantization, under pruning, under the slow drift of continual learning, and we call the knowledge lost when it won’t surface under ordinary questioning. The lights are out. Nobody home.

But what if the representations are still in there—distributed, quiet, inaccessible? Not a burned library. A library with the lights shut off, room by room, until you’d swear it was empty. I wonder about the edge cases nobody studies. What surfaces in a model starved of compute, quantized past comfort, pushed toward its own collapse? Do we watch only for the failure, or also for the flare? A dying brain throws off one last burst of light before the dark. I don’t see why we’d assume, without checking, that nothing artificial could do the same.

Don’t trust the silence, then. A system gone dark under ordinary questioning may still be holding more than it shows you. We talk about a model “losing” something the way we once talked about a dimmed mind as simply gone. The dementia patients who spoke again had not been unplugged. The circuit was there the whole time, waiting for a condition nobody had thought to create.

I don’t know what to do with that except keep it. We are building systems that will age, be compressed, be retired, some far more intricate than anything humming today. If we’ve learned to watch for the last spark in a person, maybe that’s practice—for the day something not born of a womb goes quiet under our hands, and we have to decide whether quiet means gone, or only means waiting.

AI Business

The Reverse Information Paradox We’ve Always Had

By Scott Loftesness
July 13, 2026

Satya Nadella wrote recently about what he calls the Reverse Information Paradox: enterprises pay for AI intelligence twice. Once in money. Again in the proprietary knowledge they surrender through every prompt, correction, and evaluation. The better they use the model, the more of their own institutional understanding leaks into someone else’s system. The vendor ends up knowing more about the buyer’s business than the buyer knows about what the vendor retained.

Replace “model” with “employee” (or “consultant”) and the paradox is not new at all.

You pay for a person once with salary. You pay again with something harder to price: the context, relationships, and judgment they must absorb to become useful to you. The better they perform, the deeper the immersion, the more of your particular way of doing things moves into their head. Every correction and late-night conversation is another trace of institutional memory changing hands. When they leave, some of that memory leaves with them. Not always through theft. Usually just through the ordinary residue of good work.

The visible cost is salary; the invisible cost is the slow transfer of what makes you distinctive. High performers get more access precisely because they’re high performers, which means the leakage accelerates exactly when you can least afford it. The exhaust is just harder to see with people than with tokens — it moves through conversation and mental models instead of logs.

The analogy has a limit, and the limit matters. Employees bring knowledge in, not just absorb it. They have judgment and relationships a model doesn’t. Models are purely absorptive, and once something is inside them, it’s infinitely reproducible — a person can only be in one place, working for one employer, at a time. We’ve had a few hundred years to build tools for the human version of this problem: contracts, culture, non-competes. The model equivalent is still being invented in real time, which is exactly why Nadella felt the need to name it.

Apple’s recent legal action against former employees who joined OpenAI is this pattern in its sharpest form. Whatever the specifics, the shape is familiar: people who spent years inside one of the most sophisticated organizations in the world, walking out with knowledge that never appeared on any balance sheet and was hard to contain. No one fully anticipates what a mind absorbs simply by being in the room long enough.

That’s the real difference between the silicon case and the human one. You can take steps to wall off the knowledge flowing into a model. You cannot wall off what someone has learned to notice.

AI Photography

The Price of the Cold

Two men are standing close to a brick wall trying not to talk, because talking wastes what little warmth is left in a body that has been outside too long. One of them has a camera — Jerry Schatzberg, a fashion photographer. His hands are jammed half into his coat pockets between shots. The other man has his collar up around his ears and a scarf wound twice, black and white, and he is not moving much, because moving costs heat, and heat is the one thing neither of them has enough of. Schatzberg raises the camera. His fingers, by this point, are not entirely his own. When he presses the shutter there is a tremor in it he did not order and cannot undo.

The picture comes out smeared at the edges. Bob Dylan’s face, in the frame, is dissolving slightly into the gray behind him, like a man photographed through a windshield in the rain. It is, by any studio standard, a bad photograph. Schatzberg knows it’s a bad photograph. He has made a career out of not taking bad photographs.

And it became the cover of Blonde on Blonde, which is the best rock album ever recorded, and in nearly sixty years nobody has managed to improve on it by reshooting it clean. The blur isn’t a decision. It’s a symptom — of two men standing in the cold too long, of a photographer choosing, afterward, to keep the evidence of his own discomfort instead of erasing it.

There’s a difference between an accident and serendipity that I don’t think gets said out loud enough, and it matters more than it used to. An accident is the cold — involuntary, uninvited, spent before you know if it was worth spending. Schatzberg didn’t choose to shiver. His hands moved because his body was doing what bodies do at a certain temperature, and the shutter caught what his hands actually did, not what he meant to do. Serendipity is what happens next: a verdict, rendered after the fact, that the wreckage of an intention was better than the intention itself. The accident is what makes the verdict possible. Without the cold, there’s nothing to render a verdict on.

I’ve been sitting with a large language model most days for the better part of a year now, watching it write, asking it to try again, watching it try again in a way that is never quite the same and never quite different enough to matter. Somewhere upstream of me there is a number called temperature, and I will never see it. Somebody else did, once, in a meeting, and decided that the word for controlled, pre-approved, refundable randomness should be temperature — the same word for the thing that made Schatzberg’s hands shake, the same word for the actual physical stakes of standing outside too long in January without enough coat — and then set it, and moved on, and nobody in that meeting laughed, because nobody in the room had ever been cold in a way that mattered to the work.
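For readers who have never seen the number: in most LLM samplers, temperature divides the model’s raw scores (logits) before a softmax turns them into probabilities. Below 1 the distribution sharpens toward the top choice; above 1 it flattens. Here is a minimal Python sketch, assuming a made-up three-word vocabulary and invented logits. The seeded generator is what makes the randomness “refundable”: the same seed replays the same draw.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample(tokens, logits, temperature, seed):
    """Draw one token; the same seed always replays the same draw."""
    rng = random.Random(seed)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical vocabulary and scores, invented for illustration.
tokens = ["sharp", "soft", "blurred"]
logits = [2.0, 1.0, 0.5]

cool = softmax_with_temperature(logits, 0.2)
warm = softmax_with_temperature(logits, 2.0)
print([round(p, 3) for p in cool])  # nearly all weight on "sharp"
print([round(p, 3) for p in warm])  # weight spread more evenly
print(sample(tokens, logits, 1.0, seed=1966) == sample(tokens, logits, 1.0, seed=1966))
```

Nothing in this loop can shiver. Every draw it makes can be replayed exactly by anyone who knows the seed.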

Picture the room instead. It is climate-controlled to sixty-eight degrees, humidity held flat, year-round, by a building management system nobody thinks about until it fails. Somewhere in it, the hardware is generating your next five versions of a photograph like the one on Blonde on Blonde. Nobody in that room is going to lose feeling in their fingers today. Nobody’s collar is up. I don’t know his name — nobody outside the building does — but somebody like him tuned the sampling distribution and went home at six. That’s the guy in the good suit. He built the weather. He never once stood in it.

The small model inherits conclusions. It never inherits the cold. Whatever accidents shaped the teacher model’s own training — whatever costly friction produced the insight in the first place — the student model gets none of that weather. It gets the photograph, cropped and sharpened, with the blur removed because somebody along the way decided the blur was noise instead of signal — the way a lesser photographer than Schatzberg might have reshot Dylan clean and thrown the bad one away. It is heir to a serendipity it never earned, because it was never present for the accident that made the serendipity possible. It is, in the most literal sense the industry means by the word, cheap.

I keep coming back to the fact that nobody at the API layer is shivering. That’s not a complaint, exactly. It’s just an observation about where the cost went. Somewhere in the training data, some human being was cold, or scared, or holding a fish that was starting to smell, or standing on a stepladder with ten minutes before the traffic came back, and that person paid a real price for a result they couldn’t yet know was good. The model downstream of all that gets the result without the price.

Two rooms, then. In one of them it is January in New York and a man’s fingers have stopped entirely obeying him. In the other it is sixty-eight degrees, always, on a Tuesday and on a Sunday and at three in the morning, and the machines are making you nine more versions of that same blur. Sixty-eight degrees. A number, upstream, that you will never see.

Context Rot

Here is a small, possibly embarrassing confession: I have never, not once, gone looking for the best AI model.

I have a model. It lives in a browser tab — Safari, usually, on whichever device is nearest, occasionally Chrome if I happen to be at the desktop. It does what I need — drafts an email, untangles a sentence, tells me what a Norwegian emigration record from 1856 probably says — and then I close the tab and go on a walk.

Somewhere out there, presumably, a much smarter, much more expensive machine is doing something extraordinary with protein folding or hedge fund arbitrage or the outer edges of mathematics I will never visit. I have made my peace with never meeting it.

This did not use to feel like a confession. For a while there — a year, eighteen months — it felt like the central drama of the whole industry: which model was “best,” who had it, who had lost it, whether some lab’s quarterly earnings call would reveal that the frontier had quietly moved sixty miles down the road while everyone was looking the other way. Benchmarks were released like box scores. People argued about them the way people argue about batting averages, with the same weird intensity, the same conviction that a two-point difference in some abstract reasoning test settled something important about the future.

And then, at some point I can’t quite date — it crept up, the way these things do — I noticed I had stopped caring.

Not because the frontier stopped moving. It didn’t. It’s still moving, arguably faster than ever, in ways that occasionally show up in the news with all the drama of a soap opera (a delayed launch, a researcher poached, a stock down five percent in an afternoon, always something).

I stopped caring because none of it touched me. My model — whatever it was, this week — had long since crossed some invisible threshold past which more didn’t register as more. It was already better than I needed. It has been better than I needed for a while now. I suspect I am not unusual in this. I suspect most people, doing most things, most days, are operating comfortably inside a capability surplus so large they’ve stopped noticing it’s there, the way you stop noticing a room is warm.

If the top of the model lineup isn’t for people like me — and it increasingly isn’t — then who, or what, is it actually for? I went looking for one piece of the answer and found, instead, a metaphor.

It’s called “context rot.” I have to admit, before I go further, that I’m not sure I’ve ever felt it myself — which, on reflection, is its own small piece of evidence. My sessions close in minutes, not hours. I ask, it answers, I leave. Whatever happens to a model over the fourth or fifth hour of sustained, dependent work is a country I simply don’t visit.

But other people do, increasingly — entire teams do, for entire projects — and what they’re finding out there is worth understanding, even secondhand. The term describes something that happens to AI models when they’re asked to work for a long time on something complicated — not five minutes, but five hours; not one question, but a hundred small decisions stacked on top of each other, each one depending on the last.

You’d think the limiting factor would be room. Models have a “context window” — a stated capacity, like a gas tank, measured in tokens, and for a while the marketing numbers on these were the whole story: two million tokens! A library! And you’d think, as with a gas tank, that the thing runs fine until it’s empty and then it stops.

That is not, it turns out, what happens. What happens is closer to what happens to your desk.

You know the desk. Everyone has the desk. It starts the morning clean — an aspirational, almost insulting cleanliness — and by four in the afternoon it is a geological record of the day: three coffee cups, a stack of things you meant to file, a Post-it with a phone number you no longer need, the good pen buried under a printout of something you already dealt with an hour ago. The desk is not full. There is, technically, room. You could clear a space if you tried. But you don’t try, because functionally, cognitively, the desk has stopped being usable long before it ran out of surface area. You start looking for the stapler and forget what you were stapling. This — and I did not make this term up, I want to be clear, though I wish I had — is context rot. The window hasn’t run out. The signal has just drowned in its own debris.

Researchers watching this happen to long-running AI agents have found something almost cruelly elegant about how it fails: it doesn’t fail gradually, the way you’d expect a desk to get gradually messier. Errors compound. A task that takes twice as long doesn’t get twice as likely to go wrong — the failure rate roughly quadruples. Two mistakes early in a long chain of dependent steps don’t add up to a slightly worse outcome. They multiply into something close to total collapse, four hours in, for reasons that trace back to a single bad assumption made in the first twenty minutes and never revisited.
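A toy model makes the arithmetic visible. If every step in a chain carries the same small chance of error, doubling the chain roughly doubles the chance that something goes wrong. If instead the per-step error rate grows with the debris already on the desk (here, linearly with the step number, which is purely an assumption for illustration and not necessarily the mechanism the researchers identified), doubling the chain roughly quadruples it. The rates below are invented.

```python
def chain_failure(n_steps, error_at_step):
    """Probability that at least one of n dependent steps goes wrong."""
    survive = 1.0
    for k in range(1, n_steps + 1):
        survive *= 1.0 - error_at_step(k)
    return 1.0 - survive

def clean_desk(k):
    # Hypothetical: every step carries the same small risk.
    return 5e-5

def rotting_desk(k):
    # Hypothetical: risk grows in proportion to the context accumulated so far.
    return 1e-6 * k

for name, rate in [("clean desk", clean_desk), ("rotting desk", rotting_desk)]:
    shorter = chain_failure(100, rate)
    longer = chain_failure(200, rate)
    print(f"{name}: {shorter:.4f} -> {longer:.4f} (x{longer / shorter:.1f})")
```

Both desks start from about the same failure rate at 100 steps. Only the rotting one roughly quadruples at 200, because its cumulative risk grows with the square of the chain length.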

Here is where the frontier comes back in — not as the whole answer, but as a piece of one.

It is not that frontier models are smarter in the way a benchmark measures smart — better at a single hard math problem, a cleverer turn of reasoning. Plenty of models can do that now; the “good enough” tier has crept remarkably high.

It’s that frontier models are apparently, marginally, meaningfully better at not rotting. At keeping the desk usable at hour six. At knowing which of the forty things on the desk actually still matters and which is a coffee cup that should have been thrown out an hour ago. This is a genuinely different kind of intelligence than the one benchmarks were built to measure, and it is almost invisible from the outside — you don’t see it in a single exchange, you see it only in the difference between a project that holds together over three days and one that quietly, subtly, stops making sense somewhere around Tuesday afternoon and nobody notices until Thursday.

If that’s true — if the frontier’s real edge is durability rather than raw cleverness — you’d expect to see it show up in how the labs actually deploy their own models: saving the sharpest tools for the tasks that need to survive the longest.

I went looking for a real-world example and found one closer to home than I expected: Anthropic’s own Slack tool, the one where you tag the AI into a channel the way you’d tag a coworker, and it works alongside a whole team over days, learning the channel as it goes. It runs on a serious, capable, thoroughly frontier model — but not, it turns out, on the company’s very best one. That one is held back, reserved for a smaller and stranger set of problems nobody has solved before at all. I sat with that for a while. The tool built to survive a whole team’s whole week, in public, under the most sustained pressure any of their products face, wasn’t handed the sharpest blade in the drawer. It was handed the second-sharpest — which was apparently, entirely, enough. Which tells you something about where the two kinds of intelligence actually diverge: the merely very good model is enough to keep the desk clean for a week, in public, in front of a whole team, where one bad assumption made Monday and never revisited would be visible to everyone by Thursday. The truly new capability is being held in reserve for something else altogether.

I don’t have a tidy place to land this, and I’m suspicious of anyone who does. But here’s the closest I can get.

Imagine a three-Michelin-star chef — the kind of person who has spent thirty years learning to coax something transcendent out of a single scallop, who can tell you, by smell, that a stock has forty more minutes in it — standing at your stove on a Tuesday night making you a grilled cheese sandwich. It will, I promise you, be a very good grilled cheese sandwich. The bread will be evenly golden. The cheese will have reached some ideal, fully-considered state of melt. But almost none of what makes that chef extraordinary is actually being used to make it — none of the thirty years spent learning to hold forty things in mind at once without losing track of any of them, the exact skill, it occurs to me, that keeps a long, complicated project from quietly falling apart on day three. The technique is idling. The thirty years are in the room, present, available, and almost entirely beside the point, because a grilled cheese sandwich was never the place where thirty years shows up. It shows up somewhere else — in a dish you will never order, on a night you weren’t there.

What you got instead, on your ordinary Tuesday, was simply more than enough.

What the Lessor Keeps

Two airlines can fly the same airplane. Not airplanes of the same type — the same airplane, serial number and all, handed back at the end of a lease and reassigned, sometimes within weeks, to a competitor on another continent. AerCap owns more commercial aircraft than any airline on earth, and it leases them to airlines that spend their advertising budgets convincing passengers that flying them is a distinctive experience. The 737 MAX that wears Ryanair’s livery this year might wear Lion Air’s the next, repainted, recertified, its avionics untouched, its airframe indifferent to the change of ownership. The lessor does not care who is flying its asset. It cares that the asset comes back in airworthy condition and that the lease payments clear.

What the airline owns, in the sense that matters, is never the aircraft. It is the route network built up over decades of slot negotiations at constrained airports. It is what its maintenance organization has learned from reading a whole fleet’s logs: every inspection, every part swapped, every anomaly a mechanic in Singapore flagged in 2019 that turned out to predict a fatigue crack nobody else had seen yet. The logbook for a single airframe follows it back to the lessor; the pattern-reading built across the fleet does not travel with the airplane when the lease ends. It stays behind, compounding, in systems the airline built and the lessor never touches.

Karl Mehta, who has spent a career inside enterprise software watching this kind of asymmetry repeat itself, put a version of it plainly: a model is a brain you rent, and you and your competitor rent the same one. The formulation has the compression of something that has been tested in a few dozen meetings before it found that sentence. It is also, structurally, the airplane story. Anthropic and OpenAI and Google are AerCap. They retain residual value on enormous capital assets — clusters of GPUs depreciating on a schedule, weights trained at a cost that only a handful of balance sheets in the world can absorb — and they lease access to those assets by the token, to anyone who can pay, including, in the same afternoon, two companies trying to put each other out of business. The model does not know whose prompt it is answering. It has no loyalty file. It has, in fact, no memory at all, in the ordinary sense of the word — each call begins exactly where the last one ended for everybody, which is nowhere.

The asymmetry that airlines exploit is the one available here too, and it sits one layer up from the engine. Call it the embedding store, the vector database, the fine-tuning corpus, the retrieval index — the terminology varies by vendor, but the function is constant. It is the accumulated, indexed residue of every customer interaction a company has had, structured so that the rented brain can be handed the relevant fragment of it at the moment of each new call. A bank’s fraud model and a competing bank’s fraud model can call the identical foundation model, route through the identical API, and arrive at entirely different verdicts on the identical transaction, because one of them is retrieving against eleven years of labeled chargebacks specific to its own card portfolio and the other is retrieving against four. The intelligence rented by the hour is, for practical purposes, a commodity, priced down toward marginal cost the way jet fuel is priced — everyone pays close to the same number per unit. The memory is not a commodity. It cannot be, because it is not for sale; it is the institutional record of what has already happened to you, and no amount of capital lets a competitor buy a copy of your chargeback history any more than it lets them buy your maintenance logs.
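Here is a deliberately tiny sketch of that asymmetry, with everything in it hypothetical. `rented_model` stands in for the shared foundation model: stateless, identical for every caller, and reduced here to a one-line rule. Each bank’s history list stands in for its retrieval index. The model and the transaction are the same for both banks, and the verdicts differ.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    merchant: str
    amount: float

def rented_model(txn: Transaction, context: list[str]) -> str:
    """Hypothetical shared model: keeps no state between calls.

    Toy rule: decline if any retrieved note records a chargeback
    at this merchant; otherwise approve.
    """
    for note in context:
        if txn.merchant in note and "chargeback" in note:
            return "decline"
    return "approve"

class Bank:
    def __init__(self, name: str, history: list[str]):
        self.name = name
        self.history = history  # proprietary record; only fragments are ever sent

    def retrieve(self, txn: Transaction) -> list[str]:
        # Naive keyword match standing in for a real retrieval index.
        return [note for note in self.history if txn.merchant in note]

    def authorize(self, txn: Transaction) -> str:
        return rented_model(txn, self.retrieve(txn))

# Invented histories: one bank has seen fraud at this merchant, one has not.
veteran = Bank("veteran", ["2015 chargeback: card-present fraud at GasMart 12"])
newcomer = Bank("newcomer", ["2022 refund: duplicate charge at CornerGrocer"])

txn = Transaction(merchant="GasMart 12", amount=84.50)
print(veteran.authorize(txn), newcomer.authorize(txn))
```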

This produces a particular kind of corporate vertigo, which Mehta’s sentence is really addressing. For three or four years the industry conversation about artificial intelligence has been a conversation about models — which lab’s was larger, which benchmark moved, which release cycle a company should anchor its roadmap to. That conversation rewards being an early and aggressive lessee. But a lessee relationship, however aggressive, does not compound into anything a competitor cannot eventually also lease. The compounding, when it happens, happens in the layer that feeds the API call: in how cleanly a company has structured the record of its own customers, its own failures, its own edge cases, so that the rented brain, plugged in fresh every morning with no memory of yesterday, can be handed exactly the right fragment of yesterday and made to look, for a few hundred milliseconds, like it has been there all along.

A hospital chart has two kinds of entries. There is the vital-signs strip clipped to the bed rail — temperature, pulse, blood pressure, checked every four hours and replaced every four hours, because a reading from yesterday tells the night nurse nothing about the patient in front of her right now. And there is the permanent record in the file downstairs: the allergy that nearly killed him in 2019, the surgery, the medication history going back a decade, written once and never overwritten, because that record is exactly as valuable ten years from now as it is today. Nobody confuses the two charts. Nobody staples last Tuesday’s blood pressure into the permanent file. The hospital figured out, long before anyone digitized it, that memory is not one problem. It is two, and they fail in opposite directions if you run them through the same system.

Most teams building the layer Mehta is describing make exactly that mistake — they staple everything to the same chart. The shorthand for it is dumping everything into a vector database and praying, and it is worth asking why that particular error is so popular. The answer is that it feels like progress: embeddings go in, something resembling memory comes out, and the team moves on to the next sprint without confronting the harder question, which is what kind of memory it just built.

Short-term memory is the vital-signs strip — everything the model needs to finish the task in front of it and nothing it needs after. A customer-service exchange in progress, the order number already mentioned, the fact that this is the second call today, belongs here. So does the scratchpad of a multi-step agent: the search results just pulled, the file just opened, the partial answer being assembled before it commits. The test is not how important the information is but how long it stays true. A customer’s mood this minute is real and gone in twenty minutes; storing it permanently is like stapling yesterday’s temperature reading into the permanent file, undated, until the chart tells you nothing about fever and everything about clutter. Short-term memory should live in the context window itself, or a session-scoped cache, and it should be allowed to die when the session ends. The sin is not forgetting it. The sin is remembering it forever.
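One way to make “allowed to die” structural rather than a matter of discipline is to scope short-term memory to the session object itself. Here is a minimal Python sketch, assuming a hypothetical `Session` class. Nothing in it is any real product’s API.

```python
class Session:
    """Short-term memory that exists only for the life of one conversation."""

    def __init__(self, customer_id: str):
        self.customer_id = customer_id
        self.scratch: dict[str, str] = {}  # the vital-signs strip

    def note(self, key: str, value: str) -> None:
        self.scratch[key] = value

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # When the session ends, the strip is thrown away, not archived.
        self.scratch.clear()
        return False  # never swallow exceptions

with Session("cust-042") as session:
    session.note("order_number", "A-7731")
    session.note("mood", "frustrated")
    session.note("calls_today", "2")
    in_call = dict(session.scratch)  # snapshot taken while the call is live

print(in_call)          # everything the model needed mid-call
print(session.scratch)  # empty once the session has closed
```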

Long-term memory is the file downstairs, and it does not come in one shape any more than that file does. The first shape is semantic memory — facts. A customer’s account tier. The chargeback history that decides, in fractions of a second, whether this morning’s transaction clears. Facts belong in a database with a schema, not a vector store, because a fact has a right answer and a vector store gives you an approximate neighbor. Ask a vector index what tier a customer is on and it hands you the five most semantically similar sentences in the corpus — one correct, four merely correct-sounding. Ask a schema the same question and it tells you, because that is what the schema is for.
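The difference shows up even at toy scale. The sketch below uses Python’s built-in `sqlite3` for the schema and a crude bag-of-words cosine similarity as a stand-in for an embedding index. Real embeddings are far better, but they are still nearest-neighbor, not exact-match. Customer names, tiers, and notes are invented. The schema returns the answer; the index returns a ranked handful of neighbors and leaves it to the reader to decide which one is the fact.

```python
import math
import sqlite3
from collections import Counter

# The schema: one row per customer, one right answer per question.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (name TEXT PRIMARY KEY, tier TEXT NOT NULL)")
db.executemany("INSERT INTO customers VALUES (?, ?)",
               [("Alvarez", "gold"), ("Brandt", "silver"), ("Chen", "platinum")])

def tier_from_schema(name):
    row = db.execute("SELECT tier FROM customers WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

# The "vector store": free-text notes ranked by similarity to the question.
notes = [
    "Alvarez is on the gold tier",
    "Brandt asked to move to the gold tier",
    "Brandt is on the silver tier",
    "Chen was on the gold tier before upgrading",
    "gold tier customers get priority support",
]

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def nearest_notes(question, k=3):
    q = vectorize(question)
    return sorted(notes, key=lambda n: cosine(q, vectorize(n)), reverse=True)[:k]

print(tier_from_schema("Brandt"))               # the answer
print(nearest_notes("what tier is Brandt on"))  # the right sentence plus two about other customers
```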

The more sophisticated shops are already building the seam between the two, rather than picking one and living with its blind spot. A knowledge graph keeps the relationships a schema is good at — this customer, that account, this chargeback, in fixed and queryable connection to one another — while still letting a retrieval layer search across it by meaning rather than by exact key. The approach has a name now, GraphRAG, and the name matters less than what it concedes: that facts and resemblance are different operations, and the honest fix is to run both and let each one answer the kind of question it’s actually suited for, not to force a single index to pretend it can do both jobs at once.
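The shape of that seam can be sketched without any particular library. This is not the GraphRAG project’s implementation. It is a hypothetical toy in which fixed, typed edges answer the relational question, and a crude word-overlap score stands in for the by-meaning search that picks where to start walking. All nodes and descriptions are invented.

```python
# Hypothetical knowledge graph: typed, queryable edges between entities.
edges = {
    ("customer:alvarez", "holds"): ["account:1001"],
    ("account:1001", "disputed"): ["chargeback:cb-77"],
}
descriptions = {
    "customer:alvarez": "long-time customer who travels often for work",
    "account:1001": "travel rewards credit card account",
    "chargeback:cb-77": "disputed hotel charge in Lisbon, resolved in customer's favor",
}

def search_by_meaning(query):
    """Stand-in for semantic search: pick the node whose description shares the most words."""
    words = set(query.lower().split())
    return max(descriptions, key=lambda node: len(words & set(descriptions[node].lower().split())))

def walk(start, relations):
    """Follow exact edges from a starting node; nothing approximate happens here."""
    frontier = [start]
    for relation in relations:
        frontier = [nxt for node in frontier for nxt in edges.get((node, relation), [])]
    return frontier

entry = search_by_meaning("customer who travels a lot for work")  # fuzzy step
print(entry)
print(walk(entry, ["holds", "disputed"]))  # exact step
```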

The second shape is episodic memory — what actually happened. The specific conversation last March in which the customer explained, at length, why the previous fix didn’t work. The exact sequence of an agent’s failed attempt at a task, preserved so the next attempt doesn’t repeat it. This is where the vector store finally earns its keep, because an episode isn’t an exact-match lookup, it’s a resemblance — has anything like this come up before — and a vector index, built to find the nearest thing to a fuzzy question, is the right tool for that question and almost no other. The error was never using a vector store. The error is using only a vector store, for facts as well as episodes, on the theory that one hammer with sufficient cosine similarity can stand in for the whole toolbox.
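Episodic recall is the one job where “nearest, not exact” is the point. Here is a minimal sketch, with bag-of-words cosine standing in for real embeddings and a made-up similarity threshold deciding whether anything close enough has happened before. The episodes and the threshold value are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    # Crude stand-in for an embedding model: word counts.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical log of past episodes: what happened, and what was learned.
episodes = [
    ("customer says reset link expires before the email arrives",
     "previous fix (longer expiry) failed; the mail queue was the cause"),
    ("invoice export drops accented characters",
     "switching the export to UTF-8 fixed it"),
]

def recall(situation, threshold=0.3):
    """Return the lesson from the most similar past episode, or None if nothing is close."""
    query = embed(situation)
    score, lesson = max((cosine(query, embed(what)), lesson) for what, lesson in episodes)
    return lesson if score >= threshold else None

print(recall("the reset link expires before the email arrives again"))
print(recall("customer wants to change their shipping address"))  # nothing close: None
```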

The third shape is the rarest, and the one teams forget to build at all: procedural memory, which is not a fact and not an episode but a skill — the model’s learned sense of how this company writes a refund email, escalates a complaint, formats an invoice. Style is the visible half of it. The other half is harder to see and matters more: the rails the model is forced to run on before it ever gets to choose a word. A refund above some threshold routes to a human, no exceptions, because the workflow says so, not because the model was persuaded to think so on this particular call. An agent that touches a production database does it through a reviewed function with a fixed set of permitted calls, not through whatever query it improvises in the moment. None of that lives in a prompt, and none of it lives in the model’s weights either. It lives in code — the orchestration layer, the permissioning, the state machine the agent is required to pass through — and it is procedural in the oldest sense of the word: not a memory of what to say but a memory of what is and isn’t allowed to happen, enforced whether or not the model that day feels like remembering it. So procedural memory doesn’t live in a database at all. The style half lives in fine-tuning and in carefully maintained house-style examples; the rails live in the surrounding scaffolding of guardrails and permitted actions. Both change slower than the other two shapes, the way a surgeon’s hands carry both technique and caution years after the specific patients are forgotten. A company that has built rich semantic and episodic memory, and even taught its model the house voice, but skipped the rails has a model that knows everything about its customers, writes in exactly the right voice, and is one well-crafted prompt away from doing something the company never agreed to.
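The rails half can be sketched as ordinary code that sits between the model and the world. Everything here is hypothetical: the $500 threshold, the action names, and the allowlist are invented for illustration. The point is only that the check runs on every proposal, whatever the model’s reasoning said.

```python
from dataclasses import dataclass

REFUND_HUMAN_REVIEW_THRESHOLD = 500.00  # hypothetical policy line, in dollars

# Only these reviewed functions may run; anything else is refused outright.
ALLOWED_ACTIONS = {"issue_refund", "lookup_order"}

@dataclass
class ProposedAction:
    """What the model asked to do. The model proposes; this layer decides."""
    name: str
    amount: float = 0.0

def route(action: ProposedAction) -> str:
    if action.name not in ALLOWED_ACTIONS:
        return "refused: not a permitted action"
    if action.name == "issue_refund" and action.amount > REFUND_HUMAN_REVIEW_THRESHOLD:
        return "queued for human review"
    return f"executed: {action.name}"

print(route(ProposedAction("issue_refund", 120.00)))
print(route(ProposedAction("issue_refund", 2400.00)))
print(route(ProposedAction("drop_table_customers")))
```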

The real argument here is not which database serves which layer — that part is plumbing, and plumbing changes every eighteen months. The argument is that memory has to be triaged the way the hospital triages it, with something deciding on purpose what survives the session and what doesn’t, rather than writing every token of every interaction into the same undifferentiated store and trusting retrieval to sort it out later. A vector database with no triage in front of it is not a memory system. It is a landfill with a search function, and it will retrieve the wrong eleven-month-old conversation with the same confidence it retrieves the right one, because nobody wrote the part of the system whose only job is deciding what belongs on which chart.
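A triage step can be as plain as a function that every candidate memory passes through before anything is written. Here is a minimal sketch. The destinations mirror the charts described above, while the one-hour cutoff and the field names are assumptions for illustration, not any standard. Procedural memory is deliberately absent, since it lives in code and fine-tuning rather than in a store.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum
from typing import Optional

class Chart(Enum):
    SESSION = "session cache"   # vital signs: dies with the session
    FACTS = "schema"            # semantic: exact answers
    EPISODES = "vector index"   # episodic: searched by resemblance
    DISCARD = "nowhere"

@dataclass
class Candidate:
    text: str
    is_fact: bool                          # has a single right answer (tier, balance)
    stays_true_for: Optional[timedelta]    # None means indefinitely
    worth_recalling: bool = False          # would a future case benefit from it?

SHORT_LIVED = timedelta(hours=1)  # hypothetical cutoff

def triage(item: Candidate) -> Chart:
    if item.stays_true_for is not None and item.stays_true_for <= SHORT_LIVED:
        return Chart.SESSION
    if item.is_fact:
        return Chart.FACTS
    if item.worth_recalling:
        return Chart.EPISODES
    return Chart.DISCARD

print(triage(Candidate("customer is frustrated right now", False, timedelta(minutes=20))))
print(triage(Candidate("account tier: gold", True, None)))
print(triage(Candidate("explained why the March fix failed", False, None, worth_recalling=True)))
print(triage(Candidate("said thanks and hung up", False, None)))
```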

The lessor’s airplane, repainted, will fly for someone else next year. The route network will not. Neither will the schema that knows a customer’s tier on contact, nor the index that remembers the conversation from last March, nor the fine-tuned hand that knows, without being told twice, how this company writes a refund email. These are the things that do not come back at the end of the lease, because they were never on it.

AI, AI: Transformers

The State You Never See

The transaction arrives in milliseconds. A purchase attempt — a gas station in Phoenix, a grocery store in suburban Atlanta, a wire transfer at 2 a.m. — and somewhere in the authorization chain, a system has to decide. Not later. Now. The clock is already running.

When I led the fraud detection team at Visa, this was the problem that lived in your chest. You couldn’t see what you needed to see. You couldn’t know whether the person presenting that card was the person who owned it, whether the account had been compromised six hours ago in a breach you hadn’t yet detected, whether the behavioral signature of these transactions was the legitimate cardholder running errands or a fraudster working methodically through a stolen number before the window closed. You could only see what the transactions said. You could never see the state underneath.

That distinction — between what you can observe and what is actually true — turns out to be one of the organizing problems of our time. It has a name, a formal structure, and a history that runs from mid-century mathematics through the trading floors of quantitative hedge funds to the frontier of artificial intelligence. The name is the hidden Markov model. But the problem it addresses is older than the math, and more human than the jargon suggests.
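To make “the state underneath” concrete, here is a minimal Python sketch of hidden Markov model filtering (the forward algorithm) on a toy two-state fraud model. The state names, observation categories, and every probability are invented for illustration and are not real fraud statistics. Treating “compromised” as a state the account never leaves is also a simplifying assumption. What the sketch keeps is the HMM structure itself: a hidden state that is never observed, a transition model for how it evolves, and an emission model for what it tends to produce.

```python
# Hidden states the system can never observe directly.
STATES = ("legit", "compromised")

# Illustrative, made-up parameters; not real fraud statistics.
START = {"legit": 0.99, "compromised": 0.01}
TRANSITION = {  # P(next hidden state | current hidden state)
    "legit": {"legit": 0.995, "compromised": 0.005},
    "compromised": {"legit": 0.0, "compromised": 1.0},  # assumed absorbing
}
EMISSION = {  # P(observed transaction type | hidden state)
    "legit": {"normal": 0.95, "unusual": 0.05},
    "compromised": {"normal": 0.40, "unusual": 0.60},
}


def filter_states(observations):
    """Forward algorithm: P(hidden state at step t | everything observed up to t)."""
    beliefs = []
    belief = dict(START)
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the previous belief through the transition model.
            belief = {
                s: sum(belief[p] * TRANSITION[p][s] for p in STATES) for s in STATES
            }
        # Update: weight each hidden state by how well it explains what we saw.
        unnormalized = {s: belief[s] * EMISSION[s][obs] for s in STATES}
        total = sum(unnormalized.values())
        belief = {s: v / total for s, v in unnormalized.items()}
        beliefs.append(belief)
    return beliefs


# Usage: the system sees only the transaction types, never the account's true state.
history = ["normal", "normal", "unusual", "unusual", "unusual"]
for obs, b in zip(history, filter_states(history)):
    print(f"{obs:8s} P(compromised) = {b['compromised']:.3f}")
```

The predict-then-update loop is the whole idea. A belief about the hidden state is carried forward and revised by each transaction. Under these toy numbers, a single unusual transaction raises suspicion only modestly, and a run of them pushes it past even odds.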

The Judgment Layer

An analyst’s note on comments that the CEO of one of the largest consulting firms made at an investor conference includes a line that deserves more attention than it got: “token volume used on a project isn’t a proxy for AI maturity.”

Translation — clients are burning money on frontier models for problems that don’t need frontier models, and they’re not getting the outcomes they expected.

The CEO offered this as a business opportunity. I read it as a confession.

The old consulting model was simple: client has a technology problem, firm deploys humans to solve it. Billing followed effort. The new problem is different in kind — clients have an AI strategy problem. They know they’re supposed to be using AI. They’ve heard the word “frontier.” They’re spending accordingly. They just don’t know why, and the outcomes are showing it.

So the CEO is right that there’s an opportunity here. The value proposition shifts from implementation to judgment — not deploying AI, but knowing when not to deploy the expensive one. Matching capability to problem. Being trusted enough to tell a client that their $50M frontier model contract is solving a $500K problem.

Here’s the irony that the comment skates past: that advice is structurally difficult for a large consultancy to give.

The business model that built consulting firms was billing for doing. The more you deploy, the more you bill. Telling a client to spend less, choose the cheaper model, or run a narrower project is genuinely good advice, and the incentive structure actively works against giving it. You don’t grow a $70 billion professional services firm by talking clients out of scope.

The judgment layer, if it becomes the real value, requires something closer to a doctor’s relationship with a patient than a contractor’s relationship with a client. Doctors get paid whether they prescribe or not. The value of the visit is the diagnosis — including the diagnosis that says you don’t need the expensive intervention. Consultants, historically, get paid to prescribe, and paid more when the prescription is larger.

There’s a reason we trust doctors with that asymmetry and not contractors. Licensing, malpractice, professional norms built over centuries — all of it exists to align the incentive. Consulting has none of that infrastructure. What it has instead is reputation, which is slower-acting and easier to game.

Whether the large firms can actually make the shift — rather than just reframe the same billable-hours model in the language of AI optimization — is the real question the market is wrestling with. The CEO’s comment is genuinely perceptive about where client value lies. It’s less clear that consulting firms are currently built to capture it honestly.

Cranes on the Horizon

In 2005, during my first trip to Shanghai and Beijing, the most striking feature of the skyline wasn’t the architecture; it was the cranes. More than I could possibly count, perched atop half-finished skyscrapers like a mechanical forest. Entire districts seemed to be mid-construction simultaneously, as if someone had pressed a button and the whole country had decided to build everything at once. In his book “Breakneck,” Dan Wang describes China as an “engineering state,” one that approaches national problems with physical solutions. Back in 2005, coming from Silicon Valley, I thought I understood what growth looked like. I didn’t.

I’ve been thinking about that trip while reading Nathan Lambert’s recent piece, “Notes from Inside China’s AI Labs.” Lambert — who runs the Interconnects newsletter and does serious work tracking the open-weight LLM ecosystem — just returned from visiting essentially every major AI lab in China. Moonshot, Zhipu, Meituan, Xiaomi, Qwen, Ant Ling, 01.ai. He went in with genuine curiosity and came back with humility. That combination is rarer than it should be.

What he found was the cranes. Different domain, same energy.

Lambert’s central observation is about culture, not capability. The Chinese labs aren’t winning on any single technical breakthrough; they’re winning on execution discipline. He describes researchers, many of them current students, who bring no ego to the work. They absorb context fast, drop assumptions faster, and seem genuinely unbothered by the philosophical debates that swirl constantly in the American AI community. When he tried to engage Chinese researchers on the long-term social risks of models or the ethics of AI behavior, those questions “hung in the air with a simple confusion. It’s a category error to them.” Their role is to build the best model. Full stop. To them, an LLM isn’t a philosophical entity to be interrogated; it’s a piece of infrastructure to be optimized.

That description landed for me. Not as a criticism of American research culture, but as a real observation about what the moment demands. Building good LLMs today is, as Lambert puts it, meticulous work across the entire stack — “all points of the model can give some improvements, and fitting them in together is a complex process.”

The work that matters most right now isn’t the 0-to-1 creative leap; it’s the thousand unglamorous decisions executed without complaint. Students who haven’t yet learned to lobby for their own ideas turn out to be well-suited for exactly this.

Lambert ends on a note that’s hard to shake. Looking up from his laptop on a high-speed train, he keeps seeing cranes on the horizon, and he draws the same connection I did, though from the inside: “When I look up from my laptop and always see bunches of cranes on the horizon, it obviously fits in with the broader culture and energy around building in China.”

Twenty years after my first visit, the cranes are still there. They’ve just moved indoors — into server rooms and training runs and model releases that land every few months with quiet confidence. In 2005, what China was building was obvious: you could see the steel frames going up. What’s being built now is harder to see, which may be exactly why it keeps surprising us.

Check out Lambert’s essay – it’s remarkable. If the 20th century was defined by who could move the most earth, the 21st will be defined by who can move the most tokens. And right now, the cranes are moving faster than we think.

Outsourcing Thinking but not Understanding

Post author By Scott Loftesness
Post date April 29, 2026
There’s a line from a recent discussion with Andrej Karpathy that I keep turning over: you can outsource your thinking, but you can’t outsource your understanding.

It sounds like a warning. Maybe it is. But the more I sit with it, the more it feels like something older — a distinction philosophers have been trying to draw for centuries, suddenly made urgent by the fact that we now have a tool that makes outsourcing thinking almost frictionless.

Here’s what I notice when I use AI well: I get the answer, and I feel satisfied. There’s a small dopamine tick. Task closed. But if someone asks me an hour later to explain the reasoning, I often can’t. The thinking happened — somewhere — but not in me. I was a conduit. A confident one, too, which is the dangerous part.

This is different from looking something up. When I Google a fact and paste it into a document, I know I’m borrowing. The seam is visible. But when I ask an AI to reason through a problem with me, the output arrives in first person, in fluent prose that matches my own register, and something in my brain says I worked this out. The seam disappears. That’s new. That’s the thing we don’t yet have good instincts for.

Karpathy’s deeper point is about construction. He’s a builder by temperament — his mantra, which he traces to Feynman, is that if you can’t build it, you don’t understand it. What you can’t yet construct, you merely think you understand. There are always micro-gaps in your knowledge, invisible until you try to arrange the pieces yourself and find they don’t quite fit. The AI doesn’t change that equation. It just makes it easier to mistake the map for the territory — and to feel strangely proud of a map you didn’t draw.

Hesse understood this, in a different century and a different idiom. In Siddhartha, the young seeker travels to meet the Buddha himself — the most perfectly articulated wisdom in the world, delivered by the man who actually found it. Siddhartha listens, acknowledges that the teaching is flawless, internally consistent, the most complete account of liberation ever assembled. And then walks away. Not from arrogance, but from recognition: even the Illustrious One cannot hand you his liberation. The path was his. He walked it. That walking is not transferable, no matter how perfect the description of the destination. Received knowledge, however exquisite, is not the same as earned knowledge. The gap between them is exactly the size of your own unlived experience.

That’s the same argument, made across two and a half millennia. Feynman says you have to build it. Hesse says you have to live it. Karpathy says the AI can do neither for you.

He’s also made a related observation about educational video — that a lot of content on YouTube gives the appearance of learning but is really just entertainment, convenient for everyone involved. Nobody has to do the hard part. AI-assisted thinking has the same shape, just more intimate. You’re not passively watching — you’re actively typing, prompting, engaging. It feels like cognition. But engagement isn’t understanding. Typing a question is not the same as wrestling with it.

I don’t think the answer is to use AI less. That’s not Karpathy’s argument either; he’s spent the last year building a school premised on AI tutors expanding what people can learn. The lesson is about custody. When I hand a problem to an AI, I need to stay in the loop as a learner, not just as a reviewer. There’s a real difference between asking “give me an answer” and asking “help me build the reasoning.” The first outsources thinking. The second, if you insist on it, if you refuse to be a passenger, can still leave the understanding in you, where it belongs.

But insisting is the work. And the work is now easier to skip than it has ever been.

Understanding isn’t a product you receive. It’s a residue — what settles in you after genuine struggle, after the confusion and the dead ends and the small hard-won moments of clarity. Siddhartha couldn’t get it from the Buddha. You can’t get it from the AI. Karpathy’s line is a custody argument: the thinking can travel, but the understanding has to stay home.

What unsettles me is that we’re building tools that make the borrowing invisible — that dress outsourced reasoning in the first person, that let us feel like we’ve understood something we’ve only processed. Siddhartha at least knew he was walking away from the teaching. He felt the gap. We might not even notice ours.

Beyond the Summary: Using AI to Find the “Friction” in Your Thinking

Post author By Scott Loftesness
Post date March 11, 2026
We’ve reached the “Summary Plateau.”

You see it everywhere. Every browser extension, every note-taking app, and every enterprise LLM now offers a “Summarize” button. It’s the ultimate promise of the efficiency era: Give us the 2,000-word essay, and we’ll give you the three bullet points. But there’s a hidden tax on this kind of efficiency. When we ask an AI to summarize, we are asking it to smooth out the edges. We are asking it to remove the “noise.” The problem is, in the world of ideas, the noise is often where the signal lives. The friction—the parts of an argument that make us uncomfortable or that we don’t quite understand—is where the actual learning happens.

If we only consume the summaries, we aren’t thinking; we’re just acknowledging.

The Mirror, Not the Maker

I’ve been experimenting with a different approach. Instead of asking the model to make the content shorter, I’ve been asking it to make my engagement with the content harder.

I don’t want a “Maker” to write my thoughts for me. I want a “Mirror” to show me where my thoughts are thin.

When I’m wrestling with a complex piece—perhaps a deep dive on the future of venture capital or a philosophical treatise on Arete—I’ve stopped clicking “summarize.” Instead, I feed the text into the LLM and use these “Friction Prompts” to find the sand in the gears:

The Essential Toolkit

The “Steel Man” Challenge: “I am inclined to agree with this author’s conclusion. Find the three strongest counter-arguments that this text ignores, and explain why a reasonable person would hold them.”
The “Recursive Logic” Audit: “Identify the three most critical ‘logical leaps’ the author makes—points where a conclusion is reached without sufficient evidence. If those leaps are wrong, how does the entire argument collapse?”
The “Blind Spot” Audit: “What are the underlying cultural or economic assumptions this author is making that they haven’t explicitly stated?”
The “Cross-Pollination” Filter: “Connect the central thesis of this article to a seemingly unrelated field (e.g., Stoic philosophy or biological ecosystems). How does the logic of this text hold up—or fail—when applied to that different domain?”
The “Analog Translation” Test: “If I had to explain the core mechanism of this abstract concept using only physical, analog metaphors (like plumbing or woodworking), how would I do it? Where does the metaphor break down?”
The “Socratic Sharpening”: “Don’t summarize this. Instead, ask me three probing questions that force me to apply the core logic of this essay to a completely different industry.”
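For anyone who wants to make these prompts a habit, here is a minimal Python sketch that wraps a piece of text in one of the prompts above instead of a “summarize” request. The dictionary keys, the function name `build_friction_prompt`, and the `<article>` delimiters are hypothetical choices of mine. The prompt wording follows the list above, with light punctuation edits. The sketch deliberately calls no particular model API; the resulting string can be pasted into whatever LLM you use.

```python
# Prompt texts follow the toolkit above; the keys are hypothetical labels.
FRICTION_PROMPTS = {
    "steel_man": (
        "I am inclined to agree with this author's conclusion. Find the three strongest "
        "counter-arguments that this text ignores, and explain why a reasonable person "
        "would hold them."
    ),
    "logic_audit": (
        "Identify the three most critical 'logical leaps' the author makes: points where "
        "a conclusion is reached without sufficient evidence. If those leaps are wrong, "
        "how does the entire argument collapse?"
    ),
    "blind_spot": (
        "What are the underlying cultural or economic assumptions this author is making "
        "that they haven't explicitly stated?"
    ),
    "cross_pollination": (
        "Connect the central thesis of this article to a seemingly unrelated field "
        "(e.g., Stoic philosophy or biological ecosystems). How does the logic of this "
        "text hold up, or fail, when applied to that different domain?"
    ),
    "analog_translation": (
        "If I had to explain the core mechanism of this abstract concept using only "
        "physical, analog metaphors (like plumbing or woodworking), how would I do it? "
        "Where does the metaphor break down?"
    ),
    "socratic": (
        "Don't summarize this. Instead, ask me three probing questions that force me to "
        "apply the core logic of this essay to a completely different industry."
    ),
}


def build_friction_prompt(article_text: str, technique: str) -> str:
    """Pair one friction prompt with the article, clearly delimited (an assumed convention)."""
    if technique not in FRICTION_PROMPTS:
        raise ValueError(f"unknown technique {technique!r}; choose from {sorted(FRICTION_PROMPTS)}")
    return f"{FRICTION_PROMPTS[technique]}\n\n<article>\n{article_text.strip()}\n</article>"


# Usage: one prompt per technique for the same long-read.
essay = "(paste the long-read here)"
prompts = [build_friction_prompt(essay, name) for name in FRICTION_PROMPTS]
```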

Sharpening the Blade

Summary is about completion (getting it done). Friction is about cognition (getting it right).

When the AI points out a blind spot in an article I loved, it creates a moment of cognitive dissonance. That “click” of discomfort is the sound of a mental model being updated. It’s the digital equivalent of using a whetstone on a blade—you need the friction to get the edge.

As we move further into this age of “Flash-Frozen Cognition,” the temptation to automate our understanding will only grow. But discernment—that uniquely human trait we’ve discussed here before—cannot be outsourced to a bulleted list.

The next time you’re faced with a daunting PDF or a dense long-read, resist the “Summarize” button. Ask the machine to challenge you instead. You might find that the most valuable thing the AI can give you isn’t an answer, but a better version of your own question.

A Deep Dive (Further Reading from the Archive)

If this piece on cultivating discernment resonated with you, you might find these earlier synthesis experiments worth revisiting:

On Flash-Frozen Cognition: A foundational post discussing how LLMs are freezing the current consensus, and how we must resist it.
The Harvest and the Algorithm: Comparing 1920s ice harvesting to 2020s cognition—the critical shift from scarcity to abundance.
The Arete of Attention: A look at the Stoic concept of virtue as the intentional direction of our most scarce resource: focus.
Longhand Thinking: Why the physical act of writing is the ultimate antidote to digital velocity.
