Categories
AI, AI: Large Language Models, Apple

The Slipstream Strategy

Apple had a problem no amount of money could solve. An iPhone can’t draw the power or shed the heat of a data center, so ten different tasks can’t mean ten different models fighting for the same sliver of RAM. Apple’s answer was to freeze one small, efficient base model into the device and then swap tiny adapters in and out of it in milliseconds — a summarization adapter for your texts, a Siri adapter for on-screen actions, and a handoff to Private Cloud Compute for anything heavier. The phone behaves like it’s running many models. It’s running one model wearing many hats.

That architecture — a frozen base plus swappable adapters — is quietly becoming the default way serious AI companies build, and it’s worth understanding why, because it inverts the assumption most people still carry into this industry.

The assumption is that winning means owning a frontier model. Sierra co-founder Clay Bavor pushed back on that on a recent 20VC episode: pouring capital into your own pre-training, he argued, tends to leave you holding a highly perishable bag of floating-point numbers. Open-weight models improve fast enough that yesterday’s frontier is next quarter’s commodity. The companies playing this well aren’t racing to out-spend the labs. They’re slipstreaming behind them — taking the free, state-of-the-art engine and putting all their effort into what sits on top of it.

What sits on top is LoRA, short for low-rank adaptation. The old failure mode was catastrophic forgetting: fine-tune a model hard enough on your own data and it forgets how to reason generally. LoRA sidesteps this by leaving the base model’s weights frozen and training only a small set of additional parameters alongside them, pairs of low-rank matrices whose product is added to the frozen weights: a thin layer of expertise bolted onto a frozen foundation. You get real domain depth without touching the thing that makes the model work at all.
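For readers who want to see the mechanics, here is a minimal sketch of the idea for a single layer, in plain NumPy rather than any real training framework. The sizes, the names `lora_forward` and `lora_step`, and the squared-error loss are all assumptions made for illustration; nothing here is taken from Apple's or Sierra's systems.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 512, 512, 8  # illustrative sizes, not from any real model

W = rng.standard_normal((d_out, d_in))        # frozen base weight, never updated
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, rank))                   # trainable, zero so the adapter starts as a no-op


def lora_forward(x, W, A, B, scale=1.0):
    """Base output plus a low-rank correction: W x + scale * B (A x)."""
    return W @ x + scale * (B @ (A @ x))


def lora_step(x, target, W, A, B, lr=1e-3, scale=1.0):
    """One gradient step on 0.5 * ||y - target||^2. Returns updated (A, B); W is only read."""
    err = lora_forward(x, W, A, B, scale) - target  # gradient of the loss w.r.t. the output
    grad_B = scale * np.outer(err, A @ x)           # shape (d_out, rank)
    grad_A = scale * np.outer(B.T @ err, x)         # shape (rank, d_in)
    return A - lr * grad_A, B - lr * grad_B


base_params = W.size
adapter_params = A.size + B.size
print(f"adapter is {adapter_params / base_params:.1%} of the base layer")
```

The point to notice is that `lora_step` never writes to `W`. Everything the model learns about your domain lives in `A` and `B`, which together are a small fraction of the layer they sit beside.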

The business logic that follows from this is the actual point, and it’s simpler than it looks:

You stop being hostage to any one model provider. If a better open-weight model ships next month, you retrain your adapter on it rather than rebuilding your whole product.

You can serve hundreds of differently customized clients off one base model on one piece of hardware, instead of running a separate giant model per customer.

You can ship a fix in an afternoon, because an adapter is a few hundred megabytes, not a full training run.

And in regulated industries, your proprietary data can train an adapter that never leaves your own infrastructure.
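The "many clients, one base model" point can be sketched in a few lines. This is a toy under stated assumptions: `AdapterServer`, the client names, and the single-matrix "model" are all hypothetical, and a production server would batch requests and manage GPU memory in ways this ignores.

```python
import numpy as np


class AdapterServer:
    """One shared frozen base weight, many small per-client adapters (hypothetical design)."""

    def __init__(self, base_weight):
        self.base = base_weight  # loaded once, shared by every client
        self.adapters = {}       # client_id -> (A, B) low-rank factors

    def register(self, client_id, A, B):
        """Add or replace a client's adapter without touching the base."""
        self.adapters[client_id] = (A, B)

    def run(self, client_id, x):
        """Apply the shared base plus that client's low-rank correction."""
        A, B = self.adapters[client_id]
        return self.base @ x + B @ (A @ x)

    def adapter_fraction(self):
        """Total adapter parameters relative to the base weight."""
        total = sum(A.size + B.size for A, B in self.adapters.values())
        return total / self.base.size


rng = np.random.default_rng(1)
d, rank = 256, 4
server = AdapterServer(rng.standard_normal((d, d)))
for client in ("acme", "globex"):  # made-up client names
    server.register(client, rng.standard_normal((rank, d)), rng.standard_normal((d, rank)))

x = rng.standard_normal(d)
out_acme = server.run("acme", x)
out_globex = server.run("globex", x)
```

The same input produces different outputs for the two clients, yet the expensive part, the base weight, exists exactly once. Shipping a fix for one client amounts to calling `register` again with new factors.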

None of this is really a story about model architecture. It’s a story about where the moat moved. For a while the moat was raw capability — whoever had the best model won. Apple and Sierra are betting the moat is now somewhere else entirely: in how tightly you can weave a commodity intelligence into a specific workflow, a specific dataset, a specific customer relationship. The engine is free. The adapter is the business.

Categories
AI

Context Rot

Here is a small, possibly embarrassing confession: I have never, not once, gone looking for the best AI model.

I have a model. It lives in a browser tab — Safari, usually, on whichever device is nearest, occasionally Chrome if I happen to be at the desktop. It does what I need — drafts an email, untangles a sentence, tells me what a Norwegian emigration record from 1856 probably says — and then I close the tab and go on a walk.

Somewhere out there, presumably, a much smarter, much more expensive machine is doing something extraordinary with protein folding or hedge fund arbitrage or the outer edges of mathematics I will never visit. I have made my peace with never meeting it.

This did not use to feel like a confession. For a while there (a year, eighteen months) it felt like the central drama of the whole industry: which model was “best,” who had it, who had lost it, whether some lab’s quarterly earnings call would reveal that the frontier had quietly moved sixty miles down the road while everyone was looking the other way. Benchmarks were released like box scores. People argued about them the way people argue about batting averages, with the same weird intensity, the same conviction that a two-point difference in some abstract reasoning test settled something important about the future.

And then, at some point I can’t quite date — it crept up, the way these things do — I noticed I had stopped caring.

Not because the frontier stopped moving. It didn’t. It’s still moving, arguably faster than ever, in ways that occasionally show up in the news with all the drama of a soap opera (a delayed launch, a researcher poached, a stock down five percent in an afternoon, always something).

I stopped caring because none of it touched me. My model — whatever it was, this week — had long since crossed some invisible threshold past which more didn’t register as more. It was already better than I needed. It has been better than I needed for a while now. I suspect I am not unusual in this. I suspect most people, doing most things, most days, are operating comfortably inside a capability surplus so large they’ve stopped noticing it’s there, the way you stop noticing a room is warm.

If the top of the model isn’t for people like me — and it increasingly isn’t — then who, or what, is it actually for? I went looking for one piece of the answer and found, instead, a metaphor.

It’s called “context rot.” I have to admit, before I go further, that I’m not sure I’ve ever felt it myself — which, on reflection, is its own small piece of evidence. My sessions close in minutes, not hours. I ask, it answers, I leave. Whatever happens to a model over the fourth or fifth hour of sustained, dependent work is a country I simply don’t visit.

But other people do, increasingly (entire teams do, for entire projects), and what they’re finding out there is worth understanding, even secondhand. Context rot describes something that happens to AI models when they’re asked to work for a long time on something complicated: not five minutes, but five hours; not one question, but a hundred small decisions stacked on top of each other, each one depending on the last.

You’d think the limiting factor would be room. Models have a “context window” — a stated capacity, like a gas tank, measured in tokens, and for a while the marketing numbers on these were the whole story: two million tokens! A library! And you’d think, as with a gas tank, that the thing runs fine until it’s empty and then it stops.

That is not, it turns out, what happens. What happens is closer to what happens to your desk.

You know the desk. Everyone has the desk. It starts the morning clean — an aspirational, almost insulting cleanliness — and by four in the afternoon it is a geological record of the day: three coffee cups, a stack of things you meant to file, a Post-it with a phone number you no longer need, the good pen buried under a printout of something you already dealt with an hour ago. The desk is not full. There is, technically, room. You could clear a space if you tried. But you don’t try, because functionally, cognitively, the desk has stopped being usable long before it ran out of surface area. You start looking for the stapler and forget what you were stapling. This — and I did not make this term up, I want to be clear, though I wish I had — is context rot. The window hasn’t run out. The signal has just drowned in its own debris.

Researchers watching this happen to long-running AI agents have found something almost cruelly elegant about how it fails: it doesn’t fail gradually, the way you’d expect a desk to get gradually messier. Errors compound. A task that takes twice as long doesn’t get twice as likely to go wrong — the failure rate roughly quadruples. Two mistakes early in a long chain of dependent steps don’t add up to a slightly worse outcome. They multiply into something close to total collapse, four hours in, for reasons that trace back to a single bad assumption made in the first twenty minutes and never revisited.

Here is where the frontier comes back in — not as the whole answer, but as a piece of one.

It is not that frontier models are smarter in the way a benchmark measures smart — better at a single hard math problem, a cleverer turn of reasoning. Plenty of models can do that now; the “good enough” tier has crept remarkably high.

It’s that frontier models are apparently, marginally, meaningfully better at not rotting. At keeping the desk usable at hour six. At knowing which of the forty things on the desk actually still matters and which is a coffee cup that should have been thrown out an hour ago. This is a genuinely different kind of intelligence than the one benchmarks were built to measure, and it is almost invisible from the outside — you don’t see it in a single exchange, you see it only in the difference between a project that holds together over three days and one that quietly, subtly, stops making sense somewhere around Tuesday afternoon and nobody notices until Thursday.

If that’s true — if the frontier’s real edge is durability rather than raw cleverness — you’d expect to see it show up in how the labs actually deploy their own models: saving the sharpest tools for the tasks that need to survive the longest.

I went looking for a real-world example and found one closer to home than I expected: Anthropic’s own Slack tool, the one where you tag the AI into a channel the way you’d tag a coworker, and it works alongside a whole team over days, learning the channel as it goes. It runs on a serious, capable, thoroughly frontier model — but not, it turns out, on the company’s very best one. That one is held back, reserved for a smaller and stranger set of problems nobody has solved before at all. I sat with that for a while. The tool built to survive a whole team’s whole week, in public, under the most sustained pressure any of their products face, wasn’t handed the sharpest blade in the drawer. It was handed the second-sharpest — which was apparently, entirely, enough. Which tells you something about where the two kinds of intelligence actually diverge: the merely-very-good model handles the desk staying clean for a week, in public, in front of a whole team, where one bad assumption made Monday and never revisited would be visible to everyone by Thursday. The truly new capability is being held in reserve for something else altogether.

I don’t have a tidy place to land this, and I’m suspicious of anyone who does. But here’s the closest I can get.

Imagine a three-Michelin-star chef — the kind of person who has spent thirty years learning to coax something transcendent out of a single scallop, who can tell you, by smell, that a stock has forty more minutes in it — standing at your stove on a Tuesday night making you a grilled cheese sandwich. It will, I promise you, be a very good grilled cheese sandwich. The bread will be evenly golden. The cheese will have reached some ideal, fully-considered state of melt. But almost none of what makes that chef extraordinary is actually being used to make it — none of the thirty years spent learning to hold forty things in mind at once without losing track of any of them, the exact skill, it occurs to me, that keeps a long, complicated project from quietly falling apart on day three. The technique is idling. The thirty years are in the room, present, available, and almost entirely beside the point, because a grilled cheese sandwich was never the place where thirty years shows up. It shows up somewhere else — in a dish you will never order, on a night you weren’t there.

What you got instead, on your ordinary Tuesday, was simply more than enough.

Categories
AI, Consulting

The Judgment Layer

An analyst’s note on comments that the CEO of one of the largest consulting companies made at an investor conference includes a line that deserves more attention than it got: “token volume used on a project isn’t a proxy for AI maturity.”

Translation — clients are burning money on frontier models for problems that don’t need frontier models, and they’re not getting the outcomes they expected.

The firm’s CEO offered it as a business opportunity. I read it as a confession.

The old consulting model was simple: client has a technology problem, firm deploys humans to solve it. Billing followed effort. The new problem is different in kind — clients have an AI strategy problem. They know they’re supposed to be using AI. They’ve heard the word “frontier.” They’re spending accordingly. They just don’t know why, and the outcomes are showing it.

So the CEO is right that there’s an opportunity here. The value proposition shifts from implementation to judgment — not deploying AI, but knowing when not to deploy the expensive one. Matching capability to problem. Being trusted enough to tell a client that their $50M frontier model contract is solving a $500K problem.

Here’s the irony that the comment skates past: that advice is structurally difficult for a large consultancy to give.

The business model that built consulting firms was billing for doing. The more you deploy, the more you bill. Helping a client spend less, or choose the cheaper model, or run a narrower project, is genuinely good advice that the incentive structure actively works against. You don’t grow a $70 billion professional services firm by talking clients out of scope.

The judgment layer, if it becomes the real value, requires something closer to a doctor’s relationship with a patient than a contractor’s relationship with a client. Doctors get paid whether they prescribe or not. The value of the visit is the diagnosis — including the diagnosis that says you don’t need the expensive intervention. Consultants, historically, get paid to prescribe, and paid more when the prescription is larger.

There’s a reason we trust doctors with that asymmetry and not contractors. Licensing, malpractice, professional norms built over centuries — all of it exists to align the incentive. Consulting has none of that infrastructure. What it has instead is reputation, which is slower-acting and easier to game.

Whether the large firms can actually make the shift — rather than just reframe the same billable-hours model in the language of AI optimization — is the real question the market is wrestling with. The CEO’s comment is genuinely perceptive about where client value lies. It’s less clear that consulting firms are currently built to capture it honestly.