Categories
AI Podcasts

A Remarkable Conversation…

Highly recommend this conversation between Harry Stebbings and Clay Bavor. Among many topics, I especially enjoyed the discussion about not investing in frontier models, the values that matter most (craftsmanship, intensity, and family in particular), and the conversation about parenting and kids near the end. Just a delightful conversation!

Categories
AI AI: Large Language Models Apple

The Slipstream Strategy

Apple had a problem no amount of money could solve. An iPhone can’t draw the power or shed the heat of a data center, so ten different tasks can’t mean ten different models fighting for the same sliver of RAM. Apple’s answer was to freeze one small, efficient base model into the device and then swap tiny adapters in and out of it in milliseconds — a summarization adapter for your texts, a Siri adapter for on-screen actions, and a handoff to Private Cloud Compute for anything heavier. The phone behaves like it’s running many models. It’s running one model wearing many hats.

That architecture — a frozen base plus swappable adapters — is quietly becoming the default way serious AI companies build, and it’s worth understanding why, because it inverts the assumption most people still carry into this industry.

The assumption is that winning means owning a frontier model. Sierra co-founder Clay Bavor pushed back on that on a recent 20VC episode: pouring capital into your own pre-training, he argued, tends to leave you holding a highly perishable bag of floating-point numbers. Open-weight models improve fast enough that yesterday’s frontier is next quarter’s commodity. The companies playing this well aren’t racing to out-spend the labs. They’re slipstreaming behind them — taking the free, state-of-the-art engine and putting all their effort into what sits on top of it.

What sits on top is LoRA — low-rank adaptation. The old failure mode was catastrophic forgetting: fine-tune a model hard enough on your own data and it forgets how to reason generally. LoRA sidesteps this by leaving the base model untouched and training a small set of additional parameters alongside it — a thin layer of expertise bolted onto a frozen foundation. You get real domain depth without touching the thing that makes the model work at all.
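A minimal sketch of the idea in plain Python, not any particular library's implementation. The class name `LoRALinear`, the `rank` and `alpha` parameters, and the toy 64×64 layer are all assumptions made for illustration. The frozen weight `W` is never modified; the only trainable values are two small matrices, `A` and `B`, whose product is added to the base output.

```python
import random

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]

class LoRALinear:
    """A frozen weight matrix W plus a low-rank update B @ A (hypothetical sketch)."""

    def __init__(self, W, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen: training would never touch this
        self.scale = alpha / rank
        # A starts small and random, B starts at zero,
        # so the adapter changes nothing until it is trained.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.W, x)                    # what the base model computes
        delta = matvec(self.B, matvec(self.A, x))   # the adapter's correction
        return [b + self.scale * d for b, d in zip(base, delta)]

    def frozen_params(self):
        return len(self.W) * len(self.W[0])

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

# Toy layer: a 64x64 identity matrix standing in for one base-model weight.
W = [[1.0 if i == j else 0.0 for j in range(64)] for i in range(64)]
layer = LoRALinear(W, rank=4)
print(layer.frozen_params(), layer.trainable_params())  # 4096 512
```

Even in this toy, the adapter is an eighth the size of the layer it modifies, and the gap widens as layers get larger while the rank stays small.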

The business logic that follows from this is the actual point, and it’s simpler than it looks:

You stop being hostage to any one model provider — if a better open-weight model ships next month, you port your adapter, not your whole product. You can serve hundreds of differently-customized clients off one base model on one piece of hardware, instead of running a separate giant model per customer. You can ship a fix in an afternoon, because an adapter is a few hundred megabytes, not a training run. And in regulated industries, your proprietary data can train an adapter that never leaves your own infrastructure.
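The "many clients, one base model" point is mostly arithmetic, and a rough sketch makes it concrete. Everything here is hypothetical: the `AdapterHost` class, the 16 GB base, the 256 MB adapters, and the 200 clients are illustrative numbers, not figures from Apple, Sierra, or any real deployment.

```python
from dataclasses import dataclass, field

@dataclass
class AdapterHost:
    """One shared base model in memory, plus a small adapter per client."""
    base_name: str
    base_size_gb: float
    adapters_mb: dict = field(default_factory=dict)  # client_id -> adapter size in MB

    def register(self, client_id: str, adapter_size_mb: float) -> None:
        self.adapters_mb[client_id] = adapter_size_mb

    def route(self, client_id: str) -> str:
        """Pick the adapter for a request; the base is the same for everyone."""
        if client_id not in self.adapters_mb:
            return f"{self.base_name} (no adapter)"
        return f"{self.base_name} + adapter:{client_id}"

    def shared_footprint_gb(self) -> float:
        """Memory for one base plus every registered adapter."""
        return self.base_size_gb + sum(self.adapters_mb.values()) / 1024

    def per_client_footprint_gb(self) -> float:
        """Memory if each client got its own full copy of the model instead."""
        return self.base_size_gb * len(self.adapters_mb)

# Hypothetical numbers: a 16 GB base and 200 clients with 256 MB adapters each.
host = AdapterHost("base-model", base_size_gb=16.0)
for i in range(200):
    host.register(f"client-{i}", 256.0)

print(host.route("client-3"))            # base-model + adapter:client-3
print(host.shared_footprint_gb())        # 66.0
print(host.per_client_footprint_gb())    # 3200.0
```

Under these assumed sizes, the shared setup needs 66 GB where per-customer copies would need 3,200 GB, which is the difference between one machine and a fleet.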

None of this is really a story about model architecture. It’s a story about where the moat moved. For a while the moat was raw capability — whoever had the best model won. Apple and Sierra are betting the moat is now somewhere else entirely: in how tightly you can weave a commodity intelligence into a specific workflow, a specific dataset, a specific customer relationship. The engine is free. The adapter is the business.

Categories
AI Apple Google

The Floor

I compared the frontier to a three-star chef making grilled cheese in “Context Rot” — the smartest models on earth spending most of their time on work beneath them, the way a chef trained at Le Bernardin might still melt cheese between two slices of bread on a Tuesday night and call it dinner. The comfort was the point: if the sharpest tool is saved for hard problems and something merely-very-good handles the rest, nobody’s losing anything. The floor was never the interesting part.

I’ve kept turning the joke over, and I think I had the wrong worry.

Watch what companies do with their AI spend, not what they say. Coinbase moved engineers off frontier models onto open weights and cut its AI spend nearly in half while usage kept climbing. Nvidia runs a closed model as orchestrator and routes the actual volume — the daily uncelebrated bulk of it — to open weights it controls. The frontier is becoming a dispatcher, deciding where the request goes and rarely doing the work itself. The instinct is to worry about whose open weights end up running that volume, and right now the most capable ones at scale are Chinese — GLM, Kimi — which makes it tempting to read this as a contest America is quietly losing: the floor of the AI economy built somewhere else, at a price export controls can’t touch. You cannot embargo a file already downloaded. You cannot price-match free.

But that framing has a hole. Google’s own Gemma family is open-weight and good enough to handle that daily volume without anyone reaching for GLM or Kimi. “Open weights are a Chinese story” only holds if you don’t count the open models the company running Android and half the internet’s search traffic has already shipped.

And once I saw that hole, a bigger one opened behind it. I’ve been trying Apple’s new Siri (arriving with iOS 27 this fall, and surprisingly good in beta), and it made me realize open weights, of any nationality, were never going to cook most of the world’s dinners. Apple and Google are.

Consider what actually determines where the world’s routine inference runs. Not which model benchmarks best, not which weights are downloadable — what’s already installed. Apple ships to well over a billion active devices before routing a single query through Siri’s new architecture. Nobody has to be persuaded to try it, or hear about it on a podcast; it’s the thing that answers when you press the button you’ve pressed for a decade. Google owns the search bar and the Android default the same way. Between them, that’s most of the world’s phones — and phones are where most of the world’s questions get asked.

The open-weight framing assumes the floor is up for grabs, that whoever ships the best free model wins the daily grind by merit. But the floor was never a bazaar. It’s a set of defaults, owned by whoever already has the device in your hand, not whoever holds the most generous license. Apple didn’t need to win the model war to win this. Its heaviest reasoning tier is built with Google, running on Nvidia chips in Google’s cloud, under a deal reported at roughly a billion dollars a year — Apple doesn’t fully own the engine doing the thinking. It doesn’t need to. It owns the button.

That’s a quieter concentration than an export-controls fight, and a harder one to dislodge. An open model can be forked, distilled, undercut, or out-competed by the next release. A billion phones with an assistant built into the lock screen cannot be routed around. Whoever’s weights hum underneath barely matters, the way it barely matters to a diner which supplier delivered the flour. What matters is whose kitchen the meal came from, and whose name is on the door.

The grilled-cheese chef was never the risk. Two chefs are about to own nearly every kitchen on earth, and most of us will never notice — because a kitchen you’ve been eating out of for a decade doesn’t feel like something that was won. It just feels like home.

Owning the kitchen and getting paid for what’s cooked in it, though, turn out to be two different questions. That one’s for another post.

Categories
AI Consulting

The Judgment Layer

An analyst’s note on comments made at an investor conference by the CEO of one of the largest consulting companies includes a line that deserves more attention than it got: “token volume used on a project isn’t a proxy for AI maturity.”

Translation — clients are burning money on frontier models for problems that don’t need frontier models, and they’re not getting the outcomes they expected.

The CEO offered this as a business opportunity. I read it as a confession.

The old consulting model was simple: client has a technology problem, firm deploys humans to solve it. Billing followed effort. The new problem is different in kind — clients have an AI strategy problem. They know they’re supposed to be using AI. They’ve heard the word “frontier.” They’re spending accordingly. They just don’t know why, and the outcomes are showing it.

So the CEO is right that there’s an opportunity here. The value proposition shifts from implementation to judgment — not deploying AI, but knowing when not to deploy the expensive one. Matching capability to problem. Being trusted enough to tell a client that their $50M frontier model contract is solving a $500K problem.

Here’s the irony that the comment skates past: that advice is structurally difficult for a large consultancy to give.

The business model that built consulting firms was billing for doing. The more you deploy, the more you bill. Helping a client spend less, or choose the cheaper model, or run a narrower project, is genuinely good advice that the incentive structure actively works against. You don’t grow a $70 billion professional services firm by talking clients out of scope.

The judgment layer, if it becomes the real value, requires something closer to a doctor’s relationship with a patient than a contractor’s relationship with a client. Doctors get paid whether they prescribe or not. The value of the visit is the diagnosis — including the diagnosis that says you don’t need the expensive intervention. Consultants, historically, get paid to prescribe, and paid more when the prescription is larger.

There’s a reason we trust doctors with that asymmetry and not contractors. Licensing, malpractice, professional norms built over centuries — all of it exists to align the incentive. Consulting has none of that infrastructure. What it has instead is reputation, which is slower-acting and easier to game.

Whether the large firms can actually make the shift — rather than just reframe the same billable-hours model in the language of AI optimization — is the real question the market is wrestling with. The CEO’s comment is genuinely perceptive about where client value lies. It’s less clear that consulting firms are currently built to capture it honestly.