Apple had a problem no amount of money could solve. An iPhone can’t draw the power or shed the heat of a data center, so ten different tasks can’t mean ten different models fighting for the same sliver of RAM. Apple’s answer was to freeze one small, efficient base model into the device and then swap tiny adapters in and out of it in milliseconds — a summarization adapter for your texts, a Siri adapter for on-screen actions, and a handoff to Private Cloud Compute for anything heavier. The phone behaves like it’s running many models. It’s running one model wearing many hats.
That architecture — a frozen base plus swappable adapters — is quietly becoming the default way serious AI companies build, and it’s worth understanding why, because it inverts the assumption most people still carry into this industry.
The assumption is that winning means owning a frontier model. Sierra co-founder Clay Bavor pushed back on that on a recent 20VC episode: pouring capital into your own pre-training, he argued, tends to leave you holding a highly perishable bag of floating-point numbers. Open-weight models improve fast enough that yesterday’s frontier is next quarter’s commodity. The companies playing this well aren’t racing to out-spend the labs. They’re slipstreaming behind them — taking the free, state-of-the-art engine and putting all their effort into what sits on top of it.
What sits on top is LoRA, or low-rank adaptation. The old failure mode was catastrophic forgetting: fine-tune a model hard enough on your own data and it forgets how to reason generally. LoRA largely sidesteps this by leaving the base model's weights frozen and training a small set of additional parameters alongside them: pairs of low-rank matrices that act as a thin layer of expertise bolted onto a frozen foundation. You get real domain depth without overwriting the thing that makes the model work at all.
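To make "a small set of additional parameters" concrete, here is a minimal NumPy sketch of a single LoRA-adapted linear layer. The layer sizes, the rank, and the `alpha` scaling value are illustrative assumptions, not figures from Apple, Sierra, or any real model, and `lora_forward` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only (assumed, not taken from any real model).
d_in, d_out, rank = 512, 512, 8
alpha = 16  # LoRA scaling hyperparameter (assumed value)

# Frozen base weight: never updated during adaptation.
W = rng.standard_normal((d_out, d_in)) * 0.02

# Trainable low-rank factors. B starts at zero, so the adapter
# is a no-op until training moves it.
A = rng.standard_normal((rank, d_in)) * 0.01
B = np.zeros((d_out, rank))


def lora_forward(x, W, A, B, alpha, rank):
    """Compute y = x W^T + (alpha / rank) * (x A^T) B^T, leaving W untouched."""
    return x @ W.T + (alpha / rank) * (x @ A.T) @ B.T


x = rng.standard_normal((4, d_in))
base_out = x @ W.T
adapted_out = lora_forward(x, W, A, B, alpha, rank)

base_params = W.size
adapter_params = A.size + B.size
print(f"base: {base_params} params, adapter: {adapter_params} params "
      f"({adapter_params / base_params:.2%} of the layer)")
```

With these assumed sizes the adapter holds about 3% as many parameters as the layer it modifies, and because `B` starts at zero the adapted layer initially reproduces the base layer exactly. Only `A` and `B` would receive gradient updates in training.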
The business logic that follows from this is the actual point, and it’s simpler than it looks:
You stop being hostage to any one model provider: if a better open-weight model ships next month, you retrain your adapter on the new base rather than rebuilding your whole product. You can serve hundreds of differently customized clients off one base model on one piece of hardware, instead of running a separate giant model per customer. You can ship a fix in an afternoon, because an adapter is at most a few hundred megabytes and retraining it is a small job, not a full-model training run. And in regulated industries, you can train an adapter on proprietary data without that data, or the adapter itself, ever leaving your own infrastructure.
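The multi-client point can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any production serving stack is implemented: `AdapterServer` is a hypothetical class, the client names are invented, and a single weight matrix stands in for the entire base model.

```python
import numpy as np


class AdapterServer:
    """Toy server: one shared frozen weight, one small adapter per client."""

    def __init__(self, base_weight):
        self.base_weight = base_weight  # shared by every client, never modified
        self.adapters = {}              # client_id -> (A, B, scale)

    def register(self, client_id, A, B, scale=1.0):
        # Shipping a fix for one client means replacing this one entry.
        self.adapters[client_id] = (A, B, scale)

    def forward(self, client_id, x):
        y = x @ self.base_weight.T
        adapter = self.adapters.get(client_id)
        if adapter is not None:
            A, B, scale = adapter
            y = y + scale * (x @ A.T) @ B.T  # low-rank, per-client correction
        return y


rng = np.random.default_rng(0)
d, rank = 64, 4  # illustrative sizes (assumed)
W = rng.standard_normal((d, d))
W_snapshot = W.copy()

server = AdapterServer(W)
for client in ("acme", "globex"):  # hypothetical client names
    server.register(
        client,
        rng.standard_normal((rank, d)),
        rng.standard_normal((d, rank)),
        scale=0.1,
    )

x = rng.standard_normal((1, d))
out_acme = server.forward("acme", x)
out_globex = server.forward("globex", x)
out_base = server.forward("unknown", x)  # no adapter: plain base behaviour
```

Each client sees different behaviour from the same input, yet memory holds only one copy of the base weight plus a small `(A, B)` pair per client. Swapping or updating a client's behaviour is a dictionary write, which is the mechanical reason a fix can ship quickly.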
None of this is really a story about model architecture. It’s a story about where the moat moved. For a while the moat was raw capability — whoever had the best model won. Apple and Sierra are betting the moat is now somewhere else entirely: in how tightly you can weave a commodity intelligence into a specific workflow, a specific dataset, a specific customer relationship. The engine is free. The adapter is the business.