Tag: Tech History

The Half-Life of Moats

Prompted by an article on X by @magicsilicon on the CUDA moat. Research and drafting assistance from Clark, my AI intern.

The NVIDIA H100 looks, in retrospect, like an inevitability. It wasn’t.

What Jensen Huang built is more accurately understood as a sixteen-year accumulation of optionality: a platform investment made in 2006 for a market that wouldn’t fully materialize until 2022. NVIDIA introduced the G80 architecture in November 2006, laying the groundwork for CUDA’s release a few months later. The stated ambition was to let scientists write C++ that ran on GPU cores without needing to understand 3D graphics pipelines. The unstated bet was that parallel computation would eventually matter for something bigger than rendering shadows in video games.

For sixteen years, it mostly didn’t. Not at scale. Not commercially. CUDA lived in research labs and HPC clusters. It attracted a small, devoted, and economically marginal user base — the kind that papers cite but investors ignore. NVIDIA kept investing in it anyway: cuDNN for deep learning operations, cuBLAS for linear algebra, a layered ecosystem of libraries that made CUDA not just accessible but nearly irreplaceable for anyone doing serious numerical computation. When TensorFlow and PyTorch emerged as the standard frameworks for neural network research, they didn’t adopt CUDA because it was the only option. They adopted it because CUDA was where the optimized kernels already lived.

AlexNet won the ImageNet competition in 2012 and did it on two NVIDIA GPUs. The deep learning community noticed immediately. The financial community largely did not.

Then ChatGPT launched in November 2022, and suddenly everyone needed H100s they couldn’t get.

The parallel to Intel is instructive and also undersells how strange this kind of story looks while you’re living through it. Intel was founded in 1968 as a memory company. DRAM. The founders — Noyce, Moore, Grove — were materials scientists and engineers who believed the future was in silicon memory chips. They were right, briefly: in the early 1970s Intel dominated the DRAM market. By 1984, that share had collapsed to 1.3%, ceded almost entirely to Japanese manufacturers who had commoditized the product.

What saved Intel wasn’t a pivot so much as a realization that a stopgap had become a foundation. The 8086, conceived in 1976 as an internal hedge and launched in 1978, was never supposed to matter. It was a 16-bit processor designed to hold off Zilog while Intel finished its ambitious 32-bit iAPX 432 architecture. The 8086 was assigned to a single engineer. “If management had any inkling that this architecture would live on through many generations,” its designer Stephen Morse later recalled, “they never would have trusted this task to a single person.”

IBM chose the 8088, a cost-reduced variant, for the original IBM PC in 1981. That decision wasn’t destiny; it was simply a procurement choice. And yet from that accident of selection, Intel’s x86 line became the backbone of personal computing for four decades. The Pentium in 1993 was Intel’s Wintel moment, the flag bearer the @magicsilicon article gestures at, but the flag had been quietly sewn since 1978.

What these histories share is not just a pattern of “slow build, explosive payoff.” The structural similarity is subtler: in both cases, the moat was a software abstraction layer built on top of hardware. Intel’s real lock-in wasn’t transistor count or clock speed. It was backward compatibility — the commitment, formalized with the 80386 in 1985, that every future Intel chip would run software written for older ones. That promise created a flywheel that trapped developers and buyers in a virtuous (for Intel) dependency loop for decades.

CUDA is the same architecture at a different layer. The lock-in isn’t the H100’s 80 gigabytes of HBM3. It’s that switching to an AMD MI300X or Google TPU means potentially rewriting training pipelines that have been optimized against CUDA kernels for years. AMD’s ROCm platform exists. It is, by most accounts, maturing. Engineers who have tried the migration report that it costs months and hundreds of thousands of dollars. The moat isn’t a wall. It’s accumulated friction — the switching cost of a decade of engineering decisions baked into codebases that no one wants to touch.

But to find the actual origin of this pattern, you have to go back further than Intel. To 1964, and to a decision IBM made that Fred Brooks — its project manager — called a bet-the-business move.

The IBM System/360 was announced on April 7, 1964, after five years of turbulent internal development. What it introduced wasn’t just a new computer. It was a new concept: the separation of architecture from implementation. Before the 360, IBM ran five incompatible product lines simultaneously. A customer who outgrew their machine had to scrap all existing software and start over. The 360 replaced all five lines with a single unified architecture: six models covering a fiftyfold performance range, all sharing the same instruction set, so a customer could move to a bigger model without rewriting their software. The name itself encoded the ambition: 360 degrees, all directions, all users.

Gene Amdahl, the 360’s chief architect, had a precise formulation for what this meant: the architecture was “an interface for which software is written, independent of any implementation.” The Principles of Operation manual described what the machine did; separate Functional Characteristics documents described how each model did it. This distinction — separating the contract from the execution — was genuinely new. It’s the conceptual root of everything that came after.

The 360 generated over $100 billion in revenue for IBM and established the first platform business model in computing. Jim Collins would later rank it alongside the Model T and the Boeing 707 as one of the three greatest business achievements of the twentieth century. But its deepest legacy was architectural: the insight that if you make your abstraction layer the standard, the hardware underneath becomes fungible. Customers didn’t buy specific IBM machines. They bought into OS/360. The machines were an implementation detail.

Intel understood this by the 1980s, even if implicitly. The 80386’s backward compatibility commitment in 1985 was IBM’s 360 insight applied to microprocessors — the architecture is the product, the silicon is the vehicle. CUDA is the same insight applied to GPU compute. What NVIDIA sold researchers in 2006 wasn’t the G80 card. It was the abstraction: write parallel code in C++, run it on any NVIDIA hardware, trust that the next generation will be faster and compatible.

The pattern is now sixty years old. It has reproduced in every major platform transition. And it keeps working for the same reason it worked in 1964: when you own the layer that developers write to, your customers’ switching costs compound every year they stay.

There’s something worth sitting with here. Neither Jensen Huang in 2006 nor Gordon Moore in 1968 could have specified exactly what the payoff would look like. What they shared was a willingness to build infrastructure for a demand they could sense but not yet see — and the discipline to keep investing in it through the long years when it looked like a research project rather than a business.

The question that doesn’t resolve cleanly is whether that kind of patience is a strategy or a personality. And whether, in an industry that now moves faster than the cycles it’s lived through, sixteen-year moats are still the kind that get built.

Which raises the uncomfortable corollary: the same AI tools that CUDA enabled may be what ultimately erodes it.

The attack on CUDA’s moat is now structurally different from anything AMD or Intel could mount before. OpenAI’s Triton compiler lets developers write GPU kernels in Python without touching CUDA at all, and generates optimized machine code that often matches hand-tuned CUDA performance. MLIR (Multi-Level Intermediate Representation, originally from Google) provides a compiler infrastructure that can target many hardware backends from a single codebase. AMD’s ROCm has historically been dismissed as immature; ROCm 7, released in 2025, delivers meaningfully better inference performance than its predecessors. And perhaps most directly: Claude Code reportedly ported a CUDA codebase to AMD’s ROCm in thirty minutes, work that previously took months of engineering time.

The irony is almost too neat. CUDA’s moat was built on accumulated switching costs: the friction of rewriting code, the library dependencies, the tribal knowledge encoded in a decade of kernel optimizations. AI coding tools are specifically good at exactly that kind of mechanical, high-context translation. The weapon is attacking the wall it was built behind.

That said, it’s worth being careful about the speed of this. Abstraction layers that “should” erode moats often take far longer than expected, because the moat isn’t just the code — it’s the ecosystem of tooling, documentation, community knowledge, and hardware-software co-optimization that took eighteen years to compound. Triton and MLIR are real. They’re also early. The question isn’t whether the moat is vulnerable; it’s whether it erodes before NVIDIA’s next generation of chips makes it irrelevant to argue about.

As for what comes next — which company is building the IBM 360 of this decade — the honest answer is that it’s too early to call with confidence. But there’s a candidate worth watching.

Anthropic’s Model Context Protocol, launched in late 2024, has the structural fingerprint of a platform play. MCP is a standard for how AI agents connect to external tools and data sources — a common interface layer, hardware-agnostic (or rather, model-agnostic), that any system can implement. By late 2025 it had been donated to the Linux Foundation, adopted by OpenAI and Google, and was tracking 97 million monthly SDK downloads. There are now over 10,000 MCP servers. It is becoming the way agents talk to the world.
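To make the “common interface layer” idea concrete, here is a minimal sketch of what an agent’s request to an MCP server can look like on the wire. It assumes the JSON-RPC 2.0 message format and the “tools/call” method described in the MCP specification; the tool name “search_documents” and its arguments are hypothetical, chosen only for illustration. Nothing in this request identifies which model decided to make the call, which is the point: any client that speaks the protocol can drive any server that implements it.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 request asking an MCP server to run one tool."""
    message = {
        "jsonrpc": "2.0",        # MCP messages use JSON-RPC 2.0 framing
        "id": request_id,        # lets the client match the server's reply
        "method": "tools/call",  # the MCP method for invoking a tool
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments; a real server advertises its own
# tools in response to a "tools/list" request.
request = make_tool_call(1, "search_documents", {"query": "System/360"})
print(request)
```

The same serialized request works whether the agent behind it runs on Claude, a GPT model, or Gemini, which is the decoupling the protocol is betting on.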

The parallel to OS/360 is imprecise but instructive. What IBM built in 1964 was a standard interface between software and hardware that decoupled what you wrote from what you ran it on. MCP is attempting something similar one abstraction layer higher: decoupling what an agent does from the specific models, tools, and data sources it does it with. If it becomes the standard — the layer that developers write to — then whoever owns or most deeply shapes that standard controls the integration tax of an industry whose applications we can’t fully specify yet.

The counterargument is that open standards, once donated to foundations and broadly adopted, don’t generate the same lock-in as proprietary platforms. OS/360 was IBM’s. CUDA is NVIDIA’s. MCP is now the Linux Foundation’s, with OpenAI and Google as co-stewards. The historical pattern suggests the moat accrues to whoever owns the layer, not whoever invented it.

Which may mean the next great platform play is still being assembled in a room we haven’t seen yet — the way IBM’s System/360 was being architected in a Connecticut motor lodge in 1961, three years before anyone else knew what was coming.

Tags: abstraction layers, AI accelerators, AI boom, AI hardware, AI Infrastructure, AI investment, AlexNet, AMD ROCm, Blackwell, chatgpt, chip architecture, computing platforms, CUDA, cuDNN, Deep Learning, developer ecosystems, emerging technology, foundation models, Fred Brooks, generative ai, Gordon Moore, GPU computing, H100, hardware abstraction, Hopper architecture, IBM mainframe, IBM System/360, Inference, Intel, jensen huang, Linux Foundation, MCP, MLIR, Model Context Protocol, nvidia, open standards, OpenAI Triton, parallel computing, PC revolution, physical AI, platform strategy, PyTorch, Scott Loftesness, semiconductor history, software ecosystems, switching costs, Tech History, tech strategy, technology moats, TensorFlow, Venture Capital, x86

Authors

Tracy Kidder and the Human Code

Post author By Scott Loftesness
Post date March 25, 2026

Tracy Kidder died yesterday, March 24th, of lung cancer. He was 80.

I’ve been sitting with that quiet, heavy fact for a few hours now, staring at the screen, thinking about what his work meant to me—and specifically, about the enduring legacy of The Soul of a New Machine.

On its surface, the book is a chronicle of a team of engineers and coders at Data General Corporation, racing against the clock at the turn of the 1980s to build a 32-bit minicomputer. If you haven’t read it, that description likely sounds like the synopsis of a dry technical manual. It is, gloriously, anything but.

What Kidder did—what hit me with such force when I first turned those pages—was capture the raw, unvarnished pulse of human obsession. He didn’t just document the architecture of a machine; he mapped the architecture of the minds building it. He translated the late-night pizza runs, the bloodshot eyes, the tribal hierarchies of the engineering floor, and the strange, almost religious fervor that overtakes people when they are creating something they profoundly believe in.

He called it:

“An adventure story, a kind of cultural anthropology.”

That is exactly right.

He ventured into a world most journalists would have fumbled or fundamentally misunderstood.

The early computer industry was hyper-technical, fiercely insular, full of impenetrable jargon, and populated by brilliant minds who regarded outsiders with a polite, if dismissive, suspicion.

But Kidder didn’t blink. He embedded himself. His deep reporting and novelistic prose illuminated the basement labs of tech just as deftly as they later illuminated home construction and global disease prevention. He held a fundamental trust that the human drama playing out inside the sterile machine room was worth finding. And he found it.

Reading Soul as someone who has spent years orbiting technology, I continually find myself marveling at a different kind of engineering: how does a writer actually do this? How do you make the arcane feel intimate?

As one reviewer aptly noted at the time, “Kidder makes the telling seem absolutely effortless.” Which is, of course, the ultimate tell. Effortless prose is always the product of staggering effort.

A friend once said of his process:

“Tracy throws up on the page and cleans up afterward. He was absolutely indefatigable in the writing.”

That immense labor shows—not as the sweat of a struggling author, but as the pure clarity of a master.

What the book quietly teaches, if you’re paying attention, is a profound lesson about the nature of craft itself.

Those Data General engineers weren’t just building a minicomputer. They were building an identity, a tribe, a shared sense of purpose. They were transferring a piece of themselves into the silicon and wire. Kidder understood this alchemy. He highlighted people who had mastered their realms, elevating them into characters whose struggles rang true because they were anchored by staggering amounts of research. He believed—and subsequently proved to the world—that ordinary people doing terribly difficult things in obscure rooms were worthy of the full weight of literary attention.

That was his extraordinary gift. And it is far rarer than it sounds.

The honors and brisk sales from the book vaulted Kidder into the top ranks of American nonfiction writers. But his true legacy lives in the narrative talents he inspired. I suspect a vast number of people who went on to write serious, empathetic nonfiction about technology read Soul at some formative moment and thought: This is how it should be done. I know I was one of them.

He will be deeply missed. But the book remains, waiting on the shelf. If you haven’t read it, today feels like exactly the right day to start.

AI Web/Tech

Why the AI PC is the New 3D TV

Post author By Scott Loftesness
Post date January 8, 2026

Image: A close-up of a laptop showing an “AI READY” sticker on its surface, alongside a pair of glasses, a coffee mug, and a notepad on a wooden desk.

I was reading the coverage coming out of CES 2026 this week, and the silence was deafening. Just a year ago, the industry was shouting about the “AI PC” as the inevitable successor to the computing throne. Every laptop lid, keyboard deck, and press release was plastered with the promise of Neural Processing Units (NPUs) and local intelligence.

But looking at the tepid market reaction—and Dell explicitly dialing back the “AI sermon” this year—I can’t help but feel a sense of déjà vu. It reminds me of the “3D Ready” stickers that adorned every television set circa 2011.

There is a distinct pattern in consumer technology where the hardware cart gets placed miles ahead of the software horse. We saw it with 3D televisions, a technology that demanded we wear goofy glasses to watch a limited library of content, offering a friction-heavy solution to a problem nobody really had. We saw it, more tragically, with Apple’s Vision Pro. Despite being a marvel of engineering, it stalled because it asked too much of us (in both price and physical weight) for too little return in our daily lives.

The “AI PC” seems to be falling into a similar, albeit subtler, trap.

The issue isn’t that AI is a fad. Far from it: agentic AI and local models are transforming how we work. The issue is the marketing category. Consumers are realizing that an “AI PC” is just… a PC. The magic of AI isn’t in the hardware badge or a dedicated Copilot key; it’s in the software that runs anywhere. Nobody shops for an “Internet PC” anymore; we just buy computers. The utility is ubiquitous, not proprietary to a specific chassis.

When technology truly succeeds, it disappears. It becomes boring. The “flop” of the AI PC isn’t a failure of technology, but a failure of hype. It is the market collectively shrugging and saying, “Show me the value, not the specs.” Until the software experiences are so undeniable that we can’t live without that local NPU, the “AI PC” will remain a marketing sticker, destined to peel off and fade away, much like the 3D glasses and Vision Pros now gathering dust in the homes of the few who bought them.
