Categories
Business History IBM Infrastructure Nvidia Programming Semiconductors

The Half-Life of Moats

Prompted by an article on X by @magicsilicon on the CUDA moat. Research and drafting assistance from my AI intern assistant Clark.

The NVIDIA H100 looks, in retrospect, like an inevitability. It wasn't.

What Jensen Huang built is more accurately understood as a sixteen-year accumulation of optionality: a platform investment made in 2006 for a market that wouldn't fully materialize until 2022. NVIDIA introduced the G80 architecture in November 2006, laying the groundwork for CUDA's public release a few months later. The stated ambition was to let scientists write C (and, later, C++) that ran on GPU cores without needing to understand 3D graphics pipelines. The unstated bet was that parallel computation would eventually matter for something bigger than rendering shadows in video games.

For sixteen years, it mostly didn't. Not at scale. Not commercially. CUDA lived in research labs and HPC clusters. It attracted a small, devoted, and economically marginal user base, the kind that papers cite but investors ignore. NVIDIA kept investing in it anyway: cuDNN for deep learning operations, cuBLAS for linear algebra, a layered ecosystem of libraries that made CUDA not just accessible but nearly irreplaceable for anyone doing serious numerical computation. When TensorFlow and PyTorch emerged as the standard frameworks for neural network research, they didn't adopt CUDA because it was the only option. They adopted it because CUDA was where the optimized kernels already lived.

AlexNet won the ImageNet competition in 2012 and did it on two NVIDIA GPUs. The deep learning community noticed immediately. The financial community largely did not.

Then ChatGPT launched in November 2022, and suddenly everyone needed H100s they couldn't get.


The parallel to Intel is instructive, though it also undersells how strange this kind of story looks while you're living through it. Intel was founded in 1968 as a memory company. DRAM. Its founders, Robert Noyce and Gordon Moore, with Andy Grove as their first hire, were scientists and engineers who believed the future was in silicon memory chips. They were right, briefly: in the early 1970s Intel dominated the DRAM market. By 1984, its share had collapsed to 1.3%, ceded almost entirely to Japanese manufacturers who had commoditized the product.

What saved Intel wasn't a pivot so much as a realization that a stopgap had become a foundation. The 8086, conceived in 1976 as an internal hedge and launched in 1978, was never supposed to matter. It was a 16-bit processor designed to hold off Zilog while Intel finished its ambitious 32-bit iAPX 432 architecture. The 8086's instruction set was assigned to a single engineer. “If management had any inkling that this architecture would live on through many generations,” its designer Stephen Morse later recalled, “they never would have trusted this task to a single person.”

IBM chose the 8088, a cost-reduced variant with an 8-bit external bus, for the original IBM PC in 1981. That decision wasn't destiny; it was simply procurement. And yet from that accident of selection, Intel's x86 line became the backbone of personal computing for four decades. The Pentium in 1993 was Intel's Wintel moment, the flag bearer the @magicsilicon article gestures at, but the flag had been quietly sewn since 1978.


What these histories share is not just a pattern of “slow build, explosive payoff.” The structural similarity is subtler: in both cases, the moat was a software abstraction layer built on top of hardware. Intel's real lock-in wasn't transistor count or clock speed. It was backward compatibility: the commitment (formalized with the 80386 in 1985) that every future Intel chip would run software written for older ones. That promise created a flywheel that trapped developers and buyers in a virtuous (for Intel) dependency loop for decades.

CUDA is the same architecture at a different layer. The lock-in isn't the H100's 80 gigabytes of HBM3. It's that switching to an AMD MI300X or a Google TPU can mean rewriting training pipelines that have been optimized against CUDA kernels for years. AMD's ROCm platform exists, and it is, by most accounts, maturing. But engineers who have tried the migration report that it costs months and hundreds of thousands of dollars. The moat isn't a wall. It's accumulated friction: the switching cost of a decade of engineering decisions baked into codebases that no one wants to touch.


But to find the actual origin of this pattern, you have to go back further than Intel: to 1964, and to a decision IBM made that Fred Brooks, the project's manager, called a bet-the-business move.

The IBM System/360 was announced on April 7, 1964, after five years of turbulent internal development. What it introduced wasn't just a new computer. It was a new concept: the separation of architecture from implementation. Before the 360, IBM ran five incompatible product lines simultaneously. A customer who outgrew their machine had to scrap all existing software and start over. The 360 replaced all five lines with a single unified architecture spanning six models and a fiftyfold performance range, all sharing the same instruction set and designed to run the same software. The name itself encoded the ambition: 360 degrees, all directions, all users.

Gene Amdahl, the 360's chief architect, had a precise formulation for what this meant: the architecture was “an interface for which software is written, independent of any implementation.” The Principles of Operation manual described what the machine did; separate Functional Characteristics documents described how each model did it. This distinction, separating the contract from the execution, was genuinely new. It's the conceptual root of everything that came after.
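The architecture/implementation split is easy to lose in the abstract, so here is a minimal Python sketch of the idea, not a model of the real System/360: one class fixes what each instruction does (the contract), and two hypothetical models, `SmallModel` and `LargeModel`, differ only in how many cycles they spend. The three-instruction set and the cycle counts are invented for illustration.

```python
from abc import ABC, abstractmethod

# Hypothetical toy instruction set (not System/360's): (opcode, operand) pairs
# that act on a single accumulator register.
PROGRAM = [("LOAD", 7), ("ADD", 5), ("MUL", 3)]


class Architecture(ABC):
    """The contract: what each instruction does (the "Principles of Operation")."""

    def execute(self, op: str, operand: int, acc: int) -> int:
        if op == "LOAD":
            return operand
        if op == "ADD":
            return acc + operand
        if op == "MUL":
            return acc * operand
        raise ValueError(f"unknown opcode: {op!r}")

    def run(self, program):
        """Run a program; return (final accumulator, total cycles spent)."""
        acc, cycles = 0, 0
        for op, operand in program:
            acc = self.execute(op, operand, acc)
            cycles += self.cycles_for(op)
        return acc, cycles

    @abstractmethod
    def cycles_for(self, op: str) -> int:
        """The implementation detail: how fast this model executes an op."""


class SmallModel(Architecture):
    def cycles_for(self, op: str) -> int:
        return 10  # slow, cheap hardware: every instruction is expensive


class LargeModel(Architecture):
    def cycles_for(self, op: str) -> int:
        return 2 if op == "MUL" else 1  # fast hardware with a quick multiplier


# The same program gives the same answer on both models; only speed differs.
for model in (SmallModel(), LargeModel()):
    result, cycles = model.run(PROGRAM)
    print(f"{type(model).__name__}: result={result}, cycles={cycles}")
# SmallModel: result=36, cycles=30
# LargeModel: result=36, cycles=4
```

Software written against `Architecture` never needs to know which model it runs on, which is the property that let a 360 customer move up the product line without rewriting programs.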

The 360 generated over $100 billion in revenue for IBM and established the first platform business model in computing. Jim Collins would later rank it alongside the Model T and the Boeing 707 as one of the three greatest business achievements of the twentieth century. But its deepest legacy was architectural: the insight that if you make your abstraction layer the standard, the hardware underneath becomes fungible. Customers didn't buy specific IBM machines. They bought into System/360. The machines were an implementation detail.

Intel understood this by the 1980s, even if only implicitly. The 80386's backward-compatibility commitment in 1985 was IBM's 360 insight applied to microprocessors (the architecture is the product; the silicon is the vehicle). CUDA is the same insight applied to GPU compute. What NVIDIA sold researchers in 2006 wasn't the G80 card. It was the abstraction: write parallel code in C, run it on any NVIDIA hardware, and trust that the next generation will be faster and still compatible.

The pattern is now sixty years old. It has reproduced itself in every major platform transition. And it keeps working for the same reason it worked in 1964: when you own the layer that developers write to, your customers' switching costs compound every year they stay.


There's something worth sitting with here. Neither Jensen Huang in 2006 nor Gordon Moore in 1968 could have specified exactly what the payoff would look like. What they shared was a willingness to build infrastructure for a demand they could sense but not yet see, and the discipline to keep investing in it through the long years when it looked like a research project rather than a business.

The question that doesn't resolve cleanly is whether that kind of patience is a strategy or a personality. And whether, in an industry that now moves faster than the cycles it's lived through, sixteen-year moats are still the kind that get built.


Which raises the uncomfortable corollary: the same AI tools that CUDA enabled may be what ultimately erodes it.

The attack on CUDA's moat is now structurally different from anything AMD or Intel could mount before. OpenAI's Triton compiler lets developers write GPU kernels in Python without writing CUDA at all, and generates optimized machine code that often matches hand-tuned CUDA performance. MLIR (Multi-Level Intermediate Representation), which originated at Google and now lives in the LLVM project, provides compiler infrastructure that can target many hardware backends from a single codebase. AMD's ROCm has historically been dismissed as immature; ROCm 7, released in 2025, delivers meaningfully better inference performance than its predecessors. And perhaps most directly: Claude Code reportedly ported a CUDA codebase to AMD's ROCm in thirty minutes, work that previously took months of engineering time.

The irony is almost too neat. CUDA's moat was built on accumulated switching costs: the friction of rewriting code, the library dependencies, the tribal knowledge encoded in a decade of kernel optimizations. AI coding tools are good at exactly that kind of mechanical, high-context translation. The weapon is attacking the wall it was built behind.

That said, it's worth being careful about the speed of this. Abstraction layers that “should” erode moats often take far longer than expected, because the moat isn't just the code. It's the ecosystem of tooling, documentation, community knowledge, and hardware-software co-optimization that took nearly two decades to compound. Triton and MLIR are real. They're also early. The question isn't whether the moat is vulnerable; it's whether it erodes before NVIDIA's next generation of chips makes it irrelevant to argue about.


As for what comes next, and which company is building the IBM 360 of this decade, the honest answer is that it's too early to call with confidence. But there's a candidate worth watching.

Anthropic's Model Context Protocol, launched in late 2024, has the structural fingerprint of a platform play. MCP is a standard for how AI agents connect to external tools and data sources: a common interface layer, hardware-agnostic (or rather, model-agnostic), that any system can implement. By late 2025 it had been donated to the Linux Foundation, adopted by OpenAI and Google, and was tracking 97 million monthly SDK downloads. There are now over 10,000 MCP servers. It is becoming the way agents talk to the world.
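To make "common interface layer" concrete: MCP messages are JSON-RPC 2.0, and the specification has clients discover a server's tools with `tools/list` and invoke one with `tools/call`. The minimal Python sketch below only builds those two request payloads; it omits the initialization handshake and the transport (stdio or HTTP), and the `get_weather` tool and its arguments are hypothetical.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC ids only need to be unique within a session


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request of the kind an MCP client sends."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discover what a server offers, then call one tool by name.
# "get_weather" and its arguments are hypothetical; real servers define their own tools.
list_request = jsonrpc_request("tools/list")
call_request = jsonrpc_request(
    "tools/call",
    {"name": "get_weather", "arguments": {"city": "Boston"}},
)
print(list_request)
print(call_request)
```

Nothing in either message names a model. Any agent that can emit this JSON can use any server that understands it, which is the decoupling at the heart of the platform argument.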

The parallel to System/360 is imprecise but instructive. What IBM built in 1964 was a standard interface between software and hardware that decoupled what you wrote from what you ran it on. MCP is attempting something similar one abstraction layer higher: decoupling what an agent does from the specific models, tools, and data sources it does it with. If it becomes the standard, the layer that developers write to, then whoever owns or most deeply shapes that standard controls the integration tax of an industry whose applications we can't fully specify yet.

The counterargument is that open standards, once donated to foundations and broadly adopted, don't generate the same lock-in as proprietary platforms. System/360 was IBM's. CUDA is NVIDIA's. MCP is now the Linux Foundation's, with OpenAI and Google as co-stewards. The historical pattern suggests the moat accrues to whoever owns the layer, not whoever invented it.

Which may mean the next great platform play is still being assembled in a room we haven't seen yet, the way IBM's System/360 was being architected in a Connecticut motor lodge in 1961, three years before anyone else knew what was coming.

Categories
AI

The Ghost of Edison in the AI Data Center

For over a century, the story of modern electricity has been framed by the “War of the Currents.” Thomas Edison championed Direct Current (DC), a stable, continuous flow of energy in one direction, while Nikola Tesla and George Westinghouse backed Alternating Current (AC), which could easily be stepped up in voltage to travel long distances across the grid.

Tesla won. AC became the lifeblood of the global power grid. But history has a funny way of looping back on itself. Today, as we stand on the cusp of the largest infrastructure build-out in human history (the artificial intelligence data center), Edison's DC power is making a quiet, monumental comeback.

The catalyst? The sheer, unyielding physics of energy consumption.

The AI boom, driven by massive GPU clusters from companies like NVIDIA, is extraordinarily power-hungry. We are no longer measuring data center power in megawatts; we are measuring it in gigawatts. And when you are dealing with power at that scale, the friction of legacy architecture becomes a multi-billion-dollar bottleneck.

On X, Ben Bajarin cited a recent conference discussion by an executive from power-management supplier Eaton that highlighted a massive architectural shift happening right now behind the scenes:

“800-volt DC to the rack is probably one of the biggest architectural changes that are starting to be designed into data centers, and a lot of those designs are taking place right now. You know, honestly, when [you] look at Eaton, I think that’s one of the untold stories here, is that DC power is probably one of the biggest transformational things that are going to hit the electrical industry since, quite frankly, AC electricity was around in the Edison days.”

To understand why this is revolutionary, you have to look at how a traditional data center gets its power. Power arrives from the utility grid as medium-voltage AC. It is then stepped down to low-voltage AC, sent to the server floor, converted into DC, stepped down again, and finally fed into the server rack at 54 volts.

Every time power is converted from AC to DC, or stepped down through a transformer, there is a penalty: the conversion generates heat, and energy is lost.
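Because each conversion passes along only a fraction of what it receives, losses multiply rather than add. Here is a minimal Python sketch of the chain just described; the three stage efficiencies are hypothetical values chosen so the total lands near the roughly 5% loss Eaton cites, not measured figures for any real equipment.

```python
import math
from typing import Mapping

# Hypothetical per-stage efficiencies for the conventional AC chain described above.
# Illustrative values only, picked so the total lands near the ~5% loss Eaton cites.
AC_CHAIN = {
    "medium-voltage to low-voltage transformer": 0.985,
    "AC-to-DC rectification": 0.98,
    "DC-to-DC step-down to 54 V": 0.985,
}


def chain_efficiency(stages: Mapping[str, float]) -> float:
    """Losses compound: end-to-end efficiency is the product of every stage's efficiency."""
    return math.prod(stages.values())


efficiency = chain_efficiency(AC_CHAIN)
print(f"end-to-end efficiency: {efficiency:.3f}, lost as heat: {1 - efficiency:.1%}")
# end-to-end efficiency: 0.951, lost as heat: 4.9%
```

Deleting a stage deletes its factor from the product, which is the whole case for carrying DC closer to the rack.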

“We estimate that there’s roughly about 5% electrical loss during that transition. If you could just go from DC, directly from the utility feed, all the way through the data center into the rack, that’s 5% efficiency gain that you could get.”

In the abstract, 5% sounds like a rounding error. But scale changes everything. Eaton projects that the upcoming data center build-out to support AI will require somewhere between 50 and 100 gigawatts of power.

“So on 50 gigawatts or 100 gigawatts of power generation that’s needed, that’s 5 gigawatts of power that all of a sudden just appears from the existing infrastructure. And that is really, that is really exciting.”
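The scale argument is a single multiplication. A minimal Python sketch that applies the quoted 5% to both ends of Eaton's projected range, treating the loss as a flat fraction of total load (a simplification; real losses vary with equipment and utilization):

```python
def recovered_gw(build_out_gw: float, loss_fraction: float = 0.05) -> float:
    """Power freed up if a conversion loss of loss_fraction is eliminated."""
    return build_out_gw * loss_fraction


# Eaton's projected AI build-out range, with the ~5% loss figure from the quote.
for build_out in (50, 100):
    print(f"{build_out} GW build-out -> {recovered_gw(build_out):.1f} GW recovered")
# 50 GW build-out -> 2.5 GW recovered
# 100 GW build-out -> 5.0 GW recovered
```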

Five gigawatts (the figure at the top of Eaton's range; the low end works out to 2.5) is not a rounding error. It is roughly the output of five large nuclear reactors, enough to power millions of homes. And in this new 800-volt DC architecture, those gigawatts aren't created by burning more coal, building more solar panels, or splitting more atoms.

They are created purely by the removal of friction. By subtracting the unnecessary steps.

There is a profound philosophical metaphor hidden in this electrical engineering triumph. In our own lives, and in our organizations, we are obsessed with generation. When we face a deficit (a lack of time, a lack of output, a lack of revenue), our default instinct is to generate more. We try to work longer hours, hire more people, or drink more coffee.

But how much of our daily energy is lost to “conversion friction”? How much mental power evaporates when we constantly context-switch between tasks, essentially converting our mental state from AC to DC and back again? How much organizational momentum is lost translating an idea through five different layers of middle management before it reaches the “rack” where the actual work is done?

Often, the most elegant and impactful solution isn't to generate more power. It is to look at the existing architecture of your life or business, identify the transition points that are bleeding energy as heat, and rewire the system so energy flows directly from the source to where the work gets done.

The invisible architecture that shapes our digital lives is shifting. In the race to build the future of artificial intelligence, the biggest breakthrough wasn’t a new way to create energy, but a century-old method of preserving it.

Categories
AI Business

The Gravity of Compute

We are currently witnessing the single largest deployment of capital in human history. The “Hyperscalers,” the titans of our digital age, are pouring hundreds of billions of dollars into the ground, turning cash into concrete, copper, and silicon.

The prevailing narrative is one of unceasing, exponential growth: bigger models require bigger clusters, which require more power plants, which require more land. It relies on the assumption that the demand for centralized intelligence is insatiable and that the current architecture is the only way to feed it.

But history suggests that technology rarely moves in a straight line; it swings like a pendulum. Two forces are emerging from the periphery that could impact the ROI of this massive infrastructure build-out. One is hiding in your pocket, and the other is waiting in the sky.

A recent conversation with Gavin Baker outlines a potential “bear case” for datacenter compute demand: the rise of Edge AI.

We often assume we need the “God models” (the omniscient, trillion-parameter giants hosted in the cloud) for every interaction. But do we?

Baker suggests that within three years, our phones will possess the DRAM and battery density to run pruned versions of advanced models (like a Gemini 5 or Grok 4) locally. He paints a picture of a device capable of delivering 30 to 60 tokens per second at an “IQ of 115.”
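Why DRAM is the constraint Baker names: a common rule of thumb is that generating one token on-device means streaming roughly all of the model's weights from memory, so decode speed is capped near memory bandwidth divided by model size. A minimal Python sketch of that bound; the parameter counts, 4-bit quantization, and bandwidth figures are hypothetical, not Baker's numbers or any specific phone's specs.

```python
def model_bytes(params_billions: float, bits_per_weight: int) -> float:
    """Approximate weight footprint in bytes (ignores KV cache and activations)."""
    return params_billions * 1e9 * bits_per_weight / 8


def decode_tokens_per_sec(params_billions: float, bits_per_weight: int,
                          bandwidth_gb_s: float) -> float:
    """Upper bound if every generated token reads all weights from DRAM once."""
    return bandwidth_gb_s * 1e9 / model_bytes(params_billions, bits_per_weight)


# Hypothetical phone-class configurations, for illustration only.
for params, bits, bw in [(8, 4, 60), (8, 4, 120), (3, 4, 60)]:
    weights_gb = model_bytes(params, bits) / 1e9
    tps = decode_tokens_per_sec(params, bits, bw)
    print(f"{params}B @ {bits}-bit: {weights_gb:.1f} GB of weights, "
          f"~{tps:.0f} tok/s at {bw} GB/s")
```

Under these assumptions, reaching 30 to 60 tokens per second takes more bandwidth, smaller models, or both, which is where pruning comes in. Mixture-of-experts models, which read only their active parameters per token, would loosen the bound.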

As Baker puts it: “If that happens, if like 30 to 60 tokens at… a 115 IQ is good enough. I think that’s a bear case.”

Consider the implications of that specific number. An IQ of 115 isn’t omniscient, but it is competent. It is capable, nuanced, and helpful.

If Apple's strategy succeeds (making the phone the primary distributor of privacy-safe, free, local intelligence), the vast majority of our daily queries will never leave the device. We will only reach for the cloud's “God models” when we are truly stumped, much like we might consult a specialist only after our general practitioner has reached their limit. If 80% of inference happens on the edge for free, the economic model of the trillion-dollar data center begins to look fragile.

Then there is the second threat, one that attacks the terrestrial constraints of the data center itself: the Orbital Data Center. Elon Musk and SpaceX, along with Google's Project Suncatcher, envision a future where the heavy lifting is done not on land but in orbit. Space offers two things that are scarce and expensive on Earth: abundant, near-continuous solar energy and a cold, effectively limitless heat sink, though in a vacuum heat can only be shed by radiation. If Starship can reliably loft “server racks” into orbit, the terrestrial moat of land and power-grid access (currently the Hyperscalers' greatest defensive asset) evaporates.
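Radiative cooling follows the Stefan-Boltzmann law, P = εσAT⁴, so the heat a radiator can shed depends on its area and temperature. A minimal Python sketch estimating radiator area per megawatt; the 1 MW load, emissivity of 0.9, and radiator temperatures are assumptions for illustration, and the model ignores sunlight and Earth's infrared falling on the radiator.

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W / (m^2 K^4)


def radiator_area_m2(heat_w: float, temp_k: float, emissivity: float = 0.9) -> float:
    """One-sided radiator area needed to reject heat_w watts purely by radiation.

    Treats deep space as a 0 K background and ignores absorbed sunlight and
    Earth's infrared glow, so real radiators would need to be larger.
    """
    flux_w_per_m2 = emissivity * SIGMA * temp_k**4
    return heat_w / flux_w_per_m2


# Hypothetical: 1 MW of server heat, radiators running at 300 K and 350 K.
for temp in (300, 350):
    print(f"1 MW at {temp} K: ~{radiator_area_m2(1e6, temp):,.0f} m^2 of radiator")
```

At 300 K this comes to roughly 2,400 square meters per megawatt, so in orbit the practical limit is radiator area rather than the sink itself.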

We are left with a fascinating juxtaposition. On one hand, we have the “Edge,” pulling intelligence down from the clouds and putting it into our hands, making it personal, private, and free. On the other, we have “Orbit,” threatening to lift the remaining heavy compute off the planet entirely to bypass the energy bottleneck.

There are hundreds of billions of dollars betting on a future of heavy, centralized gravity. But if the edge gets smart enough, and orbit gets cheap enough, the center of gravity may shift.