Tag: Deep Learning

The Billboard

The fog was still sitting on the hills when I put in my earbuds and headed out.

Sebastian Mallaby was talking about billboards.

Tim Ferriss had asked him the question he asks everyone: if you could put anything up there, for millions of people to see, what would it be? Mallaby has spent years inside the minds of the people who shaped modern finance — the hedge fund managers, the venture capitalists, the builders of things that changed how the world moves money. He has more material than most people accumulate in a lifetime. He could have said anything.

He said: Prepare your mind.

I kept walking. The houses were quiet in the particular way they get when school lets out for summer — no buses, no car doors, no kids at the corner. Somebody’s sprinklers were running.

The phrase comes originally from Louis Pasteur, who understood something that most people don’t: that chance is not democratic. It does not distribute itself evenly among those who wait. It finds the people who are ready. Chance favors the prepared mind. Pasteur said it, and then he proved it, and then the rest of us spent a century and a half learning it was true.

What struck me about Mallaby’s answer wasn’t the phrase itself. It was the way he said it had kept appearing in his research, surfacing in different decades and different worlds, like a message the material kept trying to send him.

He told the story of Arthur Patterson at Accel Partners. Before a new technology arrived, Accel would work through the implications: what company needs to be built, what founder fits the moment, what the right pitch looks like. So when an entrepreneur finally walked in, when the situation was live and competitive, they already knew ninety percent of what they were hearing. They could move fast because they had already moved slow.

That’s preparation as institutional practice. But Mallaby found the phrase again in a different register entirely, embedded in a single human moment that has always seemed to me like one of the hinge points of our era.

He was interviewing Ilya Sutskever, asking him why he had seen it so quickly.

In 2017, a paper called Attention Is All You Need appeared online. It described a new architecture for neural networks, the transformer, that would eventually rewrite the terms of what artificial intelligence could do. On the day the paper went up, Sutskever read it. And then he ran. He went down the corridor to find his collaborator Alec Radford and told him to stop what he was doing. Everything. Stop. We are going to build a language model on this architecture.

Not someday. Now.

Mallaby asked him how he had seen it so clearly, so fast. And Sutskever’s answer, in its essence, was the same two words: prepared mind.

He had been thinking about the problem of modeling sequential data since his PhD in Canada. For years he had been carrying a question the field hadn’t answered yet. And when the answer appeared — when the transformer showed up on a website one ordinary day — he didn’t have to reason his way toward it. He recognized it. The solution arrived and found a mind that had been waiting for it, that had already cleared space for it, that was already arranged around the shape of exactly this kind of answer.

This is what preparation actually is. Not the accumulation of facts. Not readiness in the generic sense, the vague self-improvement sense. It is the long, patient cultivation of a specific question, held close and kept alive until the answer has somewhere to land.

Mallaby chose that phrase for his billboard because it kept finding him — in the venture capital world, in the AI world, across decades and disciplines and very different kinds of genius. The prepared mind is not a personality trait. It is a practice. It is the work you do before the work arrives.

The sprinklers had clicked off by the time I turned back toward home. The fog was starting to lift off the hills. I was thinking about what I had been preparing for, whether I even knew.

Business History IBM Infrastructure Nvidia Programming Semiconductors

The Half-Life of Moats

Prompted by an article on X by @magicsilicon on the CUDA moat. Research and drafting assistance from my AI intern assistant Clark.

The NVIDIA H100 looks, in retrospect, like an inevitability. It wasn’t.

What Jensen Huang built is more accurately understood as a sixteen-year accumulation of optionality: a platform investment made in 2006 for a market that wouldn’t fully materialize until 2022. NVIDIA introduced the G80 architecture in November 2006, laying the groundwork for CUDA’s release a few months later. The stated ambition was to let scientists write C++ that ran on GPU cores without needing to understand 3D graphics pipelines. The unstated bet was that parallel computation would eventually matter for something bigger than rendering shadows in video games.

For sixteen years, it mostly didn’t. Not at scale. Not commercially. CUDA lived in research labs and HPC clusters. It attracted a small, devoted, and economically marginal user base — the kind that papers cite but investors ignore. NVIDIA kept investing in it anyway: cuDNN for deep learning operations, cuBLAS for linear algebra, a layered ecosystem of libraries that made CUDA not just accessible but nearly irreplaceable for anyone doing serious numerical computation. When TensorFlow and PyTorch emerged as the standard frameworks for neural network research, they didn’t adopt CUDA because it was the only option. They adopted it because CUDA was where the optimized kernels already lived.

AlexNet won the ImageNet competition in 2012 and did it on two NVIDIA GPUs. The deep learning community noticed immediately. The financial community largely did not.

Then ChatGPT launched in November 2022, and suddenly everyone needed H100s they couldn’t get.

The parallel to Intel is instructive and also undersells how strange this kind of story looks while you’re living through it. Intel was founded in 1968 as a memory company. DRAM. The founders — Noyce, Moore, Grove — were materials scientists and engineers who believed the future was in silicon memory chips. They were right, briefly: in the early 1970s Intel dominated the DRAM market. By 1984, that share had collapsed to 1.3%, ceded almost entirely to Japanese manufacturers who had commoditized the product.

What saved Intel wasn’t a pivot so much as a realization that a stopgap had become a foundation. The 8086, conceived in 1976 as an internal hedge and launched in 1978, was never supposed to matter. It was a 16-bit processor designed to hold off Zilog while Intel finished its ambitious 32-bit iAPX 432 architecture. The 8086 was assigned to a single engineer. “If management had any inkling that this architecture would live on through many generations,” its designer Stephen Morse later recalled, “they never would have trusted this task to a single person.”

IBM chose the 8088, a cost-reduced variant, for the original IBM PC in 1981. That decision wasn’t destiny; it was simply a procurement decision. And yet from that accident of selection, Intel’s x86 line became the backbone of personal computing for four decades. The Pentium in 1993 was Intel’s Wintel moment (the flag bearer the @magicsilicon post gestures at), but the flag had been quietly sewn since 1978.

What these histories share is not just a pattern of “slow build, explosive payoff.” The structural similarity is subtler: in both cases, the moat was a software abstraction layer built on top of hardware. Intel’s real lock-in wasn’t transistor count or clock speed. It was backward compatibility — the commitment, formalized with the 80386 in 1985, that every future Intel chip would run software written for older ones. That promise created a flywheel that trapped developers and buyers in a virtuous (for Intel) dependency loop for decades.

CUDA is the same architecture at a different layer. The lock-in isn’t the H100’s 80 gigabytes of HBM3. It’s that switching to an AMD MI300X or Google TPU means potentially rewriting training pipelines that have been optimized against CUDA kernels for years. AMD’s ROCm platform exists. It is, by most accounts, maturing. Engineers who have tried the migration report that it costs months and hundreds of thousands of dollars. The moat isn’t a wall. It’s accumulated friction — the switching cost of a decade of engineering decisions baked into codebases that no one wants to touch.

But to find the actual origin of this pattern, you have to go back further than Intel. To 1964, and to a decision IBM made that Fred Brooks — its project manager — called a bet-the-business move.

The IBM System/360 was announced on April 7, 1964, after five years of turbulent internal development. What it introduced wasn’t just a new computer. It was a new concept: the separation of architecture from implementation. Before the 360, IBM ran five incompatible product lines simultaneously. A customer who outgrew their machine had to scrap all existing software and start over. The 360 replaced all five lines with a single unified architecture — six models covering a fiftyfold performance range, all running the same operating system, all sharing the same instruction set. The name itself encoded the ambition: 360 degrees, all directions, all users.

Gene Amdahl, the 360’s chief architect, had a precise formulation for what this meant: the architecture was “an interface for which software is written, independent of any implementation.” The Principles of Operation manual described what the machine did; separate Functional Characteristics documents described how each model did it. This distinction — separating the contract from the execution — was genuinely new. It’s the conceptual root of everything that came after.

The 360 generated over $100 billion in revenue for IBM and established the first platform business model in computing. Jim Collins would later rank it alongside the Model T and the Boeing 707 as one of the three greatest business achievements of the twentieth century. But its deepest legacy was architectural: the insight that if you make your abstraction layer the standard, the hardware underneath becomes fungible. Customers didn’t buy specific IBM machines. They bought into OS/360. The machines were an implementation detail.

Intel understood this by the 1980s, even if implicitly. The 80386’s backward compatibility commitment in 1985 was IBM’s 360 insight applied to microprocessors — the architecture is the product, the silicon is the vehicle. CUDA is the same insight applied to GPU compute. What NVIDIA sold researchers in 2006 wasn’t the G80 card. It was the abstraction: write parallel code in C++, run it on any NVIDIA hardware, trust that the next generation will be faster and compatible.

The pattern is now sixty years old. It has reproduced in every major platform transition. And it keeps working for the same reason it worked in 1964: when you own the layer that developers write to, your customers’ switching costs compound every year they stay.

There’s something worth sitting with here. Neither Jensen Huang in 2006 nor Gordon Moore in 1968 could have specified exactly what the payoff would look like. What they shared was a willingness to build infrastructure for a demand they could sense but not yet see — and the discipline to keep investing in it through the long years when it looked like a research project rather than a business.

The question that doesn’t resolve cleanly is whether that kind of patience is a strategy or a personality. And whether, in an industry that now moves faster than the cycles it’s lived through, sixteen-year moats are still the kind that get built.

Which raises the uncomfortable corollary: the same AI tools that CUDA enabled may be what ultimately erodes it.

The attack on CUDA’s moat is now structurally different from anything AMD or Intel could mount before. OpenAI’s Triton compiler lets developers write GPU kernels in Python without touching CUDA at all, and generates optimized machine code that often matches hand-tuned CUDA performance. MLIR — Multi-Level Intermediate Representation, originally from Google — provides a compiler infrastructure that can target any hardware backend from a single codebase. AMD’s ROCm has historically been dismissed as immature; ROCm 7, released this year, delivers meaningfully better inference performance than its predecessors. And perhaps most directly: Claude Code reportedly ported a CUDA codebase to AMD’s ROCm in thirty minutes — work that previously took months of engineering time.

The irony is almost too neat. CUDA’s moat was built on accumulated switching costs: the friction of rewriting code, the library dependencies, the tribal knowledge encoded in a decade of kernel optimizations. AI coding tools are specifically good at exactly that kind of mechanical, high-context translation. The weapon is attacking the wall it was built behind.

That said, it’s worth being careful about the speed of this. Abstraction layers that “should” erode moats often take far longer than expected, because the moat isn’t just the code — it’s the ecosystem of tooling, documentation, community knowledge, and hardware-software co-optimization that took eighteen years to compound. Triton and MLIR are real. They’re also early. The question isn’t whether the moat is vulnerable; it’s whether it erodes before NVIDIA’s next generation of chips makes it irrelevant to argue about.

As for what comes next — which company is building the IBM 360 of this decade — the honest answer is that it’s too early to call with confidence. But there’s a candidate worth watching.

Anthropic’s Model Context Protocol, launched in late 2024, has the structural fingerprint of a platform play. MCP is a standard for how AI agents connect to external tools and data sources — a common interface layer, hardware-agnostic (or rather, model-agnostic), that any system can implement. By late 2025 it had been donated to the Linux Foundation, adopted by OpenAI and Google, and was tracking 97 million monthly SDK downloads. There are now over 10,000 MCP servers. It is becoming the way agents talk to the world.

The parallel to OS/360 is imprecise but instructive. What IBM built in 1964 was a standard interface between software and hardware that decoupled what you wrote from what you ran it on. MCP is attempting something similar one abstraction layer higher: decoupling what an agent does from the specific models, tools, and data sources it does it with. If it becomes the standard — the layer that developers write to — then whoever owns or most deeply shapes that standard controls the integration tax of an industry whose applications we can’t fully specify yet.

The counterargument is that open standards, once donated to foundations and broadly adopted, don’t generate the same lock-in as proprietary platforms. OS/360 was IBM’s. CUDA is NVIDIA’s. MCP is now the Linux Foundation’s, with OpenAI and Google as co-stewards. The historical pattern suggests the moat accrues to whoever owns the layer, not whoever invented it.

Which may mean the next great platform play is still being assembled in a room we haven’t seen yet — the way IBM’s System/360 was being architected in a Connecticut motor lodge in 1961, three years before anyone else knew what was coming.

Tags abstraction layers, AI accelerators, AI boom, AI hardware, AI Infrastructure, AI investment, AlexNet, AMD ROCm, Blackwell, chatgpt, chip architecture, computing platforms, CUDA, cuDNN, Deep Learning, developer ecosystems, emerging technology, foundation models, Fred Brooks, generative ai, Gordon Moore, GPU computing, H100, hardware abstraction, Hopper architecture, IBM mainframe, IBM System/360, Inference, Intel, jensen huang, Linux Foundation, MCP, MLIR, Model Context Protocol, nvidia, open standards, OpenAI Triton, parallel computing, PC revolution, physical AI, platform strategy, PyTorch, Scott Loftesness, semiconductor history, software ecosystems, switching costs, Tech History, tech strategy, technology moats, TensorFlow, Venture Capital, x86

AI History

The Arrival

Yoshua Bengio spent forty years building the foundation of modern artificial intelligence. He won the Turing Award for it. And he didn’t think he’d live to see it work.

That’s the quiet fact buried inside Stephen Witt’s New Yorker profile of him. Bengio — one of the three researchers whose decades-long bet on neural networks eventually became the architecture underlying every large language model running today — had made peace with the idea that the thing he was building was a multi-generational project. Something for his successors to finish. Then Witt writes: “one day in late 2022, the technology had simply arrived. He compared it to meeting an extraterrestrial.”

Hemingway once described bankruptcy as happening two ways: gradually, then suddenly. He meant ruin. Bengio experienced something harder to name: not ruin but arrival, which carries its own vertigo. The gradually was four decades of work that most of his peers considered quixotic. The suddenly was a Wednesday in November when a chat interface went live and the world quietly changed.

What unsettles me about the extraterrestrial comparison isn’t the strangeness it implies. It’s the distance. You meet an alien; you don’t meet something you made. The metaphor suggests that even its creator couldn’t fully recognize it — that the thing, once arrived, belonged to a category that exceeded its own origins.

We don’t have good language for this. Breakthrough, inflection point, paradigm shift — these are words people reach for after the fact, when they’re building timelines. What Bengio seems to be describing is the experience of standing in front of a threshold you spent your life approaching, and finding it already behind you.

The technology didn’t ask permission. It didn’t announce itself.

It arrived.

Tags artificial intelligence, chatgpt, Deep Learning, history of technology, innovation, Scott Loftesness, stephen witt, technology, writing, Yoshua Bengio

AI AI: Large Language Models

The Echo Effect: Why Prompt Repetition is AI’s Best Kept Secret

By Scott Loftesness
February 19, 2026

In our relentless pursuit of complexity, we often overlook the elegant simplicity of a fundamental human habit: repeating ourselves.

We build colossal architectures, weave intricate neural networks, and throw mountains of computational power at our artificial intelligence systems, hoping to squeeze out a few more drops of reasoning and logic. Yet, sometimes the most profound breakthroughs require no new code, no additional latency, and no extra training data.

Sometimes, you just have to say it twice.

In a fascinating December 2025 paper titled “Prompt Repetition Improves Non-Reasoning LLMs,” researchers Yaniv Leviathan, Matan Kalman, and Yossi Matias uncovered an almost absurdly simple “free lunch” in AI optimization.

Their premise is straightforward: when you aren’t using a heavy reasoning model, simply copying and pasting your input prompt multiple times significantly boosts the model’s performance.

“When not using reasoning, repeating the input prompt improves performance for popular models (Gemini, GPT, Claude, and Deepseek) without increasing the number of generated tokens or latency.”

The mechanics behind this are elegantly pragmatic.

By repeating the prompt, you are moving the heavy computational lifting to the parallelizable “pre-fill” stage of the model’s processing. The AI’s causal attention mechanism gets to process the same tokens again, allowing the later iterations of the prompt to attend to the earlier ones. It effectively acts as a hack to simulate bidirectional attention in a decoder-only architecture.
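The mechanics above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the helper names (`repeat_prompt`, `causal_mask`, `prompt_coverage`) and the blank-line separator are hypothetical, tokens are treated as abstract positions, and separator tokens are ignored. It shows that under a causal mask a token in a single copy of the prompt sees only the tokens before it, while every token in the second copy can see the entire first copy.

```python
from typing import List


def repeat_prompt(prompt: str, times: int = 2, separator: str = "\n\n") -> str:
    """Return the prompt repeated `times` times (separator is an assumption)."""
    if times < 1:
        raise ValueError("times must be at least 1")
    return separator.join([prompt] * times)


def causal_mask(length: int) -> List[List[bool]]:
    """mask[query][key] is True when position `query` may attend to `key`."""
    return [[key <= query for key in range(length)] for query in range(length)]


def prompt_coverage(prompt_len: int, times: int) -> List[int]:
    """For each token in the LAST copy of the prompt, count how many distinct
    positions of the original prompt it can attend to under a causal mask."""
    total = prompt_len * times
    mask = causal_mask(total)
    start = (times - 1) * prompt_len
    coverage = []
    for query in range(start, total):
        # Map every visible absolute position back to its prompt position.
        seen = {key % prompt_len for key in range(total) if mask[query][key]}
        coverage.append(len(seen))
    return coverage


if __name__ == "__main__":
    print(repeat_prompt("Which option is correct?"))
    print(prompt_coverage(4, 1))  # single pass: early tokens see little
    print(prompt_coverage(4, 2))  # second copy: every token sees the whole prompt
```

With a four-token prompt, the single-pass coverage grows token by token, while the repeated prompt gives every token in the second copy full coverage. That is the sense in which repetition approximates bidirectional attention, and all of the extra work lands in the pre-fill stage.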

What’s even more telling is the paper’s observation on why this works so well.

The researchers noted that reasoning models trained with reinforcement learning (like OpenAI’s deep-thinking variants) often learn to “restate the problem” in their internal monologue. They figured out on their own what these researchers are suggesting we do manually: repeat the question to focus the mind.

Reading this paper, I couldn’t help but draw a parallel to the human condition and the nature of listening.

How often do we assume that because we have articulated a thought once, it has been fully absorbed? We fire off a single, dense instruction to a colleague, a partner, or a friend, and then marvel when the nuance is lost in translation.

We suffer from our own attention bottlenecks.

Like a non-reasoning LLM trying to parse a complex query in a single pass, we are constantly bombarded with a stream of tokens—emails, notifications, conversations, fleeting thoughts. To truly understand, to truly digest and synthesize information, we need the grace of repetition.

There is a strange poetry in the fact that to make our most advanced digital minds smarter, we have to talk to them the way we talk to a distracted child or a busy spouse. The “microscope effect” highlighted in the study—where repeating a prompt drastically improved extraction tasks—shows that the failure wasn’t in the model’s capacity to know, but in its capacity to focus. Repetition forces focus. It creates a resonant echo in the context window, a digital highlighter that screams, “This matters. Look here again.”

As we continue to navigate a world increasingly augmented by artificial intelligence, this paper serves as a humbling reminder. The bleeding edge of technology isn’t always found in the most complex equation; sometimes, it’s hidden in the most basic principles of communication.

Whether you’re prompting a billion-parameter language model or trying to connect with the human sitting across from you, the lesson is clear.

Clarity isn’t just about the words you choose. It’s about giving those words the space, the resonance, and the repetition they need to be truly understood.

Say it once to be heard; say it twice to be understood.

The New Newton

“Machine learning is a very important branch of the theory of computation… it has enormous power to do certain things, and we don’t understand why or how.”
— Avi Wigderson, Herbert H. Maass Professor, School of Mathematics.

There is a specific kind of silence that permeates the woods surrounding the Institute for Advanced Study (IAS) in Princeton. It is a silence designed for “blue-sky” thinking, the kind that allowed Einstein to ponder relativity and Gödel to break logic. For decades, this has been the sanctuary of the slow, deliberate grind of human intellect—chalk dust on slate, long walks, and the solitary pursuit of elegant proofs.

But recently, the tempo in those woods has changed.

We are witnessing a profound shift in the architecture of discovery. In closed-door meetings and public workshops, the conversation among the world’s top theorists is moving from skepticism to a startled accelerationism. The consensus emerging is that Artificial Intelligence is no longer merely a peripheral calculator; it is becoming an “autonomous researcher.”

The 90% Shift

Some physicists now suggest that AI can handle up to 90% of the routine analytical and coding “heavy lifting” of science. This is a staggering metric. It frees the human mind from the drudgery of calculation, but it also introduces a tension that strikes at the heart of the scientific method. We are moving into a realm where the tool may soon outpace the master’s understanding.

There is a growing realization that we are approaching a horizon where AI finds solutions—patterns in the noise of the universe—that work perfectly but remain mathematically “magic.” We might cure a disease or solve a fusion equation without understanding the why behind the how.

A New Natural Phenomenon

This brings us to a fascinating historical rhyme. Scholar Sanjeev Arora has compared our current moment in AI to physics in the era of Isaac Newton. When Newton watched the apple fall, he could describe gravity, but he couldn’t explain the fundamental mechanism behind it.

Today, scholars at the IAS are looking at deep learning in the same way. They are observing a new natural phenomenon, a kind of digital physics. They are trying to find the “laws” of deep learning, asking why these massive models generalize well when classical statistics suggests that models with so many parameters should overfit and fail.

We are building a new machine, and now we must retroactively discover the physics that governs it.

Steering the Black Box

This is not just a mathematical challenge; it is a societal one. The IAS has wisely expanded this inquiry to the School of Social Science. If we are handing over the keys of discovery to a “black box,” we must ensure we are steering it “for the Public Good.” The distinction between genuine problem-solving—like protein folding—and “AI Snake Oil” in social prediction is vital. We cannot let the magic of the tool blind us to the morality of its application.

The future of science, it seems, will not just be about the genius on the chalkboard. It will be about the partnership between the human question and the digital answer. The challenge for the modern scholar is no longer just to calculate, but to comprehend the alien intelligence we have invited into the library.
