When Carol Coye Benson and I sat down to write Payments Systems in the U.S., one of the first problems we had to solve wasn't about payments. It was about history.
To understand why the ACH network works the way it does, or why checks persisted decades longer than anyone expected, you need the institutional sediment underneath: the regulatory decisions, the failed experiments, the path dependencies baked in by choices made in the 1970s that nobody thought would still matter in the 2000s. The history is the explanation. Strip it out and you have a description of current practice with no account of why it exists or what it cost to get there.
But history takes pages. And pages test a reader's patience. So you compress. You make judgment calls about what survives the cut and what gets left behind, and you make those calls knowing that every omission is a bet: a bet that the reader can follow without it, that the thread holds without that particular knot.
Writing it taught me something. The act of compressing, of finding the minimum sufficient version of a complex thing, forces a clarity that living inside the complexity never quite delivers. You don't fully know what you understand until you have to say it precisely enough for someone else to follow.
But compression is always a loss. You feel it as you write. The version in the book is thinner than the thing you know.
Garry Tan uses a term, "tokenmaxxing," that initially sounds like jargon from a performance optimization thread. The idea is simple: don't be stingy with context. Give the model everything. Every source document, every relevant article, every piece of background that a human reader would never sit still for. Let it synthesize rather than guess.
The instinct it runs against is deep. We have spent decades building information systems around compression: search engines that retrieve rather than ingest, executive summaries that stand in for reports, one-pagers that distill months of work into something a decision-maker can absorb in four minutes. All of it was a rational response to a real constraint: human attention is finite and expensive. You couldn't afford to read everything, so you built filters. The whole architecture of how organizations manage information was designed around that limit.
Tokenmaxxing is a bet that the limit has moved.
The model can read everything. The cost of giving it full context (the uncompressed history, the original sources, the institutional sediment) is low enough now that filtering before the model sees it may introduce more error than it prevents. You're potentially discarding signal when you summarize for the model the way you'd summarize for a human. The model doesn't need the one-pager. It can handle the report.
This doesn't dissolve the need for curation entirely. More context isn't always better; models can lose the thread in noise the same way humans do, just differently. The skill shifts from summarizing to selecting: not "what's the minimum version of this?" but "what's actually worth including?" Different judgment, still essential.
But the deeper change is upstream of any particular project. The compression we built into every research process, every briefing, every book was never the goal. It was the tax we paid for human cognitive limits. Part of the process doesn't pay that tax anymore.
When I think about writing that payments book today, I don't think the book itself would change much; it still has human readers with finite patience. But the map we drew before writing it, the synthesis work, the "what connects to what across fifty years of regulatory history" work: that could happen at a different depth now. The understanding you bring to the writing can be informed by everything, not just the subset you had time to read.
The payments book was written entirely for humans, with all the compression that implies. But Tyler Cowen just published what he calls a "generative book": 40,000 words released free online, paired on the same screen with a Claude interface so readers can discuss, interrogate, and extend it in real time. He's writing for both audiences simultaneously now. The human reader and the model that will help that reader go deeper. The text is optimized not just to be understood but to be used: as context, as a jumping-off point, as raw material for a conversation that the author won't be in.
That's a different kind of writing. Not better or worse. Different. The compression decisions change when one of your readers has no patience to protect.
Writing still clarifies thinking. That part hasn't changed. But what you're clarifying, and who you're clarifying it for, is quietly expanding.