
Autopilot

“Superb photographs are not just taken with cameras. They come from within you, your eyes, your mind, your heart, not ice cold equipment.” Fan Ho

There’s a half-second on the street, somewhere between seeing a frame and shooting it, that used to take me whole minutes. Early on, with a camera in my hands on the streets of San Francisco or on the subway platforms in New York, I’d see something — light falling a certain way, a gesture about to resolve into a gesture — and I’d think my way through it. Assess the composition or the angle. Worry about the background. By the time I’d worked it out, the moment might be gone, replaced by some lesser version of itself.

That doesn’t happen to me anymore, and I couldn’t tell you when it stopped. Somewhere along the way the thinking disappeared and the shooting stayed. I see the frame and the shutter goes, and only afterward, looking at the file, do I understand what I saw. I didn’t explicitly decide to skip the thinking. It just stopped showing up, the way a habit eventually stops asking your permission. Or how driving a car becomes second nature.

I think about this because of a problem the AI labs have been calling continual learning. The AI models we use are like brilliant interns. They can solve a hard problem at nine in the morning and a harder one by five, and they’ll astonish you doing it. But every session starts over from zero. Whatever they got right on Tuesday evaporates by Wednesday, the way a dream is gone by the time you’ve found your slippers.

The industry’s first answer was to give them a longer memory — let the window hold the whole case file in front of them, all the time. This works for a while, the same way it would work for me on the street if I stopped and re-derived the exposure math for every frame. But that isn’t how I shoot anymore. I don’t have the math open. I have what’s left after thousands of frames did the math for me and then got out of the way.

Exploring this with AI this morning, I found three research efforts now chasing that gap from different angles, none of them all the way there.

A team out of Stanford and NVIDIA built something called TTT-E2E, which lets a model keep adjusting its own internal weights while it reads — not just holding the page in front of it, but being changed by the page, a little, as it goes. It runs thirty-five times faster than the brute-force method of remembering everything, because it isn’t remembering everything.

Google’s research arm published something called Nested Learning around the same time, built on the idea that a mind isn’t one system learning at one speed, but several systems nested inside each other — some updating by the minute, some by the year.

And a scrappier strand of work called self-distillation has models teaching cheaper versions of themselves, not by handing over a transcript, but by training the cheaper model to arrive on its own at whatever the well-informed version would have concluded.

None of this is what happens when I make a photo. Not yet. But it’s aimed at the same gap I live in every time I shoot before I understand what I’m shooting. The gap between having the math and having the eye.

I once asked Doug, a good friend who’s spent as many days on the street as I have, how he knew when to press the shutter. He didn’t have an answer, not really — just a shrug, and something about the moment feeling complete before he could explain why. That shrug took him years to earn. He didn’t keep the years. He kept the shrug.

And then a few years ago Doug did something I still don’t fully understand. He abandoned digital and went back to film. Not for any project, not for the look of it; he could get that in post if he wanted it. He went back to the actual mechanics: loading a roll, metering by hand, often setting up a tripod. I needled him about it some, the way you’d needle a cigarette smoker who’d taken up a pipe instead, as if the inconvenience were the point. He told me he wanted to slow down, and that film was the only thing that reliably made him do it. Twelve frames and then you stop and reload and you can’t fix it later. The very friction he’d spent decades shooting his way out of, he went looking for again, on purpose.

I don’t know what to do with that, except to notice that he’s the same man who can give me the shrug and also the man who walked back toward the thing the shrug had replaced. Maybe that’s the part the labs haven’t gotten to yet, underneath all the vocabulary of weight updates and meta-learned initializations. Compression is the whole point, until the day it isn’t.

Note: This line of thinking started with a recent essay by Dwarkesh Patel on what he calls continual learning. It’s become a real focus of his thinking about how we get to a better future with AI.

See: https://www.dwarkesh.com/p/the-next-paradigm