Attention Mechanism – Scott Loftesness

In our relentless pursuit of complexity, we often overlook the elegant simplicity of a fundamental human habit: repeating ourselves.

We build colossal architectures, weave intricate neural networks, and throw mountains of computational power at our artificial intelligence systems, hoping to squeeze out a few more drops of reasoning and logic. Yet, sometimes the most profound breakthroughs require no new code, no additional latency, and no extra training data.

Sometimes, you just have to say it twice.

In a fascinating December 2025 paper titled “Prompt Repetition Improves Non-Reasoning LLMs,” researchers Yaniv Leviathan, Matan Kalman, and Yossi Matias uncovered an almost absurdly simple “free lunch” in AI optimization.

Their premise is straightforward: when you aren’t using a heavy reasoning model, simply copying and pasting your input prompt multiple times significantly boosts the model’s performance.

“When not using reasoning, repeating the input prompt improves performance for popular models (Gemini, GPT, Claude, and Deepseek) without increasing the number of generated tokens or latency.”

The mechanics behind this are elegantly pragmatic.

By repeating the prompt, you are moving the heavy computational lifting to the parallelizable “pre-fill” stage of the model’s processing. The AI’s causal attention mechanism gets to process the same tokens again, allowing the later iterations of the prompt to attend to the earlier ones. It effectively acts as a hack to simulate bidirectional attention in a decoder-only architecture.

What’s even more telling is the paper’s observation on why this works so well.

The researchers noted that models trained with Reinforcement Learning (like OpenAI’s deep-thinking variants) naturally learn to “restate the problem” in their internal monologue. They figured out on their own what these researchers are suggesting we do manually: repeat the question to focus the mind.

Reading this paper, I couldn’t help but draw a parallel to the human condition and the nature of listening.

How often do we assume that because we have articulated a thought once, it has been fully absorbed? We fire off a single, dense instruction to a colleague, a partner, or a friend, and then marvel when the nuance is lost in translation.

We suffer from our own attention bottlenecks.

Like a non-reasoning LLM trying to parse a complex query in a single pass, we are constantly bombarded with a stream of tokens—emails, notifications, conversations, fleeting thoughts. To truly understand, to truly digest and synthesize information, we need the grace of repetition.

There is a strange poetry in the fact that to make our most advanced digital minds smarter, we have to talk to them the way we talk to a distracted child or a busy spouse. The “microscope effect” highlighted in the study—where repeating a prompt drastically improved extraction tasks—shows that the failure wasn’t in the model’s capacity to know, but in its capacity to focus. Repetition forces focus. It creates a resonant echo in the context window, a digital highlighter that screams, “This matters. Look here again.”

As we continue to navigate a world increasingly augmented by artificial intelligence, this paper serves as a humbling reminder. The bleeding edge of technology isn’t always found in the most complex equation; sometimes, it’s hidden in the most basic principles of communication.

Whether you’re prompting a billion-parameter language model or trying to connect with the human sitting across from you, the lesson is clear.

Clarity isn’t just about the words you choose. It’s about giving those words the space, the resonance, and the repetition they need to be truly understood.

Say it once to be heard; say it twice to be understood.

Share this: