
10,000 Books

Photo by Ivo Rainha

Last night I was watching this YouTube video of an interview with Fei-Fei Li and Geoffrey Hinton in which, among many other topics, they talked about the societal impact of AI.

Hinton, in particular, made a point that I’ve not heard elsewhere about how these large language models are architecturally quite different from our human brains – the discussion begins at 53:33 into the video. I clipped that section of his remarks:

“At a later stage in my research, I had a profound realization that greatly heightened my interest in the societal impact of AI. As Fei-Fei mentioned, it’s all about the power of data.

These massive chatbots have been exposed to thousands of times more data than any human could ever hope to see.

The key reason behind this capability is the ability to create numerous copies of the same model, with each copy examining a different subset of the data. They can then derive gradients from this data to optimize their parameters. The remarkable aspect is that they can share these gradients among all the copies. This means that each copy benefits from what all the other copies have extracted from the data.

To put it into perspective, imagine if we had 10,000 individuals, each assigned to read 10,000 different books. After they’ve each read just one book, all of them would instantly know what’s in all of the books.

This is how these AI models operate, and it sets them apart as vastly superior to human capabilities.”

Geoffrey Hinton

This is a fascinating insight – and it communicates the “learning power” of these LLMs more clearly than almost anything else I’ve read or heard. Think about it: brains that can instantly share what they’ve learned by simply exchanging a large quantity of gradients – the values that adjust and tune the neural networks in the models.
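To make the idea concrete, here is a toy sketch of that gradient sharing – the same pattern used in data-parallel training. Several copies of one small linear model each see a different data shard, compute a local gradient, and then average the gradients so that every copy benefits from all of the data. All names and numbers here are illustrative, not taken from any real framework or from Hinton's remarks:

```python
import numpy as np

rng = np.random.default_rng(0)

true_w = np.array([2.0, -1.0])   # the weights the copies are trying to learn
w = np.zeros(2)                  # shared parameters, identical in every copy

def local_gradient(w, X, y):
    """Gradient of mean squared error on one copy's data shard."""
    pred = X @ w
    return 2 * X.T @ (pred - y) / len(y)

n_copies, lr = 4, 0.1
for step in range(200):
    grads = []
    for _ in range(n_copies):    # each copy reads its own "book" (data shard)
        X = rng.normal(size=(32, 2))
        y = X @ true_w
        grads.append(local_gradient(w, X, y))
    # sharing step: average the gradients, so a single update
    # reflects what every copy extracted from its shard
    w -= lr * np.mean(grads, axis=0)

print(np.round(w, 3))
```

After a couple of hundred shared updates, every copy holds the same weights, close to `[2, -1]` – none of them ever saw all the data, yet all of them learned from all of it.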
