Category: AI: Transformers

The State You Never See

The transaction arrives in milliseconds. A purchase attempt — a gas station in Phoenix, a grocery store in suburban Atlanta, a wire transfer at 2 a.m. — and somewhere in the authorization chain, a system has to decide. Not later. Now. The clock is already running.

When I led the fraud detection team at Visa, this was the problem that lived in your chest. You couldn’t see what you needed to see. You couldn’t know whether the person presenting that card was the person who owned it, whether the account had been compromised six hours ago in a breach you hadn’t yet detected, whether the behavioral signature of these transactions was the legitimate cardholder running errands or a fraudster working methodically through a stolen number before the window closed. You could only see what the transactions said. You could never see the state underneath.

That distinction — between what you can observe and what is actually true — turns out to be one of the organizing problems of our time. It has a name, a formal structure, and a history that runs from mid-century mathematics through the trading floors of quantitative hedge funds to the frontier of artificial intelligence. The name is the hidden Markov model. But the problem it addresses is older than the math, and more human than the jargon suggests.
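To make the observed-versus-hidden distinction concrete, here is a minimal sketch of hidden Markov model filtering. Everything in it is an assumption made for illustration: the two hidden states, the two observation labels, and every probability are invented numbers, not figures from any real fraud system.

```python
def hmm_filter(prior, transition, emission, observations):
    """Track the belief over a hidden state as observations arrive."""
    belief = dict(prior)
    for obs in observations:
        # Predict: let the hidden state evolve one step.
        predicted = {
            nxt: sum(belief[cur] * transition[cur][nxt] for cur in belief)
            for nxt in belief
        }
        # Update: weight each state by how well it explains the observation.
        unnormalized = {s: predicted[s] * emission[s][obs] for s in predicted}
        total = sum(unnormalized.values())
        belief = {s: v / total for s, v in unnormalized.items()}
    return belief


# Hypothetical numbers: accounts are almost always fine, and a
# compromised account stays compromised until someone intervenes.
prior = {"legitimate": 0.99, "compromised": 0.01}
transition = {
    "legitimate": {"legitimate": 0.99, "compromised": 0.01},
    "compromised": {"legitimate": 0.0, "compromised": 1.0},
}
emission = {
    "legitimate": {"normal": 0.95, "odd": 0.05},
    "compromised": {"normal": 0.30, "odd": 0.70},
}

belief = hmm_filter(prior, transition, emission, ["odd", "odd", "odd"])
```

The hidden state is never observed directly. Only the belief about it moves, one transaction at a time.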


The Updating Machine

Tom Chivers puts Bayes’ theorem in plain English and it sounds almost obvious: “the probability of event A, given event B, equals the probability of B given A, times the probability of A on its own, divided by the probability of B on its own.” A formula for revising what you believe when new evidence arrives. You started somewhere. Something changed. Now you believe something slightly different. Repeat.
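The formula is short enough to write out directly. The sketch below is only an illustration: the function name and the example numbers (a 1% prior, a signal that fires 90% of the time when A is true and 5% of the time when it is not) are assumptions chosen for the arithmetic, not anything from Chivers.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) = P(B | A) * P(A) / P(B)."""
    # P(B) on its own, from the law of total probability.
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical inputs: A is rare, B is a fairly good signal of A.
posterior = bayes_posterior(prior_a=0.01, p_b_given_a=0.9, p_b_given_not_a=0.05)
```

With these made-up inputs the posterior comes out at roughly 0.15: the evidence moves the belief a long way from 1%, but nowhere near certainty.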

The obvious part is the mechanics. The hard part is the loop.

Most reasoning errors I catch in myself aren’t failures of logic — they’re failures to update. I hold a view, evidence accumulates against it, and I find reasons the evidence is flawed rather than reasons the view might be.

Psychologists have a name for this: confirmation bias. But I’ve always found that label a bit too clean, like it describes a bug rather than a feature.

The prior isn’t wrong to be sticky. It represents everything you’ve learned up to this point. The problem is when it becomes load-bearing — when the prior stops being a starting position and starts being a conclusion.

“Strong opinions, loosely held” is supposed to solve this. It’s a useful phrase — it captures something true about the right posture toward your own beliefs. But in practice the second half is harder to honor than it sounds. The strong opinion gets stated, new evidence arrives, and changing your mind in public feels like losing. The “loosely held” part quietly becomes decorative.

What Bayes actually demands is something closer to epistemic humility with arithmetic attached. You don’t get to say I don’t know. You have to say I estimate 0.4, and here is what would move me to 0.6. That’s harder. It requires you to specify not just what you believe but how you’d know if you were wrong.
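One way to pin down "what would move me to 0.6" is the odds form of Bayes' rule: posterior odds equal prior odds times a likelihood ratio. The helper names below are hypothetical; this is a sketch of the arithmetic, not a prescribed method.

```python
def to_odds(p):
    return p / (1.0 - p)


def to_probability(odds):
    return odds / (1.0 + odds)


def update(prior, likelihood_ratio):
    """Posterior odds = prior odds * likelihood ratio."""
    return to_probability(to_odds(prior) * likelihood_ratio)


def required_likelihood_ratio(prior, target):
    """How strong must the evidence be to move prior to target?"""
    return to_odds(target) / to_odds(prior)


needed = required_likelihood_ratio(0.4, 0.6)
moved = update(0.4, needed)
```

Moving from 0.4 to 0.6 takes evidence about 2.25 times more likely if the belief is true than if it is false. Naming that number in advance is the "how you'd know if you were wrong" part.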

This is why Bayesian thinking keeps surfacing in AI conversations. Modern language models do something structurally adjacent to this — not consciously, but mechanically. Every token generated is a probability distribution revised forward by context. The model doesn’t know the next word; it updates a prior over all possible words, given everything that came before. It’s not reasoning the way humans reason, but it’s updating the way Bayes updates: continuously, contextually, without the luxury of certainty.
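As a toy picture of "a probability distribution over all possible words", the sketch below turns raw scores into probabilities with a softmax. The three-word vocabulary and the scores for the two contexts are invented for illustration. A real model computes such scores with a neural network over tens of thousands of tokens.

```python
import math


def next_token_distribution(scores):
    """Softmax: convert raw scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical scores: with little context the options are tied;
# with more context one word pulls ahead.
little_context = next_token_distribution({"bank": 1.0, "river": 1.0, "money": 1.0})
more_context = next_token_distribution({"bank": 0.5, "river": 3.0, "money": 0.2})
```

Same vocabulary, different context, different distribution. Nothing is ever certain, just more or less probable.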

Whether that’s comforting or unsettling probably depends on your own prior.

The deeper thing Chivers is pointing at, I think, is that Bayesian reasoning is essentially a description of intellectual honesty as a process rather than a trait. You can’t just decide to be open-minded. You have to build the loop: form a belief, assign it a probability, watch for evidence that should move it, and then actually move it. Most of us do the first three. The fourth step is where it gets expensive.

I’ve been wrong about enough things by now that I’ve started to treat my own confident views with mild suspicion. Not paralysis — you have to act on something — but a background awareness that the prior I’m acting on was formed by a person who had less information than I do now, and less than I’ll have next year.

Strong opinions, loosely held, sounds right. The trick is meaning it.

Tags ai, bayes, belief updating, books, cognitive bias, decision making, epistemology, intellectual honesty, machine learning, Probability, reasoning, Scott Loftesness, Tom Chivers


10,000 Books


Last night I watched this YouTube video of an interview with Fei-Fei Li and Geoffrey Hinton in which, among many other topics, they talked about the societal impact of AI.

Hinton, in particular, made a point that I’ve not heard elsewhere about how these large language models are architecturally quite different from our human brains – the discussion begins at 53:33 into the video. I clipped that section of his remarks:

“At a later stage in my research, I had a profound realization that greatly heightened my interest in the societal impact of AI. As Fei-Fei mentioned, it’s all about the power of data.

These massive chatbots have been exposed to thousands of times more data than any human could ever hope to see.

The key reason behind this capability is the ability to create numerous copies of the same model, with each copy examining a different subset of the data. They can then derive gradients from this data to optimize their parameters. The remarkable aspect is that they can share these gradients among all the copies. This means that each copy benefits from what all the other copies have extracted from the data.

To put it into perspective, imagine if we had 10,000 individuals, each assigned to read 10,000 different books. After they’ve each read just one book, all of them would instantly know what’s in all of the books.

This is how these AI models operate, and it sets them apart as vastly superior to human capabilities.”
Geoffrey Hinton

This is a fascinating insight – and it communicates the “learning power” of these LLMs more clearly than almost anything else I’ve read or heard. Think about it – brains that can instantly share what they’ve learned by simply exchanging a large quantity of gradients, the values used to adjust and tune the weights of the neural networks in the models.
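The gradient sharing Hinton describes can be sketched in a few lines. This is a toy under stated assumptions: a one-parameter model y ≈ w * x, two made-up data shards standing in for the "copies", and plain gradient averaging. Real training systems do the same thing across many machines with far larger models.

```python
def shard_gradient(w, shard):
    """Gradient of mean squared error for y ≈ w * x on one copy's data."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def shared_step(w, shards, learning_rate):
    # Each copy looks only at its own shard...
    gradients = [shard_gradient(w, shard) for shard in shards]
    # ...then the copies pool what they found by averaging gradients.
    average = sum(gradients) / len(gradients)
    # Every copy applies the same update, so they all stay identical.
    return w - learning_rate * average


# Hypothetical data: every point lies on y = 3x, split across two copies.
shards = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0)],
]

w = 0.0
for _ in range(200):
    w = shared_step(w, shards, learning_rate=0.01)
```

Neither copy ever sees the other's data, yet both end up with a weight fitted to all of it.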

Tags Scott Loftesness


Navigating the Infinite

We will soon, if not already, be drowning in the Sea of Infinite Content!

It’s become clear that we’re heading into a world of infinite content – as if we aren’t already drowning in a sea of meaningless, automatically generated content. What was once a seemingly manageable stream of books, websites, and media is becoming an overwhelming tidal wave, threatening to erode the shores of human creativity. The age of human innovation is at risk.

What’s moving us from today’s world of “just a lot” to our future of “way too much”? Why do I say we’re drowning in a sea of infinite content?

In two words: generative AI.

Since the launch last fall of ChatGPT (and many similar tools), it’s become increasingly clear that we can now use these tools to churn out endless repetitive, low-quality content. Indeed, they can generate spammy nonsense on their own, with no regard for truth or diversity. All that matters is predicting the best next word.

The focus is on quantity over quality. So much garbage is being produced that it’s becoming harder to find meaningful information and creative art amidst the noise. Useful voices are being drowned out by the drone of algorithmic imitation of what’s already popular.

There is also the risk of misinformation as fake AI-generated content spreads. Propaganda and radicalization loom as nefarious actors exploit these tools. Jobs in creative fields disappear as AI replaces human creators and artists.

All this tidal wave of endless content needs is electricity. Power. And ever more semiconductors.

Where does this path lead us? What will become of creativity and originality if AI takes over? We must question how to harness infinite content ethically to serve humanity, not overwhelm it. The age of human innovation cannot be allowed to end under a sea of meaningless artificial content. We cannot lose what makes us human.

How can we ensure these technologies are used responsibly? How can we stem the tide before it’s too late? The debates must begin now.

And where will all of that power – and all of those semiconductors – come from?

Tags generative ai, implications of generative ai, infinite content


The Best Explainer of ChatGPT

Post author By Scott Loftesness
Post date February 19, 2023

Just came across a great explainer of ChatGPT and the underlying technologies by Stephen Wolfram: What Is ChatGPT Doing … and Why Does It Work?

That ChatGPT can automatically generate something that reads even superficially like human-written text is remarkable, and unexpected. But how does it do it? And why does it work? My purpose here is to give a rough outline of what’s going on inside ChatGPT—and then to explore why it is that it can do so well in producing what we might consider to be meaningful text.

Highly recommended!


Hallucinating


It’s been just over a week since Microsoft made such a big deal about an enhancement to its Bing search engine that adds OpenAI’s GPT chat capability. In the process, Microsoft declared that it expected this new capability to help drive significant market share growth in search.

But it’s become clear in the last week that this kind of chat capability is much less about enhancing search (particularly if you want accurate answers) and much more about generating creative text useful for other purposes. This behavior has become known as hallucination – where a chatbot just starts stringing text together. Cade Metz writes that “hallucinate” is just a catchy term for “they make stuff up.”

I’ve had some great fun playing around with this to get help writing, for example, short stories. I’ll provide a few sentences to seed the chatbot’s “thinking” and then ask it to complete a 1,000 word short story based on that input I provided. It’s been fun to see what results.

Even more fun has been asking the chatbot to adjust the style to make the writing similar to other famous authors such as Hemingway, Steinbeck, Twain, George Saunders and others. It’s been fun to see the stylistic changes it makes to the same basic story based on the writer’s style that I specify.

None of this work has anything to do with search – nor does it help in any way by enhancing search results. It’s something completely different, strikingly interesting, and a heck of a lot of fun to play with. Whether it’ll be really useful in helping me do any real writing remains an open question – but meanwhile I’m enjoying sparring with a seemingly smart creative “mind” on the other end of my computer screen!


Attention is all you need – or is it?

Post author By Scott Loftesness
Post date February 11, 2023


How important is accuracy? Sort of feels like the pursuit of quality in Zen and the Art of Motorcycle Maintenance!

I’ve been enjoying following the evolution of AI technology, which seems to be accelerating at an ever-increasing rate. Speaking with a good friend earlier this week, he said “Scott, it really feels to me like the early 90’s – when change was accelerating (the Internet) and we could feel it but didn’t really know what to make of it.” Indeed, it does feel like that again.

After spending some time playing with ChatGPT, Poe, and others, I’ve come to respect what they’re capable of. But I’ve also come to learn more about what they’re not capable of – namely, dealing with facts in an accurate way. These tools all provide a disclaimer that they may generate inaccurate results – and that their results must be checked for accuracy. And for good reason. Once you understand how the large language models work, you can understand why.

In my simple understanding, what these LLMs do is get trained on very large corpuses of textual data – like the “whole Internet” – and that training is then “validated” by humans who test it with lots of queries and inspection of the generated results. That combination of training and verification is used to essentially set the weights inside the model which then are used in a kind of simplistic way to generate text – by moving from word to word (or word fragment to word fragment) and “writing” answers to queries. In other words, they’re using their training to come up with the best possible next word to output given the query they’ve been given.

Clearly, the best possible next word isn’t necessarily an accurate one. Rather, it’s the one the model judges most likely given the text it was trained on. Because of this fundamental characteristic of how these things work, you can get results that aren’t accurate. As a personal example, I asked ChatGPT to give me a history of the small town where my father grew up. I knew that history – including where the name of the town came from. ChatGPT gave me the wrong attribution for the name of the town. But when I rephrased the question and asked whether the name actually came from the location I knew to be correct, the model came back and agreed with me.
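A deliberately crude way to see how the likeliest next word and the accurate next word can part ways is a word-pair counter. This is not how ChatGPT works internally (it uses a neural network, not a lookup table), and the three-sentence corpus is invented. The sketch only shows that whatever appeared most often in the training text wins.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count which word follows which in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, word):
    """Return the follower seen most often, or None if the word is unseen."""
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical corpus: the wrong story about the town's name is simply
# repeated more often than the right one.
corpus = (
    "the town was named for a river "
    "the town was named for a general "
    "the town was named for a river"
)
counts = train_bigrams(corpus)
guess = most_likely_next(counts, "a")
```

If the town was in fact named for a general, the counter still answers "river", because that is what it saw more often.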

In my mind, coming to better understand these characteristics has helped me understand where these kinds of models may be very useful and other situations where they might be less useful. For example, asking one of these LLMs to help write a short story is a very good use. See an earlier example I wrote about using ChatGPT to write haiku poetry. That use case isn’t one that depends on any accuracy – it’s just one where clever use of text is all you want. Similarly, I can see how using LLMs to help write computer code (like GitHub Copilot) makes a lot of sense. But I question how much value an LLM can bring to helping deliver better search results – which is the current basis for what Microsoft is trying to do with its new Bing and enhancements to the Edge browser. It’s less clear that I want potentially inaccurate results from an LLM being delivered to me in response to search.

It’s early days to be thinking about all of this – and I’ve got a lot more to learn. Meanwhile, I’m really having some fun using these tools to help me write short stories – based on a start that I provide and then asking for the result to mimic the writing style of a Hemingway, Steinbeck or Didion. That kind of creative use is definitely helped by these tools. Getting accuracy in search results seems much less promising.


Large Language Models and Transformers

Post author By Scott Loftesness
Post date February 4, 2023

This morning I noticed an article titled “Understanding AI” in the Financial Times about an upcoming two-week series on artificial intelligence. That prompted me to just ask ChatGPT to write a 1,000 word blog post describing large language models (LLM) and transformers. Below is what I got back from ChatGPT…a pretty useful description of both! By the way, today’s Wall St. Journal has an article titled “Sports Illustrated Publisher Uses AI to Write Articles” that discusses how some publications are beginning to use AI chat bots to generate stories for publication.

Also earlier this week I watched a YouTube video from Stanford with Professor Christopher Potts titled GPT-3 and Beyond – a bit technical for me but he provided a fascinating introduction to the progress in AI.

Large Language Models and Transformers

Large language models and transformers are artificial intelligence technologies that have been revolutionizing the field of natural language processing. These models are capable of performing complex language tasks such as text classification, question answering, and machine translation. In this post, we will provide a comprehensive overview of large language models and transformers, their history, how they work, and their current and future applications.

A brief history of language models

The study of language models has a long history dating back to the 1950s when the first attempts to build machine translation systems were made. At that time, the focus was on building rule-based systems that relied on pre-defined grammar and syntax rules to translate between languages. However, these systems were limited in their ability to handle ambiguity and variability in natural language.

In the late 1980s, the field of machine learning emerged, and language models started to use statistical techniques to improve their performance. Over time, researchers have developed more sophisticated models that can learn from large amounts of data and achieve better performance on various language tasks.

Introduction to large language models

Large language models are deep neural networks that are trained on massive amounts of text data to perform a wide range of natural language processing tasks. The key innovation behind these models is the ability to process and understand the context of words and phrases in a sentence. This allows them to generate more accurate predictions and perform more complex language tasks than traditional rule-based systems.

Large language models are trained on billions of words from a vast variety of text sources, including books, news articles, websites, and social media. This enables them to learn the patterns and relationships between words and sentences, as well as the meaning of words and phrases in different contexts.

Introduction to transformers

Transformers are a type of large language model that have been specifically designed for natural language processing. They were introduced in 2017 by Vaswani et al. in a paper called “Attention is All You Need.” The key innovation behind transformers is the attention mechanism, which allows the model to focus on specific parts of the input sequence when making predictions.

Traditional language models process sequential data by breaking it down into smaller pieces and processing each piece in a linear fashion. This makes it difficult for the model to consider the relationships between words that are far apart in the input sequence. Transformers, on the other hand, use attention mechanisms to allow the model to weigh the importance of different parts of the input sequence when making predictions.

How transformers work

Transformers work by first encoding the input sequence into a set of continuous vectors. These vectors are then processed by a number of self-attention layers, which use the attention mechanism to focus on specific parts of the input sequence when making predictions.

The attention mechanism in transformers works by calculating the relationships between all pairs of words in the input sequence. The model then uses these relationships to weigh the importance of different parts of the input sequence when making predictions. This allows the model to consider the context of words and phrases in the input sequence, leading to more accurate predictions.
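The pairwise weighting described above can be sketched in plain Python. This is a simplified illustration, not the full mechanism from the paper: it assumes queries, keys, and values are all just the input vectors themselves, whereas a real transformer applies learned projections and uses several attention heads. The three two-dimensional "word" vectors are made up.

```python
import math


def softmax(values):
    top = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with no learned projections."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this word against every word in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The output is a weighted blend of all the word vectors.
        blended = [
            sum(w * vec[j] for w, vec in zip(weights, vectors))
            for j in range(dim)
        ]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical embeddings for a three-word sequence.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(sequence)
```

Each row of `weights` says how much one word draws on every other word, regardless of how far apart they sit in the sequence.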

After the self-attention layers, the encoded vectors are processed by a feed-forward neural network to make the final predictions. The feed-forward network includes a number of dense layers, which are fully connected layers that allow the model to make more complex predictions.
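A feed-forward block of the kind described here can be sketched as two dense layers with a nonlinearity between them. The ReLU activation, the layer sizes, and the hand-picked weights below are assumptions for illustration. In a trained model the weights are learned, and the same block is applied to each position's vector.

```python
def dense(inputs, weights, bias):
    """Fully connected layer: every output depends on every input."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def feed_forward(inputs, w1, b1, w2, b2):
    """Dense layer, nonlinearity, then a second dense layer."""
    return dense(relu(dense(inputs, w1, b1)), w2, b2)


# Hypothetical weights: expand 2 inputs to 3 hidden units, then back to 2.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b1 = [0.0, 0.0, 0.0]
w2 = [[1.0, 1.0, 1.0], [2.0, 0.0, -1.0]]
b2 = [0.5, 0.0]

result = feed_forward([1.0, -2.0], w1, b1, w2, b2)
```

The nonlinearity in the middle is what lets the stacked layers represent more than a single linear map could.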

Applications of large language models and transformers

Large language models and transformers have a wide range of applications in the field of natural language processing. Some of the most notable applications include:

Text classification: Large language models can be used to classify text into different categories, such as sentiment analysis, topic classification, and spam detection.
Question answering: Large language models can be used to answer questions by extracting relevant information from a text corpus and providing a concise answer.
Machine translation: Transformers have been shown to be highly effective for machine translation, as they are able to handle the complexity of translating between languages and capture the relationships between words and phrases in different languages.
Summarization: Large language models can be used to summarize long documents into shorter versions that retain the main information.
Dialogue systems: Large language models can be used to build dialogue systems that can understand and respond to human language in a natural way.
Generative models: Large language models can also be used for generative tasks, such as writing creative fiction or composing music.

Current and future developments

In recent years, there has been rapid progress in the development of large language models and transformers. This is due in part to the availability of massive amounts of text data and the advancement of computational resources, which have allowed researchers to train models that are larger and more complex than ever before.

One of the current challenges in the field is reducing the computational cost of training and using large language models. This is important because large language models require vast amounts of computational resources, and their deployment is limited by the availability of these resources.

There are also ongoing efforts to improve the ethical and social implications of large language models and transformers. For example, there is concern about the potential for these models to reinforce biases and perpetuate harmful stereotypes. Researchers are working to develop methods for mitigating these risks and ensuring that large language models are used in a responsible and ethical manner.

Conclusion

Large language models and transformers have been making a significant impact on the field of natural language processing. Their ability to process and understand the context of words and phrases in a sentence has led to advances in a wide range of applications, including text classification, question answering, and machine translation. As these models continue to evolve, we can expect to see even more exciting developments in the future.
