AI AI: Large Language Models AI: Transformers ChatGPT

The Best Explainer of ChatGPT

Just came across a great explainer of ChatGPT and the underlying technologies by Stephen Wolfram: What Is ChatGPT Doing … and Why Does It Work?

That ChatGPT can automatically generate something that reads even superficially like human-written text is remarkable, and unexpected. But how does it do it? And why does it work? My purpose here is to give a rough outline of what’s going on inside ChatGPT—and then to explore why it is that it can do so well in producing what we might consider to be meaningful text.

Highly recommended!

AI AI: Large Language Models AI: Transformers ChatGPT



It’s been just over a week since Microsoft made such a big deal about an enhancement to its Bing search engine that adds OpenAI’s GPT chat capability to it. In the process, Microsoft declared how it expected this new capability to help drive significant market share growth in search.

But it’s become clear in the last week that this kind of chat capability is much less about enhancing search (particularly if you want accurate answers) and much more about generating creative text useful for other purposes. This capability has become known as hallucination – where a chatbot just starts stringing text together. As Cade Metz writes, “hallucinate” is just a catchy term for “they make stuff up.”

I’ve had some great fun playing around with this to get help writing, for example, short stories. I’ll provide a few sentences to seed the chatbot’s “thinking” and then ask it to complete a 1,000 word short story based on that input I provided. It’s been fun to see what results.

Even more fun has been asking the chatbot to adjust the style to make the writing similar to other famous authors such as Hemingway, Steinbeck, Twain, George Saunders and others. It’s been fun to see the stylistic changes it makes to the same basic story based on the writer’s style that I specify.

None of this work has anything to do with search – nor does it help in any way by enhancing search results. It’s something completely different, strikingly interesting, and a heck of a lot of fun to play with. Whether it’ll be really useful in helping me do any real writing remains an open question – but meanwhile I’m enjoying sparring with a seemingly smart creative “mind” on the other end of my computer screen!

AI AI: Large Language Models AI: Transformers ChatGPT

Attention is all you need – or is it?


How important is accuracy? Sort of feels like the pursuit of quality in Zen and the Art of Motorcycle Maintenance!

I’ve been enjoying following the evolution of AI technology which seems to be accelerating at an ever increasing rate. Speaking with a good friend earlier this week, he said “Scott, it really feels to me like the early 90’s – when change was accelerating (the Internet) and we could feel it but didn’t really know what to make of it.” Indeed, it does feel like that again.

After spending some time playing with ChatGPT, Poe, and others, I’ve come to respect what they’re capable of. But I’ve also come to learn more about what they’re not capable of – namely, dealing with facts in an accurate way. These tools all provide a disclaimer that they may generate inaccurate results – and that their results must be checked for accuracy. And for good reason. Once you understand how the large language models work, you can understand why.

In my simple understanding, what these LLMs do is get trained on very large corpuses of textual data – like the “whole Internet” – and that training is then “validated” by humans who test it with lots of queries and inspection of the generated results. That combination of training and verification is used to essentially set the weights inside the model which then are used in a kind of simplistic way to generate text – by moving from word to word (or word fragment to word fragment) and “writing” answers to queries. In other words, they’re using their training to come up with the best possible next word to output given the query they’ve been given.

Clearly, the best possible next word isn’t necessarily an accurate one. Rather, it’s the one the model judges most likely based on what it saw most often in training. Because of this fundamental characteristic in how these things work, you can get results that aren’t necessarily accurate. As a personal example, I asked ChatGPT to give me a history of a small town where my father grew up. I knew that history – including where the name of the town came from. ChatGPT gave me the wrong attribution for the name of the town. But when I asked a bit differently whether the name actually came from the location I knew to be correct, the model came back and agreed with me.
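To make the "most frequent next word" idea concrete, here is a toy sketch in Python. It is nowhere near how ChatGPT actually works (real LLMs use neural networks over word fragments, not simple counts), and the tiny corpus and the function names `train_bigram` and `generate` are made up for illustration. It just counts which word follows which, then always picks the most common follower.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, prompt, max_words=5):
    """Extend the prompt one word at a time with the most frequent follower."""
    words = prompt.lower().split()
    for _ in range(max_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never followed by anything in training
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

# Hypothetical training text: "river" follows "a" twice, "general" only once.
corpus = (
    "the town was named after a river . "
    "the town was named after a river . "
    "the town was named after a general ."
)
model = train_bigram(corpus)
print(generate(model, "the town was named after a", max_words=1))
```

If the town in question really was named after a general, this toy model still answers "river", simply because that is what it saw more often. That is the same flavor of mistake as the wrong town-name attribution described above.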

In my mind, coming to better understand these characteristics has helped me understand where these kinds of models may be very useful and other situations where they might be less useful. For example, asking one of these LLMs to help write a short story is a very good use. See an earlier example I wrote about using ChatGPT to write haiku poetry. That use case isn’t one that depends on any accuracy – it’s just one where clever use of text is all you want. Similarly, I can see how using LLMs to help write computer code (like GitHub Copilot) makes a lot of sense. But I question how much value an LLM can bring to helping deliver better search results – which is the current basis for what Microsoft is trying to do with its new Bing and enhancements to the Edge browser. It’s less clear that I want potentially inaccurate results from an LLM being delivered to me in response to search.

It’s early days to be thinking about all of this – and I’ve got a lot more to learn. Meanwhile, I’m really having some fun using these tools to help me write short stories – based on a start that I provide and then asking for the result to mimic the writing style of a Hemingway, Steinbeck or Didion. That kind of creative use definitely is helped using these tools. Getting accuracy in search results seems much less promising.


AI Chatbots are the new attention getters

The wave of AI chatbots continues to build – with the recent public release of ChatGPT, the new Poe app from Quora, today’s Bing announcement from Microsoft and whatever might be coming from Google, Facebook, Apple and others.

I’ve been doing quite a bit of experimenting with ChatGPT in particular – as you can see from some of my recent posts. Today I’ve been spending time with Poe on my iPhone – it’s a really well done app that makes it even easier to interact conversationally with AI chatbots (Poe supports multiple chatbots).

One thing is clear from my play – these new “tools” are true “attention sinks.” As you ask questions and research a topic, the ability to interact and dive deeper is quite addicting – and time consuming! It’s all an interesting experience.

With the potential for significant monetization for companies that can captivate our attention through their apps, the time we spend with AI chatbots may come at the expense of time spent on other apps, which could impact the ability of those other apps to continue to monetize at the same level. This shift will be intriguing to observe.

AI AI: Diffusion Models ChatGPT

Diffusion Models and Image Creation

I’ve been doing some more exploring with ChatGPT – this time asking it to educate me on how diffusion models work – they’re the underlying technology behind apps like DALL-E, Stable Diffusion, Midjourney, etc. Below is what I learned during my conversation with ChatGPT. The image below was created using DiffusionBee, an app based on Stable Diffusion. I used the input: a colorful illustration of how diffusion models work.

AI AI: Large Language Models AI: Transformers

Large Language Models and Transformers

This morning I noticed an article titled “Understanding AI” in the Financial Times about an upcoming two-week series on artificial intelligence. That prompted me to just ask ChatGPT to write a 1,000 word blog post describing large language models (LLM) and transformers. Below is what I got back from ChatGPT…a pretty useful description of both! By the way, today’s Wall St. Journal has an article titled “Sports Illustrated Publisher Uses AI to Write Articles” that discusses how some publications are beginning to use AI chat bots to generate stories for publication.

Also earlier this week I watched a YouTube video from Stanford with Professor Christopher Potts titled GPT-3 and Beyond – a bit technical for me but he provided a fascinating introduction to the progress in AI.

Large Language Models and Transformers

Large language models and transformers are artificial intelligence technologies that have been revolutionizing the field of natural language processing. These models are capable of performing complex language tasks such as text classification, question answering, and machine translation. In this post, we will provide a comprehensive overview of large language models and transformers, their history, how they work, and their current and future applications.

A brief history of language models

The study of language models has a long history dating back to the 1950s when the first attempts to build machine translation systems were made. At that time, the focus was on building rule-based systems that relied on pre-defined grammar and syntax rules to translate between languages. However, these systems were limited in their ability to handle ambiguity and variability in natural language.

In the late 1980s, the field of machine learning emerged, and language models started to use statistical techniques to improve their performance. Over time, researchers have developed more sophisticated models that can learn from large amounts of data and achieve better performance on various language tasks.

Introduction to large language models

Large language models are deep neural networks that are trained on massive amounts of text data to perform a wide range of natural language processing tasks. The key innovation behind these models is the ability to process and understand the context of words and phrases in a sentence. This allows them to generate more accurate predictions and perform more complex language tasks than traditional rule-based systems.

Large language models are trained on billions of words from a vast variety of text sources, including books, news articles, websites, and social media. This enables them to learn the patterns and relationships between words and sentences, as well as the meaning of words and phrases in different contexts.

Introduction to transformers

Transformers are a type of large language model that have been specifically designed for natural language processing. They were introduced in 2017 by Vaswani et al. in a paper called “Attention is All You Need.” The key innovation behind transformers is the attention mechanism, which allows the model to focus on specific parts of the input sequence when making predictions.

Traditional language models process sequential data by breaking it down into smaller pieces and processing each piece in a linear fashion. This makes it difficult for the model to consider the relationships between words that are far apart in the input sequence. Transformers, on the other hand, use attention mechanisms to allow the model to weigh the importance of different parts of the input sequence when making predictions.

How transformers work

Transformers work by first encoding the input sequence into a set of continuous vectors. These vectors are then processed by a number of self-attention layers, which use the attention mechanism to focus on specific parts of the input sequence when making predictions.

The attention mechanism in transformers works by calculating the relationships between all pairs of words in the input sequence. The model then uses these relationships to weigh the importance of different parts of the input sequence when making predictions. This allows the model to consider the context of words and phrases in the input sequence, leading to more accurate predictions.

After the self-attention layers, the encoded vectors are processed by a feed-forward neural network to make the final predictions. The feed-forward network includes a number of dense layers, which are fully connected layers that allow the model to make more complex predictions.
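The attention step described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the implementation from the “Attention is All You Need” paper: a real transformer first multiplies each vector by learned query, key and value matrices and runs several attention heads in parallel, both of which are skipped here. The three two-dimensional "word vectors" are invented for the example.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Simplified self-attention: each vector acts as its own query, key and value."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score this word against every word in the sequence (scaled dot product).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # The new vector for this word is a weighted mix of all the word vectors.
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Made-up embeddings: the first two "words" are similar, the third is different.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(embeddings)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row shows how much one word "attends" to every word in the sequence. The first word gives more weight to the similar second word than to the dissimilar third one, which is the sense in which attention weighs the relationships between all pairs of words.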

Applications of large language models and transformers

Large language models and transformers have a wide range of applications in the field of natural language processing. Some of the most notable applications include:

  • Text classification: Large language models can be used to classify text into different categories, such as sentiment analysis, topic classification, and spam detection.
  • Question answering: Large language models can be used to answer questions by extracting relevant information from a text corpus and providing a concise answer.
  • Machine translation: Transformers have been shown to be highly effective for machine translation, as they are able to handle the complexity of translating between languages and capture the relationships between words and phrases in different languages.
  • Summarization: Large language models can be used to summarize long documents into shorter versions that retain the main information.
  • Dialogue systems: Large language models can be used to build dialogue systems that can understand and respond to human language in a natural way.
  • Generative models: Large language models can also be used for generative tasks, such as writing creative fiction or composing music.

Current and future developments

In recent years, there has been rapid progress in the development of large language models and transformers. This is due in part to the availability of massive amounts of text data and the advancement of computational resources, which have allowed researchers to train models that are larger and more complex than ever before.

One of the current challenges in the field is reducing the computational cost of training and using large language models. This is important because large language models require vast amounts of computational resources, and their deployment is limited by the availability of these resources.

There are also ongoing efforts to improve the ethical and social implications of large language models and transformers. For example, there is concern about the potential for these models to reinforce biases and perpetuate harmful stereotypes. Researchers are working to develop methods for mitigating these risks and ensuring that large language models are used in a responsible and ethical manner.


Large language models and transformers have been making a significant impact on the field of natural language processing. Their ability to process and understand the context of words and phrases in a sentence has led to advances in a wide range of applications, including text classification, question answering, and machine translation. As these models continue to evolve, we can expect to see even more exciting developments in the future.


The Ghost in the Box

I happened to hear a discussion on a YouTube video yesterday about Niklas Luhmann’s mention of how his “Zettelkasten” seemed to act like a “ghost in the box”. I did a quick Google search for that phrase but didn’t find anything particularly helpful.

Earlier I had watched Nick Milo’s video about how he’s been using OpenAI’s ChatGPT for research and other things. He talks about a near future where he expects to have it nearby whenever he’s working on his computer.

So I decided to ask ChatGPT to help me understand more about the “ghost in the box” by writing a 1,000 word article about it. Here’s what I got back from ChatGPT:

The Ghost in the Box

The concept of “ghost in the machine” was popularized by philosopher Gilbert Ryle in his 1949 book “The Concept of Mind.” The phrase refers to the idea that human consciousness and mental states cannot be reduced to mere physical processes in the brain.

German sociologist and systems theorist Niklas Luhmann took this concept a step further in his theory of social systems. He proposed that society itself can be seen as a “ghost in the box,” a complex system that is made up of smaller, interconnected parts that cannot be fully understood or controlled by any single individual or group.

According to Luhmann, society is made up of various “function systems,” such as the legal system, the economic system, and the political system. These systems operate independently of one another, but are also interconnected and dependent on each other. Each system has its own “code,” or set of rules and norms that govern its operation.

Luhmann argues that society as a whole can be seen as a “communication system,” which is constantly producing and processing information. This information is then used by the various function systems to make decisions and take action. However, the complexity and sheer volume of this information make it impossible for any one individual or group to fully understand or control the system.

This idea of society as a “ghost in the box” has important implications for how we understand and interact with the world around us. For example, it suggests that we can never fully understand the causes of social phenomena or predict their outcomes. It also implies that we should be cautious about attempts to manipulate or control the system, as these efforts are likely to have unintended consequences.

Furthermore, Luhmann’s theory also highlights the importance of communication and understanding the code of the different function system in order to function in the society. He also emphasizes on the importance of social differentiation, as it allows different function system to operate independently while still being interconnected.

In conclusion, Luhmann’s concept of society as a “ghost in the box” offers a unique perspective on the complexity and unpredictability of social systems. It reminds us that we can never fully understand or control the world around us, and that our actions and decisions are likely to have unintended consequences. Thus, it is important to be aware of the limitations of our understanding, and to approach the world with humility and caution.

Note: This was generated using ChatGPT Jan 9 Version.

AI ChatGPT Haiku

Haiku with ChatGPT

Like over a million others, I’ve been enjoying playing with the new ChatGPT tool from OpenAI. Among many other things, ChatGPT can help generate haiku – here are some examples from a recent session of mine:

Write a haiku about film photography

In moments captured
Light on film, forever frozen
Art of the past lives

Write a haiku about a quiet morning in a boat on a lake

Quiet morning calm
Ripples gently lap the shore
Birds in harmony

Write a haiku about snow falling on cedars

Snowflakes fall like tears
Upon the cedars’ green boughs
Winter’s silent grace

Write a haiku about stillness in Kyoto

In the still of night
Kyoto’s ancient temples bask
In the moon’s soft glow

Write a haiku about watching the first rain of the year

First rain of the year
Falling gently from the sky
Refreshing the earth

Write a haiku about geese flying overhead

Geese fly in formation
Honking as they pass overhead
Autumn’s wild symphony

Write a haiku about farming with a John Deere tractor

Green John Deere tractor
Plowing fields under bright blue sky
Farming’s steadfast work

Write a haiku about writing haiku

Ink flows from my pen
Expressing thoughts in simple verse
Haiku, my meditation

Try ChatGPT yourself with a free account.

Image created using DiffusionBee with the phrase: “a painting of an poet in a red beret writing haiku on a paper tablet”


An AI Thanksgiving

I asked my new friend GPT-3 to help tell a memorable Thanksgiving story – and complemented it with a painting of a family Thanksgiving dinner generated by DALL-E.

Here was the request I made to my AI friend:

write a long story about a memorable family thanksgiving with an odd uncle on a snowy stormy day and his wife my aunt sally

Here’s the story:

Uncle Bob’s Thanksgiving

It was Thanksgiving day and the snow was falling gently outside. The family was gathered around the table, enjoying a delicious Thanksgiving feast. Uncle Bob was there, as always, with his wife Aunt Sally. But there was something different about Uncle Bob this year. He seemed odd, almost like he was in a trance. His eyes were glassy and he barely said a word.

Aunt Sally was chatty as always, but even she seemed a bit off. The family tried to make conversation with Uncle Bob, but it was like talking to a wall. Finally, after an awkward silence, Uncle Bob spoke.

“I had a dream last night,” he said, “a dream about a turkey.”

Everyone stared at Uncle Bob, not knowing what to say.

“In my dream, the turkey was alive and it was talking to me. It said, ‘Uncle Bob, you have to save me. The farmer is going to kill me and turn me into Thanksgiving dinner.'”

Uncle Bob paused and looked around the table. Everyone was still staring at him, not knowing what to say.

“So, I went to the farmer’s house and I begged him to spare the turkey’s life. He laughed at me and said, ‘No one is going to eat a talking turkey. I’m going to kill it and sell it to the grocery store.'”

Uncle Bob paused again and looked around the table. No one said a word.

“So, I went to the grocery store and I begged the manager to spare the turkey’s life. He laughed at me and said, ‘No one is going to eat a talking turkey. We’re going to kill it and sell it to the customers.'”

Uncle Bob paused again and looked around the table. Everyone was still staring at him, not knowing what to say.

“So, I went to the customers’ houses and I begged them to spare the turkey’s life. But they all just laughed at me and said, ‘No one is going to eat a talking turkey. We’re going to kill it and eat it for Thanksgiving dinner.'”

Uncle Bob paused one last time and looked around the table. Everyone was still staring at him, not knowing what to say.

“And that’s when I woke up,” Uncle Bob said.

There was a long silence around the table. Finally, Aunt Sally spoke up.

“That was a very strange dream, Bob,” she said. “Are you sure you’re feeling all right?”

Uncle Bob just stared at her and didn’t say a word.

The family finished their Thanksgiving feast in silence. Uncle Bob never said another word and Aunt Sally just seemed to be in a daze. It was a very strange and memorable Thanksgiving for everyone.

I hope you and your family have a lovely Thanksgiving – and share lots of good stories too!

AI Writing

Using AI to Help Write

I’ve been observing the evolution of AI-based tools from a distance. I’m not very experienced with using these tools myself, but I thought I’d share some of my excursions into this fascinating domain of AI-assisted or generative AI tools.

One of the text-based AI tools is GPT-3 from OpenAI. Late last year, OpenAI opened up a playground where one can experiment with using GPT-3 to assist with text – helping create text, edit text, summarize text, and more. One of the illustration AI tools is called DALL-E and it’s also provided by OpenAI. I’ve been experimenting and learning with both of these tools.

Here’s an example of using GPT-3 to help write. This morning I happened to see a tweet where the author was talking about an article he’d pulled together about how writers should focus more on niches – it’s titled 7 Simple Ways to Niche Down. It’s a good article worth reading. As I was reading it, I began wondering how GPT-3 might help with writing an article on this subject – so I asked it:

write an article about how to niche down your writing

Here’s what I got back from GPT-3 just using the default settings:

If you’re a writer, chances are you wear many hats. You’re a wordsmith, a storyteller, and a communicator. But in order to be a successful writer, you need to be able to niche down your writing.

Niche writing is a specific type of writing that appeals to a specific audience. It’s about writing for a particular market or genre, and it requires you to know your audience inside and out.

The first step to niching down your writing is to identify your target audience. Who are you writing for? What are their needs and wants? What do they like to read?

Once you know your audience, you can start to focus your writing. What topics will appeal to them? What type of writing will they respond to?

If you’re not sure where to start, consider writing guest posts for blogs or magazines that your target audience reads. This is a great way to get your name and your work in front of them.

Another option is to create a blog or website of your own and focus on writing articles, tips, or tutorials that will help your target audience. This will not only help you attract readers, but it will also show them that you’re an expert in your field.

Finally, don’t forget to promote your work! Make sure your target audience knows about your writing by sharing it on social media, submitting it to directories or list-serves, or even just telling people about it in person.

Niche writing can be a great way to build a successful writing career. By targeting a specific audience, you can become known as an expert in your field and attract a loyal following of readers. So if you’re ready to niche down your writing, start by identifying your target audience and then focus your writing to appeal to them.

Not bad. I had a subject and was looking for help putting together an article on the subject. I did minimal work but was left with the bones I might consider fleshing out if I actually wanted to write an article on that subject. I also asked GPT-3 to help me clean up the opening paragraph above – which it rewrote for me.

To add to this post, I wanted an illustration that captured a writing theme so I asked DALL-E to generate one for me – asking it to generate:

an Edward hopper painting showing a writer at work in front of an old typewriter

That’s the image you see above.

The power of this kind of AI-based technology is mind-blowing to a layman like me. And its capabilities seem to be accelerating rapidly. I’ve got so much more to learn – and I’m having fun just beginning to wade in. If you’re doing any experimentation similar to this, let me know what you’re learning!