How important is accuracy? Sort of feels like the pursuit of quality in Zen and the Art of Motorcycle Maintenance!
I’ve been enjoying following the evolution of AI technology, which seems to be accelerating at an ever-increasing rate. Speaking with a good friend earlier this week, he said “Scott, it really feels to me like the early 90’s – when change was accelerating (the Internet) and we could feel it but didn’t really know what to make of it.” Indeed, it does feel like that again.
After spending some time playing with ChatGPT, Poe, and others, I’ve come to respect what they’re capable of. But I’ve also learned more about what they’re not capable of – namely, dealing with facts in an accurate way. These tools all carry a disclaimer that they may generate inaccurate results and that their output must be checked for accuracy. And for good reason. Once you understand how large language models (LLMs) work, you can understand why.
In my simple understanding, these LLMs are trained on very large corpora of text – like the “whole Internet” – and that training is then “validated” by humans who test the model with lots of queries and inspect the generated results. That combination of training and verification essentially sets the weights inside the model, which are then used in a fairly simple way to generate text: moving from word to word (or word fragment to word fragment) and “writing” an answer to the query. In other words, they’re using their training to come up with the best possible next word to output, given the query and the words generated so far.
Clearly, the best possible next word isn’t necessarily an accurate one. Rather, it’s the one the model judges most likely given the patterns it saw in training. Because of this fundamental characteristic of how these things work, you can get results that aren’t accurate. As a personal example, I asked ChatGPT to give me a history of the small town where my father grew up. I knew that history, including where the name of the town came from. ChatGPT gave me the wrong attribution for the name of the town. But when I rephrased the question and asked whether the name actually came from the location I knew to be correct, the model came back and agreed with me.
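To make the "most likely next word" idea concrete, here is a toy sketch in Python. It is not how ChatGPT actually works: real LLMs use neural networks over word fragments and sample from probabilities rather than counting word pairs. The tiny corpus, the town-naming sentences, and the function names `train` and `generate` are all made up for illustration. The point is only that when a wrong statement shows up more often in the training text than the right one, a model that picks the likeliest next word will repeat the wrong one.

```python
from collections import Counter, defaultdict

# Hypothetical training text: suppose the town was really named after a
# general, but the "river" story appears more often in the text.
corpus = (
    "the town was named after a river . "
    "the town was named after a river . "
    "the town was named after a general . "
)

def train(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, start, max_words=10):
    """Repeatedly append the most frequent next word, stopping at a period."""
    words = [start]
    while len(words) < max_words:
        followers = counts.get(words[-1])
        if not followers:
            break  # no word ever followed this one in training
        next_word = followers.most_common(1)[0][0]
        words.append(next_word)
        if next_word == ".":
            break
    return " ".join(words)

model = train(corpus)
print(generate(model, "the"))  # prints: the town was named after a river .
```

The model has "seen" the correct answer, but the more common wording wins. That is roughly the failure I ran into with my father's hometown, just at a vastly smaller scale.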
Coming to better understand these characteristics has helped me see where these kinds of models may be very useful and where they might be less so. For example, asking one of these LLMs to help write a short story is a very good use. See an earlier example I wrote about using ChatGPT to write haiku poetry. That use case doesn’t depend on accuracy – clever use of text is all you want. Similarly, I can see how using LLMs to help write computer code (like GitHub Copilot) makes a lot of sense. But I question how much value an LLM can bring to delivering better search results – which is the current basis for what Microsoft is trying to do with its new Bing and enhancements to the Edge browser. It’s less clear that I want potentially inaccurate results from an LLM being delivered to me in response to a search.
It’s early days to be thinking about all of this – and I’ve got a lot more to learn. Meanwhile, I’m really having some fun using these tools to help me write short stories – providing a start myself and then asking for the result to mimic the writing style of a Hemingway, Steinbeck or Didion. That kind of creative work definitely benefits from these tools. Getting accuracy in search results seems much less promising.