Daily, snackable writings and podcasts to spur changes in thinking.
Building a blueprint for a better brain by tinkering with the code.
July 21st, 2020
After playing around with GPT-3, the new tool from OpenAI, it seems we’ve reached an inflection point in the progress of artificial intelligence. To put it in more commonplace terms, parents often have the interesting experience of watching their children gain competency in a skill seemingly overnight. There will be lots of struggle and failed starts and then, as if all in one moment, boom, suddenly the kid can do something new. GPT-3 marks a moment when machine learning suddenly picked up an amazing new trick, and it performs this trick with astonishing coherence and ability. The hope of this episode is to present a description of what GPT-3 is in layman’s terms and then explore some possible ways that it could be used, and how it might impact society.
But first, what is GPT? The acronym stands for Generative Pretrained Transformer. Ok sure, but what exactly is it? How do we interact with it? In simple terms it’s a program that accepts some text and then generates a continuation of this text. What exactly does this look like? For those with access to the GPT-3 playground through OpenAI, it just looks like an empty textbox with a submit button.
Now enters a principle that has emerged during the last week as developers have played around with it:
Garbage in, Garbage out.
GPT-3 mimics the style, tone and character of whatever text you feed it. (For those curious, you can hit the submit button with an empty textbox and GPT-3 will generate something. The first time I did this it started talking in French about libraries. The second time it started talking about heatwaves being in the headlines.) There is a bit of an art to the prompt you give GPT-3. Again, garbage in, garbage out.
Here are two examples:
Input: What’s up?
GPT-3 continuation: A confused voice came from inside. When I opened the door, the person that looked back at me was Hayama Hayato. Why was Hayama, who I only shared memories of me playing soccer with, in my room at this hour of the night? That question immediately flew out from my mouth.
Second example, same exact input:
Input: What’s up?
GPT-3 continuation: they respond 'Super' – at least they were back in the days of Fox Mulder and Dana Scully. I love the little sixties touches too, the poodle skirts, the VWs, the hippydom.
The input of “what’s up?” doesn’t provide any context, so GPT-3 generates text in a random direction each time. But notice how coherent it was…
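One way to picture why the same input can wander off in different directions: a program like this is generally understood to pick each next word by rolling weighted dice rather than always choosing the single most likely word. The sketch below is only an illustration of that idea in Python. The words and their probabilities are invented for the example and are not taken from GPT-3.

```python
import random

# Hypothetical probabilities for the first word after a context-free prompt.
# These numbers are made up for illustration.
next_word_probabilities = {"A": 0.4, "they": 0.3, "Nothing": 0.2, "The": 0.1}

def sample_next_word(probabilities, rng):
    """Pick one word at random, favoring words with higher probability."""
    words = list(probabilities)
    weights = list(probabilities.values())
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random()  # unseeded, so every run can go a different way
print([sample_next_word(next_word_probabilities, rng) for _ in range(5)])
```

Run it a few times and the list changes, which is the small-scale version of getting a different continuation from the same prompt.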
But if you provide GPT-3 with context, say, the first couple of paragraphs of an episode from Tinkered Thinking, it will generate a continuation of that episode which is shockingly on point and very believable. Episodes 823 and 824 are non-fiction episodes that explore just this idea. About half of each episode was generated by GPT-3. The same collaboration was done for Episode 826, which is a fictional short story. GPT-3’s contribution to that episode is not explicitly indicated, but there too it wrote approximately half of the text, and that episode in particular demonstrates how much context matters to GPT-3. The program’s initial contributions to the story are a bit clunky. But as the story goes on, GPT-3’s contributions get better and better as the entire context of the story grows.
It should be obvious at this point that if you were to do something nefarious, like feeding dangerous ideas into GPT-3, it would continue in the same vein. This is an important point that lends itself to obvious criticism, and it’s a topic of which OpenAI is extremely conscious. The use of GPT-3 in the wild requires a security application, among other safeguards, to limit its use in obviously negative ways. Again, the principle holds: garbage in, garbage out. That holds not just for the quality of the composition but, of course, also for the subject matter.
Now how exactly does this magic work? To answer this, it’s necessary to wrap one’s mind around machine learning. How exactly does a machine learn? To be honest, nobody fully knows. In a way, we’ve created a black box that can learn to figure things out. Asking how exactly a machine learns is similar to asking how a person arises from all the chatter between the neurons in that person’s brain. No one can answer that question. And likewise, no one can fully explain what is going on inside a neural net while it is being trained. That being said, we can describe some aspects of the process that give us a sense of what’s going on.
It turns out that machine learning bears a resemblance to human learning, though the differences are enormous and enormously important. Our brains are vast networks of neurons that all talk to each other, and somehow from all that chatter we, as individuals, emerge. To be sure, our understanding of the brain is still rudimentary, but it’s clear that neurons ‘listen’ to the chatter around them via a web of arms, or dendrites, that reach out to other neurons, and based on the signals they hear, they fire their own signal to be heard by another neuron or set of neurons. Multiply that process roughly 86 billion times, give it the right shape and configuration, and boom, you have a human brain that will try to figure things out. (To dive a little deeper, it’s useful to know that an infant’s brain builds far more connections between neurons than it will keep, and as we learn how to exist in the world, we pare those connections down significantly.)
GPT-3 was trained to generate text using a computational model that bears a lot of similarity to the jumble of neurons we call a brain. The analogy is so close, in fact, that this computational model is referred to as a ‘neural net’. Now here’s an interesting fact about GPT-3: its neural net contains 175 billion parameters. But there’s a distinction here we need to be very careful about. A parameter is not a neuron in the way that our neurons are. Within the context of machine learning, a parameter is a fairly simple weighted value, and these values dictate how strongly one node, or artificial ‘neuron’, talks to another, which makes a parameter closer to a connection between neurons than to a neuron itself. There is a similarity, but our neurons are vastly more complex than the nodes that exist within a neural net. An actual human neuron is far more complex in how it listens to and signals its neighbors. We must also take into account that there are many different types of neurons in the human brain, specializing in different ways with different functions. Human neurons also have varying numbers of dendrites, which are used to listen to other neurons. A single neuron can have over 100 dendrites, and each dendrite has characteristics that determine how and what the core neuron hears and ultimately what sort of signal it sends. This presents a level of complexity that far exceeds what is going on with the nodes in a machine learning context. And this difference in complexity should make it intuitive why GPT-3 needed so many parameters. The construction and organization of its ‘neurons’ is far simpler and more straightforward than what we have going on between our ears, so what the neural net lacks in shape and initial configuration, it attempts to make up for in size and in the brute force achieved by speed and computational iteration.
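To get a feel for just how simple a single node in a neural net is, here is a minimal sketch in Python. It assumes the textbook picture of an artificial neuron: multiply each incoming signal by a weight, add everything up, and squash the result. The function name, the example numbers, and the choice of a sigmoid for the squashing step are all illustrative assumptions, not a description of GPT-3’s actual internals.

```python
import math

def artificial_neuron(inputs, weights, bias):
    """Weighted sum of the incoming signals, squashed into the range (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))  # sigmoid, chosen only for illustration

# Three incoming signals and three made-up weights.
# The weights and the bias are the adjustable 'parameters' of this node.
signals = [1.0, 0.0, 1.0]
weights = [0.5, -1.2, 0.5]
output = artificial_neuron(signals, weights, bias=-1.0)
print(output)
```

That is the whole node: a handful of multiplications and one small formula. Everything interesting comes from wiring an enormous number of these together and adjusting the weights.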
So how does a neural net learn in order for something like GPT-3 to work? First you need a gargantuan amount of data. GPT-3 was trained using text from the internet - an amount of text that is just inconceivable for a single human being to think about reading. Think of every comment, every post, every description, every pdf book freely available, all the tweets and blogs, manuals, dissertations, threads, rantings - all of it. That was the block of text that was given to the neural net for its training.
So how does that training work? Imagine I gave you a paragraph to read, but I left off the last word, and asked you to guess what that word is. As a human, you’d probably do really well at this. You’d certainly do a perfect job guessing a sensible word that fits the context and which works, regardless of whether you guess the actual word. Here’s a tiny example: Please provide the last word that should be added to the end of this ___________ .
The above example is easy. The word is ‘sentence’. But since I’ve started a new paragraph, you could also plop the word ‘paragraph’ into the slot and it still works. This is the task that GPT-3 has become remarkably good at: what word would make sense to come next? The training of the neural net essentially asked this question of itself over and over. We can imagine the neural net reading 99 words and guessing what the 100th word would be. It would then check the answer and see how far ‘off’ the answer was, and based on the difference it would backtrack through the neural net to find all of the nodes that were most responsible for the choice of the wrong word, and then it would adjust the weights of those nodes, which in turn affects how those nodes will talk to one another in the next round. Run this game an astronomical number of times with an inconceivably large amount of text and after a while the neural net gets pretty good at the game it’s playing.
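The guess, check, and adjust game can be sketched in a few lines of Python. This is a toy under heavy assumptions: the ‘context’ is only the single previous word rather than 99 words, there is one layer of weights instead of a deep network (so there is nothing to backtrack through), and the training text is one made-up sentence. The names `predict` and `train` and the learning rate are illustrative choices.

```python
import math

text = "the boat sails and the boat floats and the boat sails".split()
vocab = sorted(set(text))

# One adjustable weight per (previous word, candidate next word) pair, all starting at 0.
weights = {prev: {nxt: 0.0 for nxt in vocab} for prev in vocab}

def predict(prev):
    """Turn the weights for `prev` into a probability for each candidate next word."""
    exps = {word: math.exp(score) for word, score in weights[prev].items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

def train(rounds=200, learning_rate=0.5):
    for _ in range(rounds):
        for prev, actual in zip(text, text[1:]):
            probs = predict(prev)  # guess
            for word in vocab:
                # How far 'off' the guess was: predicted probability minus the
                # truth (1 for the word that actually came next, 0 otherwise).
                error = probs[word] - (1.0 if word == actual else 0.0)
                weights[prev][word] -= learning_rate * error  # adjust

before = predict("the")["boat"]
train()
after = predict("the")["boat"]
print(f"P(boat | the) before training: {before:.2f}, after: {after:.2f}")
```

Before training, every word is an equally likely guess. After a few hundred passes, the toy has worked out that ‘boat’ follows ‘the’ in this text, purely by being nudged each time it was wrong.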
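What’s important to note is that the computer is not actually ‘reading’ the text that’s given to it. The computer first converts the text into embeddings. What is an embedding? The choice of code is arbitrary, but it’s a form that the computer can actually work with. Computers function using binary code. Each letter that you read on a website is actually stored as binary code. So, for the purposes of this episode, embeddings are bits of binary code that are associated with the words and word-snippets that the computer ‘reads’. (Strictly speaking, the fixed code assigned to each word-snippet is called a token, and the embedding is a list of numbers the neural net learns for each token during training, but the simplification serves here.) The mapping from word-snippet to code never changes.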
A good way to think about it is to realize that humans use their own embeddings. Our embeddings are the individual words themselves. For example, you can hear the sound of the word ‘boat’, or you can read the graphical symbols that indicate the word ‘boat’ in your mind, but that set of graphical marks and the sound you hear are arbitrary - they don’t actually mean anything on their own. If you said the word ‘boat’ or showed the word ‘boat’ to someone who has no understanding of English, it would be meaningless to them. They have a different word or embedding for that concept. This is quite similar to how embeddings work in a neural network. Every time the computer ‘reads’ the word ‘boat’ it converts it into the same binary string, and pushes that binary string into the neural net. This discussion of embeddings is important because it’s pretty magical to realize that GPT-3 doesn’t actually know any words. Aside from the already problematic use of the word ‘know’ in relation to a computer, the point here is to get at a description of what is at the core of GPT-3’s success. Through all of this weighted calibration using embedded language, GPT-3 has ‘learned’ the subtle rules that dictate how we humans pick our words in different contexts.
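The idea that ‘boat’ always turns into the same code can be shown with a short Python sketch. It assumes the simplest possible scheme, one integer per whole word, handed out in order of first appearance. GPT-3 itself works with word-snippets rather than whole words, so the functions `build_vocabulary` and `encode` below are hypothetical stand-ins meant only to show the fixed lookup.

```python
def build_vocabulary(words):
    """Assign each distinct word a fixed integer code, in order of first appearance."""
    codes = {}
    for word in words:
        if word not in codes:
            codes[word] = len(codes)
    return codes

def encode(sentence, codes):
    """Replace every word in the sentence with its fixed code."""
    return [codes[word] for word in sentence.split()]

corpus = "we took the boat out and the boat leaked".split()
codes = build_vocabulary(corpus)

first = encode("the boat leaked", codes)
second = encode("we took the boat", codes)
print(first, second)

# The same code for 'boat', written out as a binary string.
print(format(codes["boat"], "08b"))
```

Whichever sentence ‘boat’ appears in, it comes out as the same number. The number itself means nothing; all of the meaning comes from how the neural net learns to treat it.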
Dialect and jargon are helpful concepts here. For example, if you go to the hospital and talk to a doctor, you expect to hear a whole bunch of complex language that relates to biology and medicine, and if this hospital is in London, you expect to hear a British accent. Even if we don’t know all the words, we’d be able to tell something is amiss if we suddenly start hearing a whole bunch of nautical sailing jargon, the words of which we also don’t know.
If a friend starts relating a story about how they went sailing, we can predict that there’s a fairly high chance we’re going to hear the word ‘boat’. GPT-3 has essentially played that probability game a ridiculously unfathomable number of times in every context that humans have written about and that’s been plopped on the internet.
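That probability game can be played by hand on a tiny scale. The Python sketch below simply counts how often a word shows up in two made-up snippets, one about sailing and one about a hospital. Both snippets and the function `word_probability` are invented for illustration, and plain counting stands in for what a trained neural net does in a far more sophisticated way.

```python
from collections import Counter

# Two invented snippets standing in for 'everything written about a topic'.
stories = {
    "sailing": "we took the boat out and the wind pushed the boat past the harbor",
    "hospital": "the doctor read the chart and the nurse checked the chart again",
}

def word_probability(word, context):
    """Fraction of the words in the given context that are `word`."""
    words = stories[context].split()
    return Counter(words)[word] / len(words)

print(word_probability("boat", "sailing"))
print(word_probability("boat", "hospital"))
```

In the sailing snippet ‘boat’ is a decent bet, and in the hospital snippet it never appears. Scale the same intuition up to every context humans have written about and you have the flavor of what GPT-3 is doing.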
So it’s completely inaccurate to say that GPT-3 knows something about any given subject. If it knows anything, it’s perhaps best described as a really good guess about which word would come next.
So why all the hype? Simply put, it does an astonishingly good job at that guessing game. It’s so good that often you just can’t tell that its text was generated rather than written by a human. If prompted well, it can respond much as a human would.
How does this neat trick translate? What are the implications?
Well, let’s imagine a scenario here: Let’s say a friend shows up to work, and they’ve got a nice smile on their face. You grow curious and ask “What’s up?” And your friend begins to tell you about an app they just downloaded. It’s an Artificial Intelligence that works as a therapist, and your friend tells you they just had the most amazing insights during their therapy session that they had while walking to work.
Now, unlike a human therapist, who needs to write down notes about sessions in files to reference later, a GPT-powered therapist could have a perfect memory of absolutely everything you’ve ever said. You could then ask your friend in astonishment: how does that work? And your friend might say, well, first, it’s super affordable, oh my gosh, just a couple of dollars a month and I have unlimited sessions. To get started, you have to talk about yourself for 10 hours, but luckily I’ve been keeping a journal on a private blog with a hundred times that word count, so I was able to submit the blog and boom, I had access.
If you think that description might worry a therapist, let’s swing in a different direction:
Let’s say you are a human rights activist and you’ve dedicated your whole life to monitoring the ways that corporations try to take advantage of governments through lobbyists and manipulating legislation. And right now there is a bill on the floor being considered. You know there’s probably something nefarious hidden in the details of this bill, but the bill is 2,000 pages long, and there’s just no way you can go over it with a fine-tooth comb in order to figure out just what’s in there. But, a colleague has just built a tool using an advanced GPT that can take any amount of text and summarize it. Not only that, but this specific GPT has been additionally trained to flag details that don’t seem connected with the main text, or details that hint at the creation or use of loopholes in existing law, because naturally, it’s also read all of existing law and has a perfect memory of all legal text. You feed the new bill to this special GPT and boom, within a minute you have an accurate 5 page summary and a list of 27 sentences and paragraphs spread across the 2,000 page document that it thinks you should specifically look at for trouble.
These are just two imaginary ways that the technology of GPT-3 could be implemented. And one of them, the pocket therapist, has already been achieved in a somewhat rudimentary way during the week or so that GPT-3 has been available.
The ramifications of this advance seem likely to rattle through every part of society. Language is the fabric of society. Our use and misuse of language dictates the rise and fall of all our endeavors, and a computer just got really, really good at imitating our language.