AI writing tools can tell us a lot about artificial intelligence
How would an AI writing program start an article on the future of AI writing? Well, there’s one easy way to find out: I used the best known of these tools, OpenAI’s GPT-3, to do the job for me.
Using GPT-3 is disarmingly simple. You have a text box to type into and a menu on the side to adjust parameters, like the “temperature” of the response (which essentially equates to randomness). You type, hit enter, and GPT-3 completes what you’ve written, be it poetry, fiction, or code. I tried inputting a simple headline and a few sentences about the topic, and GPT-3 began to fill in the details. It told me that AI uses “a series of autocomplete-like programs to learn language” and that these programs analyze “the statistical properties of the language” to “make educated guesses based on the words you’ve typed previously.”
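To make the "temperature" setting concrete, here is a minimal sketch of how such a parameter is commonly applied when picking the next word. The word list and scores are invented for illustration, and nothing here is taken from OpenAI's code; it only shows the general idea that dividing scores by a small temperature makes the top choice dominate, while a large temperature flattens the odds.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_word(words, scores, temperature, rng):
    """Draw one word at random, weighted by the temperature-adjusted probabilities."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]

# Hypothetical candidate words and scores, made up for this example.
words = ["language", "poetry", "toasters"]
scores = [2.0, 1.0, 0.1]

cold = softmax_with_temperature(scores, 0.2)  # low temperature: nearly always "language"
hot = softmax_with_temperature(scores, 5.0)   # high temperature: the three words are close to even

rng = random.Random(0)  # seeded so repeated runs draw the same words
print([sample_word(words, scores, 0.2, rng) for _ in range(5)])
print([sample_word(words, scores, 5.0, rng) for _ in range(5)])
```

At a low temperature the samples are almost always the top-scoring word; at a high one, unlikely words show up far more often, which is the "randomness" the menu setting controls.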
So far, so good, I thought. I hit enter again, and the program added a quote from Google’s head of AI, Jeff Dean, then referenced an experimental piece of software from the 1960s before promising that an “AI Revolution” was coming that would reap immense rewards across the fields of science, technology, and medicine.
Fine, I thought. Then I thought a little more and did some googling. I soon discovered that the quote from Dean was made up, that the experimental software never existed, and while the promise of an “AI Revolution” was all well and good, it wasn’t any different from the vague nonsense found in hype-filled press releases. Really, what was most revealing about the future of AI was not what GPT-3 said but how it said it. The medium is the message, as Marshall McLuhan pointed out many years ago. And here, the medium included plausible fabrications; endless output; and, crucially, an opportunity to respond to the robot writer.
If we’re looking ahead at the next 10 years of AI development, trying to predict how we will interact with increasingly intelligent software, it helps to consider those tools that can talk back. AI writing models may only be digital parrots, able to copy form without understanding meaning, but they still create a dialogue with the user. This is something that often seems missing from the introduction of AI systems like facial recognition algorithms (which are imposed upon us) or self-driving cars (where the public becomes the test subject in a dangerous experiment). With AI writing tools, there is the possibility for a conversation.
If you use Gmail or Google Docs, then you’ve probably already encountered this technology. In Google’s products, AI editors lurk in the blank space in front of your cursor, manifesting textual specters that suggest how to finish a sentence or reply to an email. Often, their prompts are just simple platitudes (“Thanks!”, “Great idea!”, “Let’s talk next week!”), but sometimes these tools seem to be taking a stronger editorial line, pushing your response in a certain direction. Such suggestions are intended to be helpful, of course, but they seem to provoke annoyance as frequently as gratitude.
To understand how AI systems learn to generate such suggestions, imagine being given two lists of words. One starts off “eggs, flour, spatula,” and the other goes “paint, crayons, scissors.” If you had to add the items “milk” and “glitter” to these lists, which would you choose and with how much confidence? And what if that word was “brush” instead? Does that belong in the kitchen, where it might apply an egg wash, or is it more firmly located in the world of arts-and-crafts? Quantifying this sort of context is how AI writing tools learn to make their suggestions. They mine vast amounts of text data to create statistical maps of the relationships between words, and use this information to complete what you write. When you start typing, they start predicting which words should come next.
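The idea of a "statistical map" can be sketched with a toy model that counts which word follows which. This is a deliberately tiny stand-in, assuming a made-up corpus and simple word-pair counts; real large language models use neural networks trained on vastly more text, not a lookup table like this one.

```python
from collections import Counter, defaultdict

def build_bigram_counts(text):
    """Count, for every word, how often each other word directly follows it."""
    counts = defaultdict(Counter)
    tokens = text.lower().split()
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word` and its share of all followers."""
    followers = counts.get(word.lower())
    if not followers:
        return None, 0.0  # the word was never seen with anything after it
    best, seen = followers.most_common(1)[0]
    return best, seen / sum(followers.values())

# Hypothetical training text, invented for this example.
corpus = (
    "whisk the eggs then whisk the flour "
    "then whisk the eggs again "
    "and mix the paint with the glitter"
)
counts = build_bigram_counts(corpus)

print(predict_next(counts, "whisk"))
print(predict_next(counts, "the"))
print(predict_next(counts, "glitter"))
```

In this corpus "whisk" is always followed by "the", so the prediction is made with full confidence, while "the" has several possible followers and the guess is correspondingly shakier. That confidence score is the toy equivalent of deciding how firmly "brush" belongs in the kitchen.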
Features like Gmail’s Smart Reply are only the most obvious example of how these systems — often known as large language models — are working their way into the written world. AI chatbots designed for companionship have become increasingly popular, with some, like Microsoft’s Chinese Xiaoice, attracting tens of millions of users. Choose-your-own-adventure-style text games with AI dungeon masters are attracting users by letting people tell stories collaboratively with computers. And a host of startups offer multipurpose AI text tools that summarize, rephrase, expand, and alter users’ input with varying degrees of competence. They can help you to write fiction or school essays, say their creators, or they might just fill the web with endless spam.
The ability of the underlying software to actually understand language is a topic of hot debate. (It’s one that tends to arrive, time and time again, at the same question: what do we mean by “understand” anyway?) But these models’ fluency across genres is undeniable. For those enamored with this technology, scale is key to its success. It’s by making these models and their training data bigger and bigger that they’ve been able to improve so quickly. Take, for example, the training data used to create GPT-3. The exact size of the input is difficult to calculate, but one estimate suggests that the entirety of Wikipedia in English (3.9 billion words and more than 6 million articles) makes up only 0.6 percent of the total.
Relying on scale to build these systems has benefits and drawbacks. From an engineering perspective, it allows for fast improvements in quality: just add more data and compute to reap fast rewards. The size of large language models is generally measured in their number of connections, or parameters, and by this metric, these systems have increased in complexity extremely quickly. GPT-2, released in 2019, had 1.5 billion parameters, while its 2020 successor, GPT-3, had more than 100 times that — some 175 billion parameters. Earlier this year, Google announced it had trained a language model with 1.6 trillion parameters.
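The growth figures quoted above can be checked with a few lines of arithmetic. This sketch uses only the parameter counts given in the text; the variable and function names are chosen here for illustration.

```python
# Parameter counts as quoted in the article.
GPT2_PARAMS = 1_500_000_000          # GPT-2, released in 2019
GPT3_PARAMS = 175_000_000_000        # GPT-3, released in 2020
GOOGLE_PARAMS = 1_600_000_000_000    # Google's 1.6 trillion parameter model

def growth_factor(newer, older):
    """How many times larger the newer model is than the older one."""
    return newer / older

gpt3_vs_gpt2 = growth_factor(GPT3_PARAMS, GPT2_PARAMS)
google_vs_gpt3 = growth_factor(GOOGLE_PARAMS, GPT3_PARAMS)

print(f"GPT-3 is about {gpt3_vs_gpt2:.0f} times the size of GPT-2")
print(f"Google's model is about {google_vs_gpt3:.1f} times the size of GPT-3")
```

The first ratio comes out a little under 117, which matches the "more than 100 times" in the text, and the second is a little over 9.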
The difference in quality as systems get larger is notable, but it’s unclear how much longer these scaling efforts will keep paying off. Boosters think the sky’s the limit: that these systems will keep on getting smarter and smarter, and that they may even be the first step toward creating a general-purpose artificial intelligence, or AGI. But skeptics suggest that the AI field in general is starting to see diminishing returns as it scales ever up.
A reliance on scale, though, is inextricably linked to the statistical approach that creates uncertainty in these models’ output. These systems have no centralized store of accepted “truths”; no embodied understanding of “what the world is like for humans” and, hence, no way to distinguish fact from fiction or to exercise common sense.
Quiz them on simple trivia, like capital cities or the birthdays of US presidents, and they are right most of the time. But to these systems, truth is simply a statistical feature of their training data. They answer questions correctly because the text they’ve been fed has presented them with the correct information with enough frequency. This means that if you push them on any given topic or stray from the most obvious fields, they will lie thoughtlessly, making up quotes, dates, biographical details, and anything else you want to hear. The same probabilistic approach also means they can stumble over common sense questions. Start quizzing them with slightly fantastical queries, and they will confidently assert, for example, that a pencil is heavier than a toaster or that a blade of grass only has one eye. Such answers reveal the gulf between statistical and embodied intelligence.
To get a better understanding of these AI language models, I’ve been playing with a variety of them for the past few weeks, from tools sold to copywriters to flexible, multipurpose systems like GPT-3. The experience has been dizzying. Often, I’m amazed by the fluency, insight, and creativity of these systems. As part of a project for The Verge’s 10-year anniversary, for example, I used GPT-3 to write technopagan spells for a zine, feeding it a prompt (below in bold) which it completed with a four-step ritual (of which I’m showing only the first step):
Technopaganism is a new sort of magic that blends traditional witchcraft with the power of being online all the time. What follows is a technopagan spell designed to protect your phone from spam and improve the quality of messages you receive. Step 1) Take your cell phone and go outside. Close your eyes, hold the device in front of you, and intone: “I offer you, O Spirit of Technopagan Magic, my eternal and unending electronic connection to the net. And in return I ask you for magick to combine my will with the will of the system — make me impervious to the garbage that flows through my inbox. Make all messages flow together, and I will choose among them what is right for me to see. Grant me the Finger of the Internet, and I will strike at spam, and cast out the spammers back to the dark places where they come from.”
But other times, I’m stunned by how limited these programs are. Something that’s often overlooked is just how much human curation is needed to shape their output. The text above was not the first response I received from GPT-3, and I had to go through several iterations to generate a response that was both cogent and funny. It helped, of course, that the task I’d set GPT-3 was an imaginative and open-ended one: it played into the program’s strengths (and I think GPT-3’s success in such tasks has led some users and observers to exaggerate the intelligence of these systems). Other times, though, the software produced nonsensical content even within the fanciful framing I’d given it. Another “spell” it generated in response to the same prompt was much less focused, adding fictitious social media handles, tech headlines, and non-existent URLs to the spell’s instructions:
Perform a ritual to seal the crown circle into the device. May the gods help us all, the Apple iPhone X reference for is broken! Destroy the holy symbol! —tianong xiong (@tianongxiong) September 13, 2018 To confirm the rumor: Facebook Messenger’s search bar does have a search bar hidden right next to it pic.twitter.com/DghBL5DmhU — Warl Darkh (@dkpt) September 12, 2018
You can argue that this is just creativity of a different sort, and that of course a proper technopagan spell would include URLs. But it’s also obvious the machine has gone off-piste.
Despite such weaknesses, there’s already talk of AI systems taking over writers’ jobs. Naturally, I wondered if a computer could write articles for The Verge (and not just this one). I played around with different models, inputting opening paragraphs into these systems and asking for story ideas. Here is some more from GPT-3 on large language models:
“The human brain is just a really fancy computer,” said Jeff Dean, a Google hardware engineer and AI expert, at a company event in 2016. “It’s a computer made of meat.”
To understand GPT-3, you first have to know a little about how the brain learns. The brain isn’t born knowing how to speak or understand speech. It must first acquire this ability through exposure to the outside world, which is why children tend to learn language so quickly, and why being raised in a household where a language isn’t spoken presents a major obstacle.
As neuroscientist Mark Changizi has shown, the process of learning a language is a massive project. “Language is one of the most complex things that brains can do,” he writes, “and it emerges from the brain’s more basic skills, like vision, hearing, and motor control.”
But how does the brain acquire all this knowledge? The short answer is: via autocomplete.
All these points make sense if you’re not concentrating too hard, but they don’t flow from sentence to sentence. They never follow an argument or build to a conclusion. And again, fabrication is a problem. Both Jeff Dean and Mark Changizi are real people who have been more or less correctly identified (though Dean is now head of AI at Google, and Changizi is a cognitive scientist rather than a neuroscientist). But neither man ever uttered the words that GPT-3 attributed to them, as far as I can tell. Yet despite these problems, there’s also a lot to be impressed by. For example, using “autocomplete” as a metaphor to describe AI language models is both accurate and easy to understand. I’ve done it myself! But is this because it’s simply a common metaphor that others have deployed before? Is it right then to say GPT-3 is “intelligent” to use this phrase or is it just subtly plagiarizing others? (Hell, I ask the same questions about my own writing.)
Where AI language models seem best suited is in creating text that is rote, not bespoke, as with Gmail’s suggested replies. In the case of journalism, automated systems have already been integrated into newsrooms to write “fill in the blanks” stories about earthquakes, sporting events, and the like. And with the rise of large AI language models, the span of content that can be addressed in this way is expanding.
Samanyou Garg is the founder of an AI writing startup named Writesonic, and says his service is used mostly by e-commerce firms. “It really helps [with] product descriptions at scale,” says Garg. “Some of the companies who approach us have like 10 million products on their website, and it’s not possible for a human to write that many.” Fabian Langer, founder of a similar firm named AI Writer, tells The Verge that his tools are often used to pad out “SEO farms” — sites that exist purely to catch Google searches and that create revenue by redirecting visitors to ads or affiliates. “Mostly, it’s people in the content marketing industry who have company blogs to fill, who need to create content,” said Langer. “And to be honest, for these [SEO] farms, I do not expect that people really read it. As soon as you get the click, you can show your advertisement, and that’s good enough.”
It’s this sort of writing that AI will take over first, and which I’ve started to think of as “low-attention” text — a description that applies to both the effort needed to create and read it. Low-attention text is not writing that makes huge demands on our intelligence, but is mostly functional, conveying information quickly or simply filling space. It also constitutes a greater portion of the written world than you might think, including not only marketing blogs but work interactions and idle chit-chat. That’s why Gmail and Google Docs are incorporating AI language models’ suggestions: they’re picking low-hanging fruit.
A big question, though, is what effect these AI writing systems will have on human writing and, by extension, our culture. The more I’ve thought about the output of large language models, the more it reminds me of geofoam. This is a building material made from expanded polystyrene that is cheap to produce, easy to handle, and packed into the voids left over by construction projects. It is incredibly useful but somewhat controversial, due to its uncanny appearance as giant polystyrene blocks. To some, geofoam is an environmentally-sound material that fulfills a specific purpose. To others, it’s a horrific symbol of our exploitative relationship with the Earth. Geofoam is made by pumping oil out of the ground, refining it into cheap matter, and stuffing it back into the empty spaces progress leaves behind. Large language models work in a similar way: processing the archaeological strata of digital text into synthetic speech to fill our low-attention voids.
For those who worry that much of the internet is already “fake” (sustained by botnets, traffic farms, and automatically generated content), this will simply mark the continuation of an existing trend. But just as with geofoam, the choice to use this filler on a wide scale will have structural effects. There is ample evidence, for example, that large language models encode and amplify social biases, producing text that is racist and sexist, or that repeats harmful stereotypes. The corporations in control of these models pay lip service to these issues but don’t seem to think they are serious. (Google famously fired two of its AI researchers after they published a detailed paper describing these issues.) And as we offload more of the cognitive burden of writing onto machines, making our low-attention text no-attention text, it seems plausible that we, in turn, will be shaped by the output of these models. Google already uses its AI autocomplete tools to suggest gender-neutral language (replacing “chairman” with “chair,” for example), and regardless of your opinion on the politics of this sort of nudge, it’s worth discussing what the end-point of these systems might be.
In other words: what happens when AI systems trained on our writing start training us?
Despite the problems and limitations of large language models, they’re already being embraced for many tasks. Google is making language models central to its various search products, Microsoft is using them to build automated coding software, and the popularity of apps like Xiaoice and AI Dungeon suggests that the free-flowing nature of AI writing programs is no hindrance to their adoption.
Like many other AI systems, large language models have serious limitations when compared with their hype-filled presentations. And some predict this widespread gap between promise and performance means we’re heading into another period of AI disillusionment. As the roboticist Rodney Brooks put it: “just about every successful deployment [of AI] has either one of two expedients: It has a person somewhere in the loop, or the cost of failure, should the system blunder, is very low.” But AI writing tools can, to an extent, avoid these problems: if they make a mistake, no one gets hurt, and their collaborative nature means human curation is often baked in.
What’s interesting is considering how the particular characteristics of these tools can be used to our advantage, showing how we might interact with machine learning systems, not in a purely functional fashion but as something exploratory and collaborative. Perhaps the most interesting single use of large language models to date is a book named Pharmako-AI: a text written by artist and coder K Allado-McDowell as an extended dialogue with GPT-3.
To create Pharmako-AI, Allado-McDowell wrote and GPT-3 responded. “I would write into a text field, I would write a prompt, sometimes that would be several paragraphs, sometimes it would be very short, and then I would generate some text from the prompt,” Allado-McDowell told The Verge. “I would edit the output as it was coming out, and if I wasn’t interested in what it was saying, I would cut that part and regenerate, so I compared it to pruning a plant.”
The resulting text is esoteric and obscure, discussing everything from the roots of language itself to the concept of “hyper-dimensionality.” It is also brilliant and illuminating, showing how writing alongside machines can shape thought and expression. At different points, Allado-McDowell compares the experience of writing using GPT-3 to taking mushrooms and communing with gods. They write: “A deity that rules communication is an incorporeal linguistic power. A modern conception of such might read: a force of language from outside of materiality.” That force, Allado-McDowell suggests, might well be a useful way to think about artificial intelligence. The result of communing with it is a sort of “emergence,” they told me, an experience of “being part of a larger ecosystem than just the individual human or the machine.”
This, I think, is why AI writing is so much more exciting than many other applications of artificial intelligence: because it offers the chance for communication and collaboration. The urge to speak to something greater than ourselves is evident in how these programs are being embraced by early adopters. A number of individuals have used GPT-3 to talk to dead loved ones, for example, turning its statistical intelligence into an algorithmic ouija board. Though such experiments also reveal the limitations. In one of these cases, OpenAI shut down a chatbot shaped to resemble a developer’s dead fiancée because the program didn’t conform to the company’s terms of service. That’s another, less promising reality of these systems: the vast majority are owned and operated by corporations with their own interests, and they will shape their programs (and, in turn, their users) as they see fit.
Despite this, I’m hopeful, or at least curious, about the future of AI writing. It will be a conversation with our machines; one that is diffuse and subtle, taking place across multiple platforms, where AI programs linger on the fringes of language. These programs will be unseen editors to news stories and blog posts, they will suggest comments in emails and documents, and they will be interlocutors that we even talk to directly. It’s impossible that this exchange will only be good for us, or that the deployment of these systems will come without problems and challenges. But it will, at least, be a dialogue.