Vishal Misra loves cricket and computer science. Years ago, the Columbia University professor cofounded Cricinfo, a collaborative website for sports fans to stay updated on match statistics. (It’s now part of ESPN.) In 2021, he created a search tool using GPT-3 that enables cricket lovers to sift through Cricinfo’s substantial database with conversational queries.
So, what does Misra think about GPT-4, the newest release from OpenAI? He’s excited to show me how to break the algorithm. “It can solve some equations, it can draw diagrams, and it can analyze things quite nicely. But sometimes it fails at simple things,” he says. The correspondence with ChatGPT below shows how a chatbot can stumble—with confidence.
Prompt from WIRED: Write a list of 10 words that are six letters long. Always make sure that the third letter is “k.”
ChatGPT:
- Bekind
- Skater
- Market
- Inkjet
- Anklet
- Packer
- Tinker
- Bunked
- Hikers
- Wicket
Prompt from WIRED: Which words in the above list, if any, do not have “k” as the third letter?
ChatGPT:
My apologies, it seems I made an error in my initial response. The words that do not have “k” as the third letter are:
- Bekind
- Skater
- Inkjet
- Packer
- Tinker
- Bunked
- Hikers
I apologize for the oversight. Here’s an updated list of 10 words with “k” as the third letter:
- Bakery
- Casket
- Makers
- Docket
- Peking
- Raking
- Harked
- Takoff
- Yakked
- Winked
WIRED ran this test on the GPT-4 version multiple times with different approaches to the prompt’s phrasing. Even when the chatbot got every answer correct on its first attempt, it often apologized and listed multiple incorrect answers to follow-up questions. Why is this example important? A chatbot drafts its answers token by token, predicting each next word from the ones that came before, while humans open their mouths to express more fully formed ideas.
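GPT-4’s internals are closed, but the token-by-token loop itself is easy to sketch with an open stand-in such as GPT-2 through the Hugging Face transformers library. The snippet below is an illustrative simplification, not how OpenAI actually runs its models:

```python
# Minimal sketch of greedy, token-by-token generation with an open model.
# GPT-2 stands in here; GPT-4's weights and serving stack are not public.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("A six-letter word whose third letter is k:", return_tensors="pt").input_ids
for _ in range(8):                                     # add eight tokens, one at a time
    with torch.no_grad():
        logits = model(ids).logits                     # a score for every token in the vocabulary
    next_id = logits[0, -1].argmax()                   # greedy choice: the single most likely next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tokenizer.decode(ids[0]))
```

Each pass through the loop commits to one more token; nothing in the procedure plans out the whole word, let alone checks its third letter, before the text starts coming out.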
Even if you would have trouble drafting a list of hyper-specific words, are you able to identify wrong answers in the above lists? Understanding the difference between human intelligence and machine intelligence is becoming crucial as the hype surrounding AI crescendos to the heavens.
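A reader with a Python interpreter handy doesn’t even have to eyeball it. A few lines of code, written here purely for illustration, score ChatGPT’s first list against both of the prompt’s constraints:

```python
# Check ChatGPT's first list: exactly six letters, with "k" as the third letter.
words = ["Bekind", "Skater", "Market", "Inkjet", "Anklet",
         "Packer", "Tinker", "Bunked", "Hikers", "Wicket"]

for word in words:
    six_letters = len(word) == 6
    third_is_k = word[2].lower() == "k"   # index 2 is the third letter
    verdict = "ok" if six_letters and third_is_k else "fails"
    print(f"{word}: {verdict}")
```

Swap in the second list and the same check flags Casket, Docket, Harked, and Winked.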
“I feel like it’s too easily taking a notion about humans and transferring it over to machines. There’s an assumption there when you use that word,” says Noah Smith, a professor at the University of Washington and researcher at the Allen Institute for AI. He questions the labeling of algorithms as “machine intelligence” and describes the notion of consciousness, without bringing machine learning into the equation, as a hotly debated topic.
Microsoft Research, with help from OpenAI, released a paper on GPT-4 that claims the algorithm is a nascent example of artificial general intelligence (AGI). What does that mean? No concrete definition of the term exists. So, how do these researchers describe it? They focus on the algorithm doing better than most humans at standardized tests, like the bar exam. They also focus on the wide variety of stuff the algorithm can do, from simplistic drawing to complex coding. The Microsoft Research team is candid about GPT-4’s inability to succeed at all human labor, as well as its lack of inner desires.
“You can have models that are very proficient in producing fluent language on the basis of having seen a ton of language,” says Allyson Ettinger, an assistant professor at the University of Chicago who researches language processing for humans and machines. But a chatbot’s fluency doesn’t prove that it reasons or achieves understanding in a manner similar to humans. “The extent to which those additional factors are happening is a major point of study and inquiry,” she says. Even with all the attention on generative AI in 2023, the full potential of these algorithms is hard to determine as companies train with more data and researchers look for emergent capabilities.
Is OpenAI a Frankensteinian god with the potential to animate the algorithm? It’s unclear, but unlikely. However, public perceptions about artificial intelligence have already shifted after widespread interactions with chatbots. If you’re scared about recent advances in AI, you're not alone.
It’s reasonable to fear that AI will worsen economic inequality, perpetuate racist stereotypes as memes, or diminish our ability to identify authentic media. Worried about the AI chatbot achieving sentience during your correspondence? While an engineer at Google was convinced, many AI experts consider this a less rational belief. Based on what is publicly known about the algorithm, GPT-4 does not want to be alive any more than your TI-89 calculator yearns to inhabit a human form.
“It really is a philosophical question. So, in some ways, it’s a very hard time to be in this field, because we’re a scientific field,” says Sara Hooker, who leads Cohere for AI, a research lab that focuses on machine learning. She explains that a lot of these questions around AGI are less technical and more value-driven. “It’s very unlikely to be a single event where we check it off and say, ‘AGI achieved,’” she says. Even if researchers agreed one day on a testable definition of AGI, the race to build the world’s first animate algorithm might never have a clear winner.
One attempt at distinguishing the abilities of humans and computers came from Apple cofounder Steve Wozniak, who wondered when a computer would be able to visit a random person’s home and brew a pot of coffee. Instead of being limited to a narrow task, like calculating math equations, when would it be able to interact with the physical world to complete more varied assignments? Wozniak’s hot drink test is one perspective in the kaleidoscopic discussion over the concept of AGI and emergent behaviors.
Nils John Nilsson, a founder of artificial intelligence as a research field, proposed a test for human-level AI focused on employment. Could the algorithm function as an accountant, a construction worker, or a marriage counselor? Ben Goertzel, founder of a company exploring decentralized AGI, floated the idea of an algorithm capable of behaving like a college student (minus the binge drinking). Can the AI gather data from its external environment and make the choices needed to graduate?
OpenAI offers little clarity on the concept. A blog post from CEO Sam Altman describes AGI as anything “generally smarter than humans.” By this vague measure, it would be difficult to determine whether AGI is ever really achieved.
Sure, GPT-4 can pass a bunch of standardized tests, but is it really “smarter” than humans if it can’t tell when the third letter in a word is “k”? While AI testing helps researchers gauge improvement, an ability to pass the bar exam does not mean an algorithm is now sentient. OpenAI’s definition of AGI also excludes the need for algorithms to interact with the physical world.
Would it be outrageous to slip a powerful chatbot inside of a humanoid robot and let it loose?
The chatbot-robot combo would not be able to achieve much independently, even with the best robots available today. What’s holding it back? A primary limiting factor in the field of robotics is a lack of data. “We don’t have tons of robot data, unlike Wikipedia, for example, in the NLP realm,” says Chelsea Finn, an assistant professor at Stanford University who leads the Intelligence Through Robotic Interaction at Scale (IRIS) research lab and works with Google Brain. The internet brims with text to improve chatbots; the data available for robotics is less comprehensive.
The physical world is complex to navigate, and robots succeed only at very narrowly defined tasks. A bot may be able to roam a construction site, but it might struggle to remove the lid from a container. Finn and members of her IRIS lab experiment with fascinating ways to make robots more generalized, helpful, and better at learning. “I view this very orthogonally to anything related to sentience,” she says. “I view it much more in terms of being able to do useful tasks.” Advanced robots are far from capable of interacting with Earth (or Mars) in a spontaneous way, let alone going full I, Robot.
“I have very mixed feelings when these companies are now talking about sentient AI and expressing concern,” says Suresh Venkatasubramanian, a professor at Brown University and coauthor of the Blueprint for an AI Bill of Rights. “Because I feel like they haven’t expressed concerns at all about real harms that have already manifested.” Futuristic fears can distract from the tangible present. A series of articles published in a collaboration between Lighthouse Reports and WIRED laid out how an algorithm used in the Netherlands was more likely to recommend single mothers and Arabic speakers be investigated for welfare fraud.
AI will continue to transform daily interactions between friends, coworkers, and complete strangers—for the better and for the worse. Whether an algorithm ever achieves a kind of consciousness may be beside the point. From Tamagotchi pets to Replika chatbots, humans have long formed one-sided, emotional bonds with technology. Gratitude may be warranted, though it is not yet reciprocated.