Richard Sutton, one of the pioneers behind modern artificial intelligence, is not convinced that simply throwing more computing power at AI will lead to machines that think like humans. In fact, he argues that today’s obsession with scaling up deep learning might be holding AI back from its full potential.
Sutton, alongside his longtime collaborator Andrew Barto, won this year’s Turing Award—often called the "Nobel Prize of Computing"—for his work in reinforcement learning. He believes the real breakthrough will come when AI stops relying on curated datasets and starts learning from experience, much like a child does.
“If we want real intelligence, AI needs to learn by doing, by trial and error,” Sutton said in an interview. “Computation is not a panacea. More compute helps, but it's not the core ingredient of intelligence.”
It’s a bold claim at a time when AI giants like OpenAI, Google DeepMind and Anthropic are racing to scale their models, feeding them ever-increasing amounts of data and compute in pursuit of human-level reasoning. Sutton, however, believes this approach is flawed, arguing that true progress will come from refining the algorithms that govern how machines learn, not just making them bigger.
The reinforcement learning revolution
Sutton’s contributions to AI stretch back decades, but his most significant impact has been in reinforcement learning, a method that enables AI to learn by interacting with its environment, much as humans and animals learn through trial and error.
Reinforcement learning works by rewarding an AI system for desirable actions and penalizing it for mistakes, similar to how a child learns that touching a hot stove is a bad idea but reaching for a toy is good. Over time, the system refines its decision-making, learning to choose actions that maximize the rewards it collects.
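To make the mechanics concrete, here is a minimal sketch of tabular Q-learning, one classic reinforcement learning algorithm, applied to a toy corridor world. The environment, hyperparameters and names are illustrative only and are not drawn from Sutton’s remarks.

```python
# Minimal sketch: tabular Q-learning on a toy 1-D corridor (illustrative only).
import random

class CorridorEnv:
    """Agent starts at cell 0 and earns +1 for reaching the last cell."""
    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = move left, action 1 = move right
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.n_states - 1, self.state + move))
        done = self.state == self.n_states - 1
        reward = 1.0 if done else -0.01   # small cost per step, reward at the goal
        return self.state, reward, done

env = CorridorEnv()
q = [[0.0, 0.0] for _ in range(env.n_states)]   # Q-table: value of each (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.1           # learning rate, discount, exploration rate

for episode in range(500):
    s, done = env.reset(), False
    while not done:
        # explore occasionally, otherwise act greedily (trial and error)
        a = random.randint(0, 1) if random.random() < epsilon else q[s].index(max(q[s]))
        s_next, r, done = env.step(a)
        # nudge the estimate toward reward plus discounted future value
        q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
        s = s_next

print(q)  # after training, "move right" should score higher in every non-goal cell
```

Nothing here is specific to games or robots: the same reward-driven update rule scales up, with neural networks standing in for the table, to the systems discussed below.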
This technique was famously used in AlphaGo, the AI system developed by Google DeepMind that shocked the world in 2016 by defeating world champion Go player Lee Sedol. The AI learned not by memorizing human strategies but by playing millions of games against itself, refining its strategy through reinforcement learning.
Since then, reinforcement learning has expanded beyond games into areas like robotics, financial trading and healthcare. It helps optimize self-driving cars, improve automated trading algorithms and even fine-tune AI chatbots like ChatGPT through reinforcement learning from human feedback (RLHF). RLHF lets AI models refine their responses based on human preference feedback, making them more conversational and better aligned with human expectations.
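As a rough illustration of the preference-modeling step at the heart of RLHF, the sketch below scores a preferred and a rejected response with a stand-in reward function and computes a Bradley-Terry-style loss that is low when the preferred response already scores higher. The reward function and example strings are purely illustrative assumptions; in practice the reward model is a trained neural network whose scores then guide further reinforcement learning on the chatbot itself.

```python
# Minimal sketch of preference modeling for RLHF (illustrative stand-ins only).
import math

def reward_model(response: str) -> float:
    # Stand-in scorer: in real RLHF this is a neural network trained on human rankings.
    return 0.5 * len(response.split()) + (2.0 if "please" in response.lower() else 0.0)

def preference_loss(chosen: str, rejected: str) -> float:
    """Bradley-Terry-style loss: low when the human-preferred response out-scores the other."""
    margin = reward_model(chosen) - reward_model(rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

chosen = "Sure, here is a step-by-step explanation. Please let me know if it helps."
rejected = "No."
print(preference_loss(chosen, rejected))  # small loss: the stand-in model already agrees with the human ranking
```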
Despite these advancements, Sutton believes reinforcement learning has yet to be fully utilized. “It’s still early,” he said. “AI systems today mostly rely on pre-processed data, not real-world interactions. That needs to change if we want AI that truly understands and adapts.”
How close are we to human-level AI?
The idea of artificial general intelligence (AGI)—AI that can think, reason and learn across a wide range of tasks on par with a human—has long been a controversial topic. Some experts argue that AGI is many years down the road, while others believe it may never be possible. In another camp altogether, some experts assert that AGI isn’t the right goal to prioritize. “We should not forget the power of these models in other non-language domains,” said Marina Danilevsky, a Senior Research Scientist at IBM, on an episode of the Mixture of Experts podcast. “If we actually broaden where this technology could be used… we can go in places that are much more interesting, much more pragmatic, much more practical… [instead of] chasing AGI.”
Sutton takes a measured stance. He estimates a one-in-four chance that AI could reach human-level intelligence within five years and a 50% chance within 15 years. That’s a strikingly optimistic forecast compared to many of his peers, who often predict AGI is still several decades away.
“There are still breakthroughs needed,” he acknowledged. “But we’re getting closer. The biggest missing piece is how to make AI systems learn from experience in a more natural way, rather than being spoon-fed labeled datasets.”
As Sutton describes it, one of the biggest challenges is teaching AI to make sense of long-term planning and abstraction—the ability to break down complex problems into smaller, manageable pieces, the way humans do.
“If I tell you to walk across the street, you don’t think about every tiny muscle movement. You think about the goal: crossing the street. AI needs to learn like that, at a higher level of abstraction,” Sutton explained.
One of his key contributions to reinforcement learning is the concept of temporal abstraction, which lets an AI system learn and plan over extended courses of action instead of micromanaging every low-level step. This could be critical for AI systems that need to reason across long time horizons, something that today’s models struggle with.
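The options framework that grew out of this line of work captures the idea compactly: an option packages a whole course of action, with its own policy and stopping rule, so an agent can plan at the level of “cross the street” rather than individual movements. The sketch below is a simplified illustration under that framing; the class and function names are assumptions for the example, not an API from Sutton’s papers.

```python
# Minimal sketch of temporally abstract "options" (names are illustrative only).
from dataclasses import dataclass
from typing import Callable

@dataclass
class Option:
    name: str
    policy: Callable[[int], int]        # maps a state to a low-level action
    terminates: Callable[[int], bool]   # True once the option's sub-goal is reached

def run_option(option: Option, state: int, step: Callable[[int, int], int]) -> int:
    """Execute one option to completion, returning the resulting state."""
    while not option.terminates(state):
        action = option.policy(state)
        state = step(state, action)
    return state

# Toy world: the state is a position on a line; the overall goal is position 10.
step = lambda state, action: state + action

walk_to_curb = Option("walk_to_curb", policy=lambda s: 1, terminates=lambda s: s >= 5)
cross_street = Option("cross_street", policy=lambda s: 1, terminates=lambda s: s >= 10)

state = 0
for option in (walk_to_curb, cross_street):   # the agent plans over options, not single steps
    state = run_option(option, state, step)
    print(f"finished {option.name} at position {state}")
```

The planning loop makes only two decisions, even though many low-level actions are executed, which is the point of reasoning at a higher level of abstraction.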
For example, an AI assistant might be able to generate a response to a single question well but struggle with maintaining a logical conversation over multiple interactions or planning a complex task that unfolds over time—like booking a vacation that involves coordinating flights, hotels and activities. Sutton believes that reinforcement learning and better long-term reasoning algorithms will be key to overcoming this limitation.

AI as children: The future of coexistence
Sutton believes the best way to think about AI’s future is not as tools or slaves but as children—learning, evolving and eventually gaining independence.
“We don’t treat our children as machines that must be controlled,” he said. “We guide them, teach them, but ultimately, they grow into their own beings. AI will be no different.”
Sutton warns that treating AI as something to be dominated or enslaved could lead to adversarial relationships rather than cooperation. Instead, he argues that just as children learn the values of human society through observation and interaction, AI must be taught, not programmed, to align with human values.
“This isn’t about control; it’s about understanding,” he explained. “When you raise a child, you don’t just impose hard rules and expect obedience. You demonstrate kindness, fairness and cooperation, and the child internalizes those values. AI can learn the same way.”
The analogy raises profound questions. If AI becomes more autonomous, how will society integrate these digital beings? Will they have rights? Should they be given independence? Sutton suggests that the way we approach AI’s development now will define how these future relationships unfold.
“If we raise AI in an environment of trust and cooperation, they will learn to exist alongside us. If we treat them as adversaries, we risk creating systems that have every reason to resist us,” he said.
Sutton’s perspective challenges the conventional fear-based narratives about AI alignment, which often assume that advanced AI must be shackled to prevent it from harming humanity. Instead, he proposes an approach based on mutual benefit, where AI learns through experience rather than rigid constraints.
The future of AI: Learning like humans
Sutton’s vision for AI is ultimately about building machines that learn the way humans do: through exploration, experience and adaptation. To him, the future of AI is not about bigger models or more rules but about making AI systems that can figure things out on their own.
His Turing Award prize money—USD 500,000 of the USD 1 million shared with Barto—is already being put to work toward that vision. He has established the Openmind Research Institute, aimed at giving young AI researchers the freedom to explore fundamental questions about learning, without the pressures of commercialization.
“When Andy Barto and I started out, we had the time and space to explore ideas freely,” he said. “That’s what led to reinforcement learning becoming what it is today. I want to give the next generation that same opportunity.”
So, is human-level AI inevitable? Sutton remains cautiously optimistic. “It’s not a question of if—it’s a question of when,” he said. “And when it happens, it won’t be because we built a bigger model. It will be because we built a smarter learner.”