- A language model AI created proteins as good as ones honed over millions of years of evolution.
- Salesforce’s ProGen designed sequences based on the “sentences” of biological proteins.
- Scientists are investigating whether the AI could identify treatments for disorders like rheumatoid arthritis and multiple sclerosis.
Artificial intelligence is a master of imitation. When scientists design an AI, whether to mimic human language or to master a game like chess, it often matches or even exceeds the capabilities of its human creators. Now, AI has shown that it can even take on the art of biology itself.
Researchers at the University of California-San Francisco, the University of California-Berkeley, and Salesforce Research, a science arm of the SF-based software company, developed an AI capable of copying evolution itself. This doesn’t mean the AI created some sort of evolutionarily superior superhuman (yet). Instead, the AI designed protein sequences built from the 20 amino acids that make up proteins. When compared to nature’s handiwork, some of the sequences worked just as well as ones generated over millions of years of evolution. The researchers published their findings in the journal Nature Biotechnology.
Interestingly, scientists didn’t design an AI from scratch but instead repurposed one from an unlikely field: language modeling. Researchers used the natural language processing abilities of Salesforce’s ProGen and focused on the “sentences” of biological proteins, which are essentially a language of amino acids.
“In the same way that words are strung together one-by-one to form text sentences, amino acids are strung together one-by-one to make proteins,” Nikhil Naik, the Director of AI Research at Salesforce Research, told Motherboard. “Building on this insight, we apply neural language modeling to proteins for generating realistic, yet novel protein sequences.”
After training ProGen on 280 million proteins, the AI was “iteratively optimized by learning to predict the probability of the next amino acid given the past amino acids in a raw sequence,” according to the paper. The team eventually focused on five specific artificial proteins and compared them to an enzyme found in chicken eggs called “hen egg white lysozyme.” Two of the AI-generated proteins compared favorably.
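To make the idea of "predicting the next amino acid" concrete, here is a minimal sketch in Python. It is not ProGen's method: ProGen is a large neural language model, while this toy only counts which amino acid tends to follow which (a simple bigram model). The training sequences, the `START` marker, and all function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
START = "^"  # hypothetical start-of-sequence marker (an assumption)


def train_bigram(sequences):
    """Count how often each amino acid follows the previous one."""
    counts = defaultdict(Counter)
    for seq in sequences:
        prev = START
        for aa in seq:
            counts[prev][aa] += 1
            prev = aa
    return counts


def next_prob(counts, prev, aa):
    """Estimate P(aa | prev) with add-one smoothing over the 20 amino acids."""
    total = sum(counts[prev].values())
    return (counts[prev][aa] + 1) / (total + len(AMINO_ACIDS))


def log_likelihood(counts, seq):
    """Add up log P(next amino acid | previous amino acid) along a sequence."""
    ll = 0.0
    prev = START
    for aa in seq:
        ll += math.log(next_prob(counts, prev, aa))
        prev = aa
    return ll


# Made-up toy sequences, not real protein data.
toy_training = ["MKVLA", "MKVIA", "MKLLA"]
model = train_bigram(toy_training)

# A sequence resembling the training data should score higher
# than one the model has never seen anything like.
familiar = log_likelihood(model, "MKVLA")
unfamiliar = log_likelihood(model, "AAAAA")
```

A real language model replaces the simple counts with a neural network that looks at all of the preceding amino acids rather than only the last one, but the training signal described in the paper's quote is the same kind of next-item prediction.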
Overall, Salesforce estimates that 73 percent of ProGen’s proteins could function, compared to 59 percent of natural proteins, and found that the AI was also able to detect evolutionary patterns (though it wasn’t specifically designed to). AI has designed human proteins before, but this is the first time a language model AI was able to pull off the feat.
But the team isn’t interested in just answering the question of whether language model AIs can design proteins. Because proteins lie at the foundations of many illnesses, the Salesforce AI Research team is already investigating how ProGen could identify treatments for disorders like rheumatoid arthritis and multiple sclerosis.
So while some AIs are trained to beat humans at their own game (literally), language models like ProGen could one day put one over on evolution itself and help humans combat some of the world’s most debilitating health problems.