1) GPT-4 will be released in the next couple months—and yes, it will be a big deal.
Rumors have been flying recently about GPT-4, the next generation of OpenAI’s powerful generative language model.
Expect GPT-4 to be released early in the new year and to represent a dramatic step-change in performance relative to GPT-3 and 3.5. As manic as the recent hype around ChatGPT has been, it will be a mere prelude to the public reaction when GPT-4 is released. Buckle up.
What will GPT-4 be like? Perhaps counterintuitively, we predict that it won’t be much larger than its predecessor GPT-3. In an influential research paper published earlier this year (the “Chinchilla” paper), DeepMind researchers determined that today’s large language models are in fact larger than they should be: for optimal performance given a finite compute budget, models should have fewer parameters but train on larger datasets. Training data, in other words, trumps model size.
Most of today’s leading language models were trained on data corpora of about 300 billion tokens, including OpenAI’s GPT-3 (175 billion parameters), AI21 Labs’ Jurassic (178 billion parameters), and Microsoft/Nvidia’s Megatron-Turing (530 billion parameters).
We predict that GPT-4 will be trained on a dataset at least an order of magnitude larger than this—perhaps as large as 10 trillion tokens. Meanwhile, it will be smaller (i.e., fewer parameters) than Megatron-Turing.
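To make that tradeoff concrete, here is a back-of-the-envelope sketch using two widely cited rules of thumb from the Chinchilla paper: training compute scales roughly as 6 × parameters × tokens, and a compute-optimal model uses on the order of 20 training tokens per parameter. The coefficients are rough approximations for illustration only, not anyone’s actual training plans.

```python
# Rough Chinchilla-style scaling arithmetic (an illustrative sketch, not the
# exact method from the DeepMind paper).
#   - training compute C ~= 6 * N * D   (N = parameters, D = training tokens)
#   - compute-optimal models use roughly D ~= 20 * N

def training_compute(params: float, tokens: float) -> float:
    """Approximate training FLOPs: C ~= 6 * N * D."""
    return 6 * params * tokens

def chinchilla_optimal(compute_budget: float) -> tuple[float, float]:
    """Split a FLOP budget into (params, tokens) via the ~20-tokens-per-parameter heuristic."""
    params = (compute_budget / (6 * 20)) ** 0.5
    tokens = 20 * params
    return params, tokens

# GPT-3: roughly 175 billion parameters trained on roughly 300 billion tokens.
gpt3_compute = training_compute(175e9, 300e9)
opt_params, opt_tokens = chinchilla_optimal(gpt3_compute)
print(f"GPT-3 compute budget: {gpt3_compute:.2e} FLOPs")
print(f"Compute-optimal split: ~{opt_params / 1e9:.0f}B parameters on ~{opt_tokens / 1e12:.1f}T tokens")
```

For GPT-3’s compute budget, this heuristic points to a model roughly a third the size trained on roughly three times as much data, which is exactly the direction the Chinchilla results push.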
It is possible that GPT-4 will be multimodal: that is, that it will be able to work with images, videos and other data modalities in addition to text. This would mean, for example, that it could take a text prompt as input and produce an image (like DALL-E does); or take a video as input and answer questions about it via text.
A multimodal GPT-4 would be a bombshell. More likely, however, GPT-4 will be a text-only model (like the previous GPT models) whose performance on language tasks will redefine the state of the art. What will this look like, specifically? Two language areas in which GPT-4 may demonstrate astonishing leaps in performance are memory (the ability to retain and refer back to information from previous conversations) and summarization (the ability to distill a large body of text to its essential elements).
2) We are going to start running out of data to train large language models.
It has become a cliché to say that data is the new oil. This analogy is fitting in one underappreciated way: both resources are finite and at risk of being exhausted. The area of AI for which this concern is most pressing is language models.
As we discussed in the previous section, research efforts like DeepMind’s Chinchilla work have highlighted that the most effective way to build more powerful large language models (LLMs) is not to make them larger but to train them on more data.
But how much more language data is there in the world? (More specifically, how much more language data is there that meets an acceptable quality threshold? Much of the text data on the internet is not useful to train an LLM on.)
This is a challenging question to answer with precision, but according to one research group, the world’s total stock of high-quality text data is between 4.6 trillion and 17.2 trillion tokens. This includes all the world’s books, all scientific papers, all news articles, all of Wikipedia, all publicly available code, and much of the rest of the internet, filtered for quality (e.g., webpages, blogs, social media). Another recent estimate puts the total figure at 3.2 trillion tokens.
DeepMind’s Chinchilla model was trained on 1.4 trillion tokens.
In other words, we may be well within one order of magnitude of exhausting the world’s entire supply of useful language training data. This could prove a meaningful impediment to continued progress in language AI. Privately, many leading AI researchers and entrepreneurs are worried about this.
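The arithmetic behind that claim is simple enough to check on the back of an envelope; the snippet below simply restates the estimates cited above.

```python
# Figures are the estimates cited in the text above; nothing new is assumed.
chinchilla_tokens = 1.4e12                  # tokens used to train Chinchilla
stock_low, stock_high = 4.6e12, 17.2e12     # estimated world stock of high-quality text tokens

print(f"Low estimate:  {stock_low / chinchilla_tokens:.1f}x one Chinchilla training run")
print(f"High estimate: {stock_high / chinchilla_tokens:.1f}x one Chinchilla training run")
# Roughly 3x to 12x of headroom: a single order of magnitude at most.
```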
Expect to see plenty of focus and activity in this area next year as LLM researchers seek to address the looming data shortage. One possible solution is synthetic data, though the details about how to operationalize this are far from clear. Another idea: systematically transcribing the spoken content of the world’s meetings (after all, spoken discussion represents vast troves of text data that today go uncaptured).
It will be fascinating and illuminating to see how OpenAI, the world’s leading LLM research organization, deals with this challenge in its soon-to-be-announced GPT-4 research.
3) For the first time, some members of the general public will begin using fully driverless cars as their day-to-day mode of transportation.
After years of premature hype and unfulfilled promises in the field of autonomous vehicles, something has happened recently that surprisingly few people seem to have noticed: truly driverless cars have arrived.
Today, as a member of the general public, you can download the Cruise app (it looks just like the Uber or Lyft app) and hail a driverless vehicle—with no one behind the wheel—to take you from Point A to Point B on the streets of San Francisco.
Cruise currently only offers these driverless rides at night (between 10 pm and 5:30 am), but the company is poised to make the service available 24/7 throughout San Francisco. Expect this to happen within weeks. Cruise’s rival Waymo is close behind.
In 2023, robotaxi services will rapidly transition from a fascinating novelty to a viable, convenient—even mundane—way to get around the city. The number of robotaxis on the road and the number of people who use them will surge. In short, autonomous vehicles are about to enter their commercialization and scaling phase.
Rollout will happen on a city-by-city basis. Beyond San Francisco, expect fully driverless services to become available to the general public in at least two more U.S. cities next year. Plausible candidate locations include Phoenix, Austin, Las Vegas and Miami.
4) Midjourney will raise venture capital funding.
The three most prominent text-to-image AI platforms today are DALL-E from OpenAI, Stable Diffusion from Stability AI (and other contributors), and Midjourney.
OpenAI raised $1 billion from Microsoft in 2019 and is currently in talks to raise billions more. Stability AI raised $100 million a few months ago and is already seeking to raise more.
Midjourney, by contrast, has spurned all outside funding. The company’s usage and growth have been astonishing: as of this writing, it has nearly 6 million users and substantial revenues. Yet according to its website, Midjourney remains a “small self-funded” organization with only 11 full-time team members.
Midjourney’s founder and CEO David Holz was previously the cofounder and CTO of Leap Motion, a once-high-flying hand-tracking startup that raised close to $100 million in venture funding during the 2010s before crashing back down to earth and getting acquired in a fire sale. Holz’s negative experiences with his VC investors during the Leap Motion saga have reportedly convinced him not to take outside capital this time around. The many VC suitors who have sought to invest in Midjourney have so far all been rebuffed.
Yet faced with the demands of blistering growth, intensifying competition, and a massive market opportunity, we predict Holz will give in and raise a large funding round for Midjourney in 2023. Otherwise, the company risks being left behind in the generative AI gold rush that it helped usher in.
5) Search will change more in 2023 than it has since Google went mainstream in the early 2000s.
Search is the primary means by which we navigate and access digital information. It lies at the heart of the modern internet experience.
Today’s large language models can read and write with a level of sophistication that a few years ago would have seemed inconceivable. This will have profound implications for how we search.
In the wake of ChatGPT, one reconceptualization of search that has gotten a lot of attention is the idea of conversational search. Why enter a query and get back a long list of links (the current Google experience) when you could instead have a dynamic conversation with an AI agent to find what you are looking for?
Conversational search has a bright future. One major challenge needs to be resolved, though, before it is ready for primetime: accuracy. Conversational LLMs are not reliably accurate; they occasionally share factually untrue information with total confidence. OpenAI CEO Sam Altman himself recently cautioned: “It’s a mistake to be relying on ChatGPT for anything important right now.” Most users will not accept a search application that is accurate 95% or even 99% of the time. Addressing this issue in a scalable and robust way will be one of the primary challenges facing search innovators in 2023.
You.com, Character.AI, Metaphor and Perplexity are among the wave of promising young startups looking to take on Google and reinvent consumer search with LLMs and conversational interfaces.
But consumer internet search is not the only type of search that LLMs will transform.
Enterprise search—the way that organizations search and retrieve private internal data—is likewise on the cusp of a new golden age. Thanks to large-scale vectorization, LLMs enable true semantic search for the first time: the ability to index and access information based on underlying concepts and context rather than simple keywords. This will make enterprise search vastly more powerful and productive.
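For readers curious what semantic search actually looks like under the hood, here is a deliberately minimal sketch of the core retrieval step: documents and queries are mapped to vectors, and relevance is just vector similarity. The embed function is a placeholder for whatever embedding model a real system would use; production systems add approximate-nearest-neighbor indexes, chunking, ranking and much more.

```python
# Minimal sketch of embedding-based ("semantic") search. `embed` is a
# placeholder for any text-embedding model or API; retrieval here is plain
# cosine similarity over the resulting vectors.
import numpy as np

def embed(texts: list[str]) -> np.ndarray:
    """Placeholder: return one embedding vector per input text."""
    raise NotImplementedError("plug in an embedding model or API here")

def semantic_search(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    doc_vecs = embed(documents)        # shape: (n_docs, dim)
    query_vec = embed([query])[0]      # shape: (dim,)
    # Cosine similarity between the query and every document.
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    top = np.argsort(-sims)[:top_k]    # indices of the most similar documents
    return [documents[i] for i in top]
```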
Startups like Hebbia and Glean are leading the charge to transform enterprise search using large language models.
And the opportunities for next-generation search extend beyond text. Recent advances in AI open up whole new possibilities in multimodal search: that is, the ability to query and retrieve information across data modalities.
No modality represents a bigger opportunity than video, which accounts for an estimated 80% of all data on the internet. Imagine being able to search effortlessly and precisely for a particular moment, individual, concept or action within a video. Twelve Labs is one startup building a multimodal AI platform to enable nuanced video search and understanding.
Search has changed surprisingly little since Google’s ascendance during the dot-com era. Next year, thanks to large language models, this will begin to change dramatically.
6) Efforts to develop humanoid robots will attract considerable attention, funding and talent. Several new humanoid robot initiatives will launch.
The humanoid robot is perhaps the definitive symbol of Hollywood’s exaggerated, dramatized depiction of artificial intelligence (think Ex Machina or I, Robot).
Well, humanoid robots are fast becoming a reality.
Why build robots shaped like humans? For the simple reason that we have architected much of the physical world for humans. If we plan to use robots to automate complex activities in the world—in factories, shopping malls, offices, schools—the most effective approach is often for those robots to have the same form factor as the humans that would otherwise be completing those activities. This way, robots can be deployed in diverse settings with no need for the surrounding environment to be adapted.
Tesla has catalyzed the field of humanoid robotics this year with the launch of its Optimus robot, which debuted at the company’s AI Day in September. Elon Musk has said that he believes the Optimus robot will eventually be worth more to Tesla than its entire car business. Tesla’s robot still has a long way to go before it is ready for primetime—but don’t underestimate the rapid progress that the company is capable of when it devotes its full resources to the task.
A crop of promising startups is likewise moving the field of humanoid robotics forward, including Agility Robotics, Halodi Robotics, Sanctuary AI and Collaborative Robotics.
In 2023, expect more contenders to enter the fray—both new startups and established companies (e.g., Toyota, Samsung, General Motors, Panasonic)—as the race to build humanoid robots heats up. Similar to autonomous vehicles circa 2016, waves of talent and capital will start pouring into the field next year as more people come to appreciate the scale of the market opportunity.
7) The concept of “LLMOps” will emerge as a trendy new version of MLOps.
When a major new technology platform emerges, an associated need—and opportunity—arises to build tools and infrastructure to enable this new platform. Venture capitalists like to think of these supporting tools as “picks and shovels” (for the upcoming gold rush).
In recent years, machine learning tooling—widely referred to as MLOps—has been one of the startup world’s hottest categories. A wave of buzzy MLOps startups has raised large sums of capital at eye-watering valuations: Weights & Biases ($200 million raised at a $1 billion valuation), Tecton ($160 million raised), Snorkel ($138 million raised at a $1 billion valuation), and OctoML ($133 million raised at an $850 million valuation), to name a few.
Now, we are witnessing the emergence of a new AI technology platform: large language models (LLMs). Compared to pre-LLM machine learning, large language models represent a new AI paradigm with distinct workflows, skillsets and possibilities. The easy availability of massive pretrained foundation models via API or open source completely changes what it looks like to develop an AI product. A new suite of tools and infrastructure is therefore destined to emerge.
We predict the term “LLMOps” will catch on as a shorthand to refer to this new breed of AI picks and shovels. Examples of new LLMOps offerings will include, for instance: tools for foundation model fine-tuning, no-code LLM deployment, GPU access and optimization, prompt experimentation, prompt chaining, and data synthesis and augmentation.
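As one concrete illustration, here is what a bare-bones prompt chain might look like, with call_llm standing in for whichever hosted-model API a developer happens to use. The names are placeholders, not any particular vendor’s SDK.

```python
# Illustrative sketch of "prompt chaining": the output of one model call is fed
# into the next prompt. `call_llm` is a placeholder, not a real SDK function.

def call_llm(prompt: str) -> str:
    """Placeholder: send a prompt to a large language model and return its reply."""
    raise NotImplementedError("plug in a model API here")

def summarize_then_translate(document: str, language: str = "French") -> str:
    # Step 1: distill the document to its key points.
    summary = call_llm(f"Summarize the following document in three bullet points:\n\n{document}")
    # Step 2: chain the first output into a second prompt.
    return call_llm(f"Translate these bullet points into {language}:\n\n{summary}")
```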
8) The number of research projects that build on or cite AlphaFold will surge.
DeepMind’s AlphaFold platform, first announced in late 2020, solved one of life’s great mysteries: the protein folding problem. AlphaFold is able to accurately predict the three-dimensional shape of a protein based solely on its one-dimensional amino acid sequence, a landmark achievement that had eluded human researchers for decades. (We have previously argued in this column that AlphaFold represents the single most important achievement in the history of artificial intelligence.)
Because proteins underpin nearly every important activity that happens inside every living being on earth, more deeply understanding their structures and functions opens up profound new possibilities in biology and human health: from developing life-saving therapeutics to improving agriculture, from fighting disease to investigating the origins of life.
In July 2021, DeepMind open-sourced AlphaFold and released a database of 350,000 three-dimensional protein structures. (As a reference point, the total number of protein structures known to mankind prior to AlphaFold was around 180,000.) Then, a few months ago, DeepMind publicly released the structures for another 200 million proteins—nearly all catalogued proteins known to science.
Mere months after DeepMind’s latest release, more than 500,000 researchers from 190 countries have used the AlphaFold platform to access 2 million different protein structures. This is just the beginning. Breakthroughs of AlphaFold’s magnitude require years for their full impact to manifest.
In 2023, expect the volume of research built on top of AlphaFold to surge. Researchers will take this vast new trove of foundational biological knowledge and apply it to produce world-changing applications across disciplines, from new vaccines to new types of plastics.
9) DeepMind, Google Brain, and/or OpenAI will undertake efforts to build a foundation model for robotics.
The term “foundation model,” introduced last year by a team of Stanford researchers, refers to a massive AI model trained on broad swaths of data that, rather than being built for a specific task, can perform effectively on a wide range of different activities.
Foundation models have been a key driver of recent progress in AI. Today’s foundation models are breathtakingly powerful. But—whether they are text-generating models like GPT-3, text-to-image models like Stable Diffusion, or action-taking models like Adept’s ACT-1—they operate exclusively in the digital realm.
AI systems that act in the real world—e.g., autonomous vehicles, warehouse robots, drones, humanoid robots—have so far remained mostly untouched by the new foundation model paradigm.
This will change in 2023. Expect early pioneering work on this concept of foundation models for robotics to come from the world’s leading AI research organizations: DeepMind, Google Brain or perhaps OpenAI (though the latter took a step back from robotics research last year).
What would it mean to build a foundation model for robotics—in other words, a foundation model for the physical world? At a high level, such a model might be trained on troves of data from different sensor modalities (e.g., camera, radar, lidar) in order to develop a generalized understanding of physics and real-world objects: how different objects move, how they interact with one another, how heavy or fragile or soft or flexible they are, what happens when you touch or drop or throw them. This “real-world foundation model” could then be fine-tuned for particular hardware platforms and particular downstream activities.
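To put that recipe in schematic form, the sketch below captures the pretrain-then-fine-tune structure described above, and nothing more. Every name in it is hypothetical; no such model exists today.

```python
# Purely schematic sketch of a hypothetical "real-world foundation model."
# All names are invented for illustration; this is not a real system.

class RealWorldFoundationModel:
    def pretrain(self, sensor_logs):
        """Learn general physical priors from large mixed corpora of camera,
        radar and lidar recordings (the generalized-physics step described above)."""
        ...

    def fine_tune(self, robot_platform, task_demonstrations):
        """Adapt the pretrained model to one hardware platform and one
        downstream activity, e.g. bin picking or warehouse navigation."""
        ...
```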
10) Many billions of dollars of new investment commitments will be announced to build chip manufacturing facilities in the United States as the U.S. makes contingency plans for Taiwan.
Artificial intelligence, like human intelligence, depends upon both software and hardware. Certain types of advanced semiconductors are essential to power modern AI. By far the most important and widespread of these are Nvidia’s GPUs; players like AMD, Intel and a handful of younger AI chip upstarts are also seeking to enter the market.
Nearly all of these AI chips are designed in the United States. And nearly all of them are manufactured in Taiwan. One company—the Taiwan Semiconductor Manufacturing Company (TSMC)—produces most of the world’s advanced chips, including Nvidia’s highly coveted GPUs.
Tensions between China and Taiwan have escalated dangerously over the past year. Many observers now believe it is likely or even inevitable that China will invade and reabsorb Taiwan sometime in the next few years.
This represents a major strategic dilemma for the United States, the technology world, and the field of AI.
In an effort to mitigate this precarious AI hardware bottleneck and reduce its reliance on Taiwan, in 2023 the U.S. government will massively incentivize and subsidize the construction of advanced chip manufacturing facilities on American soil. The CHIPS and Science Act, passed into law this summer, provides legislative impetus and budgetary resources for this.
This process is already underway. Two weeks ago, TSMC announced it would invest $40 billion to build two new chip manufacturing plants in Arizona. (President Biden visited the Arizona site in person to hail the announcement.) Importantly, the new TSMC plants—slated to begin production by 2026—will be capable of producing 3 nanometer chips, the most advanced semiconductors in the world today.
Expect to see more such commitments in 2023 as the U.S. seeks to derisk the global supply base for critical AI hardware.
Note: The author is a Partner at Radical Ventures, which is an investor in Hebbia, Twelve Labs and You.com.
See here for our 2022 AI predictions (and see here for our postmortem on them).
See here for our 2021 AI predictions (and see here for our postmortem on them).