Gartner’s AI Hype Cycle Is Way Past Its Due Date: Are We Entering a Classical ML Winter?
When looking at Gartner’s 2023 Hype Cycle for Artificial Intelligence, one can only come to one conclusion: the hype cycle itself has reached its ‘Peak of inflated expectations’ and is way past its due date.
Sep 6
Those who know me also know that I’m NOT a big fan of Gartner Hype Cycles. Or Gartner Magic Quadrants. Or Forrester Waves for that matter…
The reality is that the ‘technologies’ Gartner tracks miss a lot of the foundational, transformative ones. What Gartner tracks maps to the vendor technology ecosystem and the solutions that IT organizations buy.
As noted by Tom Goodwin, almost every truly transformative technology was never tracked by the Hype Cycle. A few examples: WiFi, the smartphone, GPS chips, the App Store, Web 2.0, APIs, etc.
Almost no technologies really follow the Hype Cycle path.
There’s a passage in an Ernest Hemingway novel in which a character named Mike is asked how he went bankrupt. “Two ways,” he answers. “Gradually, then suddenly.” Technological change happens much the same way. It looks NOTHING like a hype curve. The fact that something is hyped, in itself, means nothing.
The technologies tracked by the Hype Cycle tend to reflect today’s IT silos and the constituencies that Gartner serves. Largely because of this, Gartner’s coverage can’t see the forest for the trees.
Take the latest Gartner AI Hype Cycle below: concluding that we’re at the peak of inflated expectations for, e.g., Generative AI and Foundation Models doesn’t require much. In fact, it requires nothing more than a look at NVIDIA’s stock price.
Gartner’s AI Hype Cycle (2023 edition)
I have tons of respect for NVIDIA and Jensen Huang, but consider what the ‘Dean of Valuations’, Aswath Damodaran, recently noted: even taking the most generous estimate of the AI chip market size for 2033 ($350B, up from $25B today) and giving NVIDIA a 100% market share, he still couldn’t get to a $400 share price for NVIDIA in ten years (the share price today is $485).
So for buying NVIDIA shares to make sense, you’d have to price in another market, as big as the AI market, that NVIDIA is able to jump into.
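To make the arithmetic concrete, the short Python sketch below computes the annual growth rate implied by the market-size figures quoted above ($25B today to $350B in 2033, taken here as a ten-year horizon) and how far today’s share price sits above the $400 ten-year figure. It only uses the numbers cited in this post; it is not a reconstruction of Damodaran’s valuation model, which also has to account for margins, reinvestment and discount rates.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1 / years) - 1

# Figures as quoted in the text (market size in billions of USD, prices in USD).
market_today_bn = 25.0
market_2033_bn = 350.0
horizon_years = 10
price_ceiling_in_ten_years = 400.0  # highest share price the valuation could reach
price_today = 485.0                 # share price at the time of writing

growth_multiple = market_2033_bn / market_today_bn
cagr = implied_cagr(market_today_bn, market_2033_bn, horizon_years)
premium = price_today / price_ceiling_in_ten_years - 1

print(f"Market grows {growth_multiple:.0f}x, i.e. about {cagr:.1%} per year")
# → Market grows 14x, i.e. about 30.2% per year
print(f"Today's price is {premium:.0%} above that ten-year ceiling")
# → Today's price is 21% above that ten-year ceiling
```

In other words, even a market growing by roughly 30% a year for a decade, fully captured by one company, does not get you to today’s price.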
It doesn’t take much to infer from this that we’re at the ‘Peak of inflated expectations’ when it comes to Generative AI and LLMs (nor is the analysis very intellectually stimulating, IMHO).
Are we in the middle of a ‘Classical ML’ Winter?
One thing about the AI Hype Cycle that Gartner (naturally) doesn’t discuss at all is that we’re in a “Classical ML” (non-LLM / Generative AI) winter. As highlighted by Alberto Romero García, ChatGPT is the best AND worst thing to ever happen to AI. It pushed AI/ML across the chasm for the masses and brought a deeper understanding of the underlying tech and its strengths & weaknesses, access to models, competition, etc.: https://lnkd.in/dYWPBeDP
However, when people talk about the current “unprecedented pace of AI progress”, they conflate what happens at the production level with R&D, which has almost stalled.
As Google’s François Chollet (who can hardly be accused of anti-hype) puts it: “It’s fascinating how the limitations of deep learning have stayed the same since 2017. Same problems, same failure modes, no progress.”
Let’s keep in mind that not everything is about Generative AI or LLMs. To quote Galina: “When one looks at LinkedIn, one gets the sense the hype cycle is dominated by LLMs currently”.
The image below is from a recent presentation by Andrew Ng (someone who doesn’t hype AI but takes a pragmatic, data-centric approach to it).
Image source Stanford/Andrew Ng: Opportunities in AI — 2023
The dark inner circles represent ‘today’, whereas the lighter outer circles represent his predictions for ‘in 3 years’. While we’re in the middle of the Generative AI & LLM hype, let’s not forget good old Machine Learning.
In 2018, I was involved in a side project where we benchmarked XGBoost against Deep Neural Networks for predictive Machine Learning use cases with tabular data. Back then, XGBoost outperformed every Deep Learning method we tried, and was more efficient too.
A paper from July ’22, which looked at 45 mid-sized datasets, found that tree-based models (XGBoost & Random Forests) still outperformed deep neural networks on tabular datasets. Find the paper here.
A study from June ’21 concluded that while significant progress has been made with Deep Learning models for tabular data, they still do not outperform XGBoost, and further research is warranted. In many cases, the Deep Learning models performed worse on unseen datasets, while the XGBoost model generally outperformed them.
Despite product and business model innovation in Generative AI, a big chunk of real-world ROI remains concentrated in predictive AI/ML, often using tabular datasets and tree-based methods such as XGBoost or Random Forest. You can read more about these techniques, e.g., here.
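For readers who haven’t worked with these methods, here is a deliberately tiny, pure-Python sketch of the idea behind gradient boosting, the family XGBoost belongs to: fit a sequence of very shallow trees (here single-split “stumps”), each one to the remaining errors of the ensemble so far. It assumes squared-error loss, a single numeric feature and a made-up toy dataset; XGBoost itself adds regularization, second-order gradients, deeper multi-feature trees and many engineering optimizations, none of which is shown here.

```python
from statistics import mean

def fit_stump(xs, residuals):
    """Least-squares single split ("stump") on one numeric feature.

    Needs at least two distinct x values.
    """
    best = None
    # Try a threshold at every distinct value except the largest, so both sides are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_value, right_value = mean(left), mean(right)
        sse = sum((r - left_value) ** 2 for r in left) + sum((r - right_value) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    _, threshold, left_value, right_value = best
    return lambda x: left_value if x <= threshold else right_value

def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared-error loss, with stumps as the weak learners."""
    base = mean(ys)  # the constant prediction that minimizes squared error
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    for _ in range(n_rounds):
        # For squared error, the negative gradient is proportional to the residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

# Toy single-feature "tabular" data with a step in the target between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.1, 0.9, 1.0, 3.0, 3.1, 2.9, 3.0]
model = fit_boosted_stumps(xs, ys)
print(round(model(2), 2), round(model(7), 2))
```

Threshold splits like these handle step-shaped relationships and mixed feature scales without any preprocessing, which is one commonly cited reason tree ensembles do well on tabular data.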
The truth is that not everything is about generating text or images. A lot of high-value AI/ML will remain predictive, built on tabular datasets. As noted by Tommi, the majority of business use cases for AI/ML run on proprietary tabular business data.
Alexander Ratner predicted at the beginning of this year that the gap between generative and predictive AI/ML will widen. As highlighted by Alexander, reaching the performance needed to deploy predictive AI/ML requires labeling (and re-labeling) training data for each task and setting. Building predictive AI/ML models on top of Generative AI/Foundation Models will help, but not solve this — foundations are just foundations: https://lnkd.in/db5Aq9WB
The ‘Data Gap’ by Alexander Ratner
Because of this data gap, Alexander predicted that predictive AI/ML will seem stuck while generative AI accelerates in 2023. Most high-value AI will still be predictive — so there will be significant frustrations around AI/ML ROI.
It’s all about data (and knowledge)
Focus on your data quality: that’s where you gain the most significant advantage. This also holds for Foundation Models and especially LLMs, not only ‘Classical ML’.
Andrew Ng started advocating for data-centric AI in early 2021, but it has long been an open secret that data, not the model per se, is the secret sauce in AI/ML, including LLMs.
Let me tell you a fun little story about Large Language Models dating back to 2007.
Sixteen years ago, in 2007, Google researchers published a paper on a class of statistical language models they dubbed ‘Large Language Models’, which they reported as achieving a new state of the art in performance. They used a very standard n-gram model and a smoothing scheme so simple that they named it ‘Stupid Backoff’: https://lnkd.in/dnspsx2B
The key differentiator here? They trained it on 100X the amount of data.
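Stupid Backoff really is simple enough to sketch in a few lines. The Python below follows the scoring rule described in that 2007 paper: use the relative frequency of the full n-gram if it was seen; otherwise back off to a shorter context and multiply by a fixed factor (0.4 in the paper), ending at the word’s relative frequency in the corpus. The toy corpus and the function names are made up for illustration. Note that the scores are not normalized probabilities, which is exactly the shortcut that made the method cheap enough to run on massive data.

```python
from collections import Counter

ALPHA = 0.4  # fixed back-off factor reported in the 2007 paper

def count_ngrams(tokens, max_order):
    """Count every n-gram of length 1..max_order, keyed by tuple."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def stupid_backoff(word, context, counts, total_tokens):
    """Score `word` after the tuple `context` of preceding words (a score, not a probability)."""
    if not context:
        # Base case: unigram relative frequency.
        return counts[(word,)] / total_tokens
    ngram = context + (word,)
    if counts[ngram] > 0:
        # Seen n-gram: plain relative frequency, no discounting.
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the oldest context word and apply the fixed penalty.
    return ALPHA * stupid_backoff(word, context[1:], counts, total_tokens)

tokens = "the cat sat on the mat the cat ate".split()
counts = count_ngrams(tokens, max_order=3)
N = len(tokens)
print(stupid_backoff("sat", ("the", "cat"), counts, N))  # seen trigram: 1 / 2 = 0.5
print(stupid_backoff("mat", ("the", "cat"), counts, N))  # backs off twice: 0.4 * 0.4 * 1/9
```

With almost no modelling sophistication, the quality of such a model is driven mostly by how much text the counts come from, which is the point of the story.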
Better data almost always has a greater impact than fancier models or algorithms in AI/ML… And yet data development has always been, relatively speaking, undersupported. In 2021, Google even published a research paper titled ‘Everyone wants to do the model work, not the data work’.
Google’s team concluded in the paper that, paradoxically, data is the most undervalued and deglamorized aspect of ML, viewed as ‘operational’ relative to the lionized work of building novel models and algorithms. AI/ML developers and engineers understand that data quality matters and often spend inordinate amounts of time on data tasks, BUT most organizations still fail to create or meet any data quality standards, because they undervalue data work relative to model development.
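One practical way to keep data quality standards from staying aspirational is to write them down as executable checks. The plain-Python sketch below expresses a few hypothetical rules (a required field, a value range, an allowed set of categories) for a made-up customer table, reports the share of records failing each rule, and applies an equally made-up standard of “at most 1% failures per rule”. The field names, rules and threshold are illustrative assumptions; dedicated data-validation tools offer much richer versions of the same idea.

```python
from typing import Callable

Record = dict
Rule = Callable[[Record], bool]

# Hypothetical quality rules for an illustrative customer table.
RULES: dict[str, Rule] = {
    "customer_id is present": lambda r: r.get("customer_id") is not None,
    "age between 18 and 120": lambda r: isinstance(r.get("age"), (int, float)) and 18 <= r["age"] <= 120,
    "country is a known code": lambda r: r.get("country") in {"FI", "SE", "US"},
}

def failure_rates(records: list[Record], rules: dict[str, Rule]) -> dict[str, float]:
    """Fraction of records that fail each rule."""
    if not records:
        return {name: 0.0 for name in rules}
    return {name: sum(not rule(r) for r in records) / len(records) for name, rule in rules.items()}

def passes_standard(rates: dict[str, float], max_failure_rate: float = 0.01) -> bool:
    """Illustrative standard: every rule may fail on at most `max_failure_rate` of records."""
    return all(rate <= max_failure_rate for rate in rates.values())

records = [
    {"customer_id": 1, "age": 34, "country": "FI"},
    {"customer_id": 2, "age": 17, "country": "SE"},
    {"customer_id": None, "age": 52, "country": "XX"},
    {"customer_id": 4, "age": 41, "country": "US"},
]
rates = failure_rates(records, RULES)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} of records fail")
print("meets standard:", passes_standard(rates))
```

Running checks like these on every data load, rather than once before training, is what turns them into a standard that can actually be met or missed.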
In a Medium post from early ’21 (which you can find here), I wrote the following:
“As we’ve started to distance ourselves from the AI hype that reached its peak at the end of the 2010s, modern companies are realizing that their ML models were never their IP; it’s their data and subsequently how they ensure the quality of the data continuously and in real-time. One of the best-kept ‘secrets’ to better model performance is high-quality data.”
This still holds true for today’s Foundation Models and LLMs. As highlighted by Chad, for example, LLMs will not scale without investment in foundational data architecture and quality. No model can magically overcome years of tech debt in its training data.
Putting ChatGPT on top of a data swamp is a terrifying idea. LLMs can’t magically derive context from nothing.
Data, documentation and knowledge bases will be the key moat and competitive advantage in this new era. Knowledge bases are as important to AI progress as Foundation Models and LLMs.
The following dynamics (among many others) are playing out in the LLM market right now: the durable moat is data and the last mile generates the real value.
To compete, make sure your documentation and knowledge bases are the best on the planet. When it comes to knowledge, you want to be able to store a lot of it and find the right piece at the right time. For LLMs, this is typically done with a vector database.
We’re still early
First of all, I don’t view Foundation Models/LLMs vs ‘Classical ML’ as a binary choice. Most companies successfully use both approaches and architectural options, depending on the use case.
But what is binary is that we’re still in the early stages of leveraging LLMs effectively, even though the underlying Transformer architecture has already been around for almost six years.
As noted by Raphaël Mansuy, thoughtful systems design and architecture are crucial to building robust, safe and scalable LLM applications.
Emerging Architectures for LLM Applications by a16z
As mentioned in this blog post by Unusual Ventures, part of building LLM-based applications involves feeding the language model relevant “context” (e.g., related documents or the results of a Google search). This context is particularly important for stopping LLMs from “hallucinating”: given good context (i.e., data and knowledge), the model can extract the correct information from documents instead of making it up. This essentially boils down to giving LLMs “memory”, something current generations of models do not have by default.
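Mechanically, “feeding the model context” usually means pasting the retrieved text into the prompt together with an instruction to answer only from it. The Python sketch below shows one way such a prompt could be assembled. The template wording, the character budget and the function name `build_prompt` are hypothetical, and the actual call to an LLM is left out because it depends on the provider.

```python
def build_prompt(question: str, documents: list[str], max_context_chars: int = 2000) -> str:
    """Assemble a grounded prompt: numbered context snippets first, then the question."""
    snippets = []
    used = 0
    for i, doc in enumerate(documents, start=1):
        snippet = f"[{i}] {doc.strip()}"
        if used + len(snippet) > max_context_chars:
            break  # respect a rough context budget rather than cutting a document in half
        snippets.append(snippet)
        used += len(snippet)
    context = "\n".join(snippets) if snippets else "(no relevant documents found)"
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

docs = [
    "Our refund window is 30 days from the date of delivery.",
    "Refunds are issued to the original payment method.",
]
print(build_prompt("How long do customers have to request a refund?", docs))
```

The instruction to admit when the context lacks the answer is a common (though not foolproof) way to reduce made-up answers; the quality of what ends up in `documents` matters far more than the template.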
One of the most effective ways to find relevant context is to look up documents that are semantically similar to the task you’re looking to solve. An important capability of LLMs is embedding: producing dense representations of language that capture semantic information. As mentioned earlier, vector databases have emerged (for now) as the most obvious and performant way to retrieve “similar” documents, by enabling similarity searches over LLM embeddings.
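At its core, the similarity search a vector database performs is “find the stored vectors closest to the query vector”. The brute-force Python sketch below ranks documents by cosine similarity. The three-dimensional vectors are made-up stand-ins for real embeddings, which typically have hundreds or thousands of dimensions and come from an embedding model; production vector databases also use approximate nearest-neighbour indexes instead of scanning every vector.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "embeddings"; in practice these come from an embedding model.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.2, 0.9, 0.1],
    "returns-faq": [0.7, 0.3, 0.1],
}
query = [0.85, 0.15, 0.05]  # stand-in for the embedding of a refund-related question
for doc_id, score in top_k(query, index):
    print(f"{doc_id}: {score:.3f}")
```

The retrieved documents are what would then be passed to the model as context.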
But we’re in the early days of experimentation with LLMs. And getting started with LLMs is, to some extent, easier than getting started with the last wave of ML (provided you don’t build your own multi-billion- or trillion-parameter language model from scratch).
Image by Unusual Ventures
So what’s the easiest part when building LLM applications? More often than not, the easiest part is the LLM itself.
LLMs have brought thousands of software engineers into the AI/ML world. A very interesting trend, closely correlated with the exploding popularity of Foundation Models and LLMs, is the rise of software engineers as an emerging persona within AI/ML. This is mainly because Foundation Models lower the barrier to building ML models and offer a higher level of abstraction. As a result, we’re seeing software developers become part of the ML development process for the first time.
Here’s a fun and interesting article by Joseph Gonzalez and Vikram Sreekanti about the LLM stack and why it’s so difficult to build LLM applications (hint: it’s not the LLM).
Good data engineering and software engineering practices are still highly relevant to delivering cost-efficient and reliable services at scale, beyond a few prototypes running on a laptop. And this is very much true for LLM-based applications (or any Generative AI application, for that matter)!
The image below is from this article by Crystal Liu, which explains the LLM training process (from pre-training to fine-tuning and RLHF) as well as LLMOps tools (model-focused only):
Image source Crystal Liu
BUT most importantly: LLMs or Generative AI for businesses will not work unless the underlying data they leverage is trustworthy.
Final thoughts
Eric Schwartz wrote recently about an IBM survey of CEOs that found two-thirds were getting pressure from their boards to adopt generative AI:
Nearly as many were feeling similar pressure from investors and nearly half said customers were asking them about their plans.
A lot of this comes from good old FOMO: investors and board members are often not thinking at all about data foundations, data quality, data engineering best practices, or scalable data infrastructure. They just want to jump on the train for its own sake and mention it in a press release to reassure the market that they’re ‘on top of it’. This happens even when, from a short-term value-add perspective, the organizations they represent or have invested in should focus more on data engineering best practices and apply *Classical ML* to the tabular datasets they already possess, rather than drop everything else to focus solely on Generative AI and LLMs. Once again, it’s not binary: you can do both.
If all you have is a hammer, everything looks like a nail — something that is clearly visible with Generative AI right now.
At the end of the day, focus on the data and your AI/ML projects will thrive, including your LLM projects. According to the latest Gartner AI Hype Cycle, Data-Centric AI has had its ‘Innovation Trigger’. With almost a decade of experience in the AI/ML industry, I can say this has always been true; but as ML models (including Foundation Models and LLMs) become increasingly commoditized, maybe data-centric AI has now had its ‘Reality Trigger’?
The hard part when building or implementing LLM-based applications is not the LLM itself; it’s the data and ‘everything else’, e.g., deploying it to the cloud, accessing the right data, and tracing inputs and outputs! And in many use cases, *Classical ML* is still the way to go, especially for predictive use cases with tabular data.
Keep this in mind the next time you plan your AI/ML strategy.
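As one small example of that ‘everything else’, tracing inputs and outputs can start very simply. The Python sketch below wraps a function, for example whatever function calls your LLM provider, in a decorator that records each call’s inputs, output, latency and any error. `fake_llm` is a hypothetical stand-in for a real model call, and a production setup would send these records to a logging or observability backend rather than keep them in an in-memory list.

```python
import functools
import time
from typing import Any, Callable

TRACE_LOG: list[dict[str, Any]] = []  # in-memory sink; a real system would export these records

def traced(fn: Callable) -> Callable:
    """Record inputs, output (or error) and latency of every call to `fn`."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        record = {"function": fn.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            record["output"] = result
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise  # tracing must not swallow failures
        finally:
            record["latency_s"] = time.perf_counter() - start
            TRACE_LOG.append(record)
    return wrapper

@traced
def fake_llm(prompt: str, temperature: float = 0.0) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return f"echo: {prompt}"

fake_llm("What is data-centric AI?", temperature=0.2)
print(TRACE_LOG[-1]["function"], TRACE_LOG[-1]["kwargs"], TRACE_LOG[-1]["output"])
```

Because the decorator is independent of any particular model or provider, the same tracing applies unchanged when the underlying model is swapped.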
When it comes to the Hype Cycle, let’s just accept that all models are wrong (or, more precisely, that none is perfect) and that none can model the future perfectly. Some are still useful, but that can’t be said about the Gartner Hype Cycle. It’s simply a wrong and unhelpful representation that plays no relevant role in the modern technology landscape.
The best way to evaluate a technology is to think about the problems it solves, the feasibility of implementing it, and the scale of its impact.
Not the hype or any cycles of it.