You win some. You lose some.
But when you lose half a trillion dollars in a single day, it’s hard to be philosophical. Ask Jensen Huang, the founder and CEO of NVIDIA, whose stock price fell off a cliff on Monday, January 27, 2025, taking the stock price of the entire alphabet of tech firms with it.
What triggered this single-day, single-company record-setting loss on Wall Street? The launch, a week earlier, of an AI chatbot from China called DeepSeek-V3. According to a research paper released by DeepSeek, the performance results of this little-known AI model were shocking to an unprepared West.
DeepSeek’s chatbot performs at levels comparable to proprietary AI models like OpenAI’s GPT-4o and Anthropic’s Claude-3.5-Sonnet, and even outperforms all open-source models, including Meta’s Llama-3.1-405B, in benchmarks measuring diversity and depth of knowledge. And on coding, math, and reasoning benchmarks, DeepSeek’s chatbot is on par with all major models.
DeepSeek accomplished this while spending only $5.6 million on the training phase; it cost Meta an estimated $100 million to train Llama 3.1. Here’s how Peter Diamandis describes this David vs. Goliath tale of the tape in his January 28 newsletter:
OpenAI was founded 10 years ago, has around 4,500 employees, and has raised $6.6 billion in capital. DeepSeek was founded less than 2 years ago, has 200 employees, and was developed for roughly $5 million. While tech giants like OpenAI and Anthropic have been spending $100M+ just to train their AI models, this small 200-person team out of China built an AI system matching GPT-4’s performance for 20x less money.
Here are a few insights to take away from this mini-Sputnik moment:
- The performance gap between open and closed models is closing
- Necessity is the mother of invention
- Disruption can come from anywhere, so beware and be bold
The Gap is Closing
The race for AGI and ASI is often framed geopolitically – a battle for AI supremacy between the US and China. But DeepSeek’s splash onto the global stage shows that it may really be a battle between closed and open-source models. By developing an open-source model, DeepSeek saves on licensing fees and leverages the community to debug and maintain the model, as well as to develop new tools for it. And it provides this model to anyone for free.
Meta does the same, but its Llama models have generally lagged in performance compared to the bright and shiny models of OpenAI and Anthropic. But now that DeepSeek has shown open-source can be as good as the best of the proprietary models, the perceived lead of the more expensive closed models is under threat.
[Embedded TikTok video from Gianluca Mauro: “Chinese AI tanks the US stock market”]
Ma Necessity Gives Birth to Baby Innovation
In October of 2022, the American government, in essence, restricted the sale and distribution of advanced computing chips like NVIDIA’s GPUs “destined for a supercomputer or semiconductor development or production end use in the PRC.”
But as Kai-Fu Lee, a leading AI technologist and investor from China, has said, not only is it really difficult for Chinese companies to get advanced chips, it’s also too expensive for most organizations or institutions in China to purchase them anyway.
“We couldn’t afford 10,000 GPUs,” he explained in this interview about his own company’s efforts in building leading-edge AI models. “Basically, we did production runs on only 2,000 GPUs, which is a small fraction of what the US companies are using. Elon Musk just put together 100,000 H100s, and Open AI even more. We have basically less than 2% of their compute. But I am a deep believer in efficiency, power of engineering, small teams working together, vertical integration. I’m strong believer that necessity is the mother of innovation.”
To me, it feels like we’re witnessing a stark comparison between a 3-star Michelin chef with a professional team and a fully stocked pantry (frontier AI companies) and a sole street vendor with limited ingredients (DeepSeek). The chef asks the team to experiment with exotic flavors and lavish techniques, perhaps even discarding dishes that don’t meet their standards. The street vendor has to be crafty and resourceful, maximizing flavor and minimizing waste to create something tasty. This necessity to “do more with less” can lead to new and surprising meals.
Disruption Can Come From Anywhere
The name on everyone’s lips is Liang Wenfeng, the founder of DeepSeek. A 39-year-old native of Guangdong Province in southern China, he holds a master’s degree in Information and Communication Engineering and founded a quant hedge fund called High-Flyer in 2016. His goal early on was to leverage artificial intelligence to enhance trading performance, and he very wisely stockpiled NVIDIA chips before the export ban on China was established.
According to this source, “fewer than five companies in (China) owned over 10,000 GPUs, apart from major tech giants. One of them was High-Flyer.” The New York Times explained that DeepSeek used NVIDIA’s H800 chips to train its most recent model. The H800 was not restricted under the chip export ban, as it is a watered-down version of NVIDIA’s H100, marketed specifically to China. The Times went on to explain that the Biden administration wasn’t happy with NVIDIA’s technically legal maneuver, so it banned the H800 too, but not quickly enough to stop DeepSeek and other Chinese companies from snapping them up.
Somehow, despite having the computational firepower to develop cutting-edge AI models, DeepSeek flew under the radar, even in the Chinese media, based on this July 2024 article published by 量子位 (QbitAI).
Among China’s seven large model startup companies, DeepSeek (深度求索) is the most low-profile, but it always manages to be remembered in unexpected ways.
This article was written after DeepSeek had released an earlier version of its LLM called V2, and it explained how that version caught the surprised eye of OpenAI.
In Silicon Valley, DeepSeek is referred to as a “mysterious force from the East.” SemiAnalysis’s chief analyst believes that the DeepSeek V2 paper “might be the best one this year.” Former OpenAI employee Andrew Carr thinks the paper is “full of astonishing wisdom” and has applied its training settings to his own model. Former OpenAI policy director and Anthropic co-founder Jack Clark believes DeepSeek “has hired a group of inscrutable geniuses” and thinks that large models made in China “will become an unignorable force, just like drones and electric vehicles.”
The West’s surprise, particularly in the US, is based on the premise that China has progressed by copying innovation, not sparking it. Liang Wenfeng says that perception is no longer valid.
“What surprised them was that a Chinese company was participating as an innovator in their game. After all, most Chinese companies are used to following rather than innovating.”
Sam’s Subtle Rebuttal
On January 28, a week after DeepSeek-V3 turned the AI world upside down, OpenAI co-founder and CEO Sam Altman responded on X, saying “deepseek’s r1 is an impressive model, particularly around what they’re able to deliver for the price.”
The tech world, including OpenAI, is amazed at DeepSeek’s innovation, and Altman said his competitive spirit has been sparked: “We will obviously deliver much better models and also it’s legit invigorating to have a new competitor!”
But Altman believes the long game towards AGI and ASI requires massive technological firepower and expense, and that advancements in AI have just begun.
“Mostly we are excited to continue to execute on our research roadmap and believe more compute is more important now than ever before to succeed at our mission. the world is going to want to use a LOT of ai, and really be quite amazed by the next gen models coming.”