- China’s DeepSeek model challenges US AI firms with cheap, computationally efficient performance.
- DeepSeek’s model is 20-40 times cheaper than OpenAI’s, using modest hardware.
- DeepSeek’s effectiveness raises questions about US investment in AI infrastructure.
The bombshell that is China’s DeepSeek model has ignited the AI ecosystem.
The models are high-performance, relatively cheap and computationally efficient, which has led many to think they pose an existential threat to US companies like OpenAI and Meta – and to the trillions of dollars going into building, improving and scaling AI infrastructure in the US.
DeepSeek’s open source model is priced competitively – 20 to 40 times cheaper to run than comparable models from OpenAI, according to Bernstein analysts.
But the potentially most unnerving element in the DeepSeek equation for US-built models is the relatively modest stack of hardware used to build them.
The DeepSeek-V3 model, which is most comparable to OpenAI’s ChatGPT, was trained on a set of 2,048 Nvidia H800 GPUs, according to the technical report published by the company.
The H800 is the first version of Nvidia’s flagship chip built specifically for the Chinese market. After US export regulations were tightened, the company created another pared-down chip, the H20, to comply with the changes.
Chips are usually, though not always, the biggest cost in the large language model training equation. Being forced to use less powerful and cheaper chips creates a limitation that the DeepSeek team has apparently overcome.
“Innovation under constraints requires genius,” Sri Ambati, CEO of open source AI platform H2O.ai told Business Insider.
Even on low-end hardware, training DeepSeek-V3 took less than two months, according to the report.
The efficiency advantage
DeepSeek-V3 is small relative to its capabilities: it has 671 billion parameters, compared with the 1.76 trillion reported for GPT-4, which makes it easier to run. Yet it still hits impressive marks for comprehension.
Its smaller size comes in part from an architecture, different from ChatGPT’s, called a “mixture of experts.” The model has built-in pockets of expertise that spring into action when called upon and lie dormant when they’re unrelated to the question. This type of model is growing in popularity, and DeepSeek’s advantage is that it built an extremely efficient version of an inherently efficient architecture.
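To make the “dormant experts” idea concrete, here is a minimal sketch of top-k mixture-of-experts routing in PyTorch. The expert count, dimensions, and top-k value are illustrative placeholders, not DeepSeek’s actual configuration.

```python
# Minimal mixture-of-experts sketch (toy sizes, not DeepSeek's real config).
import torch
import torch.nn as nn


class TinyMoELayer(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is a small feed-forward block; only a few run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x):  # x: (tokens, dim)
        scores = self.router(x)                            # (tokens, num_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep only the best experts
        weights = weights.softmax(dim=-1)                  # normalize over the chosen ones
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


tokens = torch.randn(16, 64)
print(TinyMoELayer()(tokens).shape)  # torch.Size([16, 64])
```

Because only the experts the router selects run for a given token, the total parameter count can grow without a proportional increase in compute per token.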
“Someone made this analogy: It’s almost like someone released a $20 iPhone,” Foundry CEO Jared Quincy Davis told BI.
The Chinese model used a fraction of the time, a fraction of the chip count, and a less capable and less expensive chipset. Basically, it’s a drastically cheaper and competitively capable model that the firm is practically giving away for free.
Even more worrisome from a competitive perspective, according to Bernstein, is DeepSeek-R1, a reasoning model more comparable to OpenAI’s o1 or o3. This model uses reasoning techniques to interrogate its own responses and thinking. The result is competitive with OpenAI’s latest reasoning models.
R1 was built on top of V3, and the research paper released alongside the top-of-the-line model doesn’t include information about the hardware stack behind it. But DeepSeek used strategies like generating its own training data to train R1, which requires more computation than using data aggregated from the internet or generated by humans.
This technique is often called “distillation” and is becoming a standard practice, Ambati said.
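As a rough illustration of the distillation idea, here is a toy PyTorch sketch in which a larger “teacher” network’s outputs serve as training targets for a smaller “student.” The networks, data, and loss below are placeholders, not DeepSeek’s actual recipe.

```python
# Toy distillation loop: the student learns to match the teacher's output distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 10))  # large "teacher"
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))    # small "student"
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(100):
    x = torch.randn(64, 32)                  # stand-in for real (or self-generated) inputs
    with torch.no_grad():
        teacher_logits = teacher(x)          # teacher outputs act as synthetic labels
    student_logits = student(x)
    # KL divergence between student and teacher output distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```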
However, distillation brings with it another layer of controversy. A company using its own models to distill a smarter, smaller model is one thing. But the legality of using other companies’ designs to distill new ones depends on licensing.
Still, DeepSeek’s techniques are largely iterative and likely to be readily picked up by the rest of the AI industry.
For years, model developers and startups have focused on smaller models, since their size makes them cheaper to build and operate. The idea was that small models would serve specific tasks. But what DeepSeek, and potentially OpenAI’s o3 mini, demonstrate is that small models can also be generalists.
The game is not over
A coalition of players including Oracle and OpenAI, with cooperation from the White House, announced Stargate, a $500 billion data center project in Texas – the latest in a long and rapid procession of large-scale bets on accelerated computing. The hit from DeepSeek has called that investment into question, and its biggest beneficiary, Nvidia, is on a roller coaster as a result. The company’s shares fell more than 13% on Monday.
But Bernstein said the reaction is out of step with reality.
“DeepSeek did NOT ‘build OpenAI for $5 million,’” Bernstein analysts wrote in an investor note on Monday. The panic, especially on X, is blown out of proportion, the analysts wrote.
DeepSeek’s own research paper on V3 explains: “the aforementioned costs only include formal DeepSeek-V3 training, excluding costs associated with preliminary research and ablation experiments on architectures, algorithms or data.” So the $5 million figure is only part of the equation.
“The models look fantastic, but we don’t think they’re miracles,” Bernstein continued. Last week China also announced an investment of roughly $140 billion in data centers, in a sign that infrastructure is still needed despite DeepSeek’s achievements.
The competition for model supremacy is fierce, and OpenAI’s moat may indeed be in question. But demand for chips shows no signs of slowing, according to Bernstein. Tech leaders are turning to an age-old economic adage to explain the moment.
Jevons’ paradox is the idea that innovation begets demand: as technology becomes cheaper or more efficient, demand rises much faster than prices fall. This is what computing-power providers like Davis have been championing for years. This week, Bernstein and Microsoft CEO Satya Nadella also took up the mantle.
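A back-of-the-envelope illustration of that argument, using made-up numbers rather than actual market figures: if the price per unit of compute falls 10x but usage grows 30x, total spending still rises.

```python
# Toy Jevons' paradox arithmetic (hypothetical numbers, not real market data).
old_price, old_usage = 1.00, 100          # price per unit of compute and units consumed
new_price, new_usage = old_price / 10, old_usage * 30

print(old_price * old_usage)              # 100.0 -> spend before the efficiency gain
print(new_price * new_usage)              # 300.0 -> spend after: cheaper units, bigger total bill
```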
“Jevons paradox strikes again!” Nadella posted on X Monday morning. “As artificial intelligence becomes more efficient and accessible, we will see its use increase, turning it into a commodity we simply cannot get enough of,” he continued.