Researchers reveal real development costs of DeepSeek – They are staggering

DeepSeek’s $6 million development cost is a lie – the real costs are much higher

This week has been filled with conversations about DeepSeek V3, the Chinese AI model reportedly developed with training costs of just $6 million. This claim caused Nvidia’s stock price to drop dramatically, and many called DeepSeek’s development the death of the AI infrastructure manufacturing complex. Simply put, DeepSeek won’t be the death of Nvidia’s AI hardware dominance, nor is it a threat to the broader AI industry.

SemiAnalysis has examined the costs of developing DeepSeek. Simply put, the “$6 million” myth stems from marketing the AI’s pre-training cost alone. This number spooked investors, primarily those with little knowledge of the semiconductor industry. While DeepSeek’s achievements are impressive, the only unexpected thing about it is that it came from China and not the US.

DeepSeek’s price and efficiencies caused the frenzy this week, with the main headline being the “$6M” training cost figure for DeepSeek V3. This is wrong. This is akin to pointing to a specific part of a bill of materials for a product and attributing it as the entire cost. The pre-training cost is a very narrow portion of the total cost.

– SemiAnalysis

SemiAnalysis estimates that DeepSeek’s hardware spend is “well higher than $500 million” over the company’s history. Add the costs of research, testing, salaries, and running that hardware, and DeepSeek’s development costs are likely well over $1 billion.

We believe the pre-training number is nowhere near the actual amount spent on the model. We are confident their hardware spend is well higher than $500M over the company history. To develop new architecture innovations, during the model development, there is a considerable spend on testing new ideas, new architecture ideas, and ablations. Multi-Head Latent Attention, a key innovation of DeepSeek, took several months to develop and cost a whole team of manhours and GPU hours.

The $6M cost in the paper is attributed to just the GPU cost of the pre-training run, which is only a portion of the total cost of the model. Excluded are important pieces of the puzzle like R&D and TCO of the hardware itself. For reference, Claude 3.5 Sonnet cost $10s of millions to train, and if that was the total cost Anthropic needed, then they would not raise billions from Google and tens of billions from Amazon. It’s because they have to experiment, come up with new architectures, gather and clean data, pay employees, and much more.

– SemiAnalysis
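For context on how narrow that headline number is, the widely quoted figure comes from a simple GPU-hour calculation: the DeepSeek-V3 technical report prices roughly 2.788 million H800 GPU-hours at about $2 per GPU-hour. The sketch below reproduces that back-of-envelope arithmetic and compares it to SemiAnalysis’ $500 million-plus hardware estimate; the inputs are the publicly reported figures, not a breakdown of our own.

```python
# Back-of-envelope reproduction of the widely quoted "$6M" figure.
# GPU-hours and rental rate are the figures reported in the DeepSeek-V3
# technical report; the $500M+ line is SemiAnalysis' estimate of total
# hardware spend, included only for scale.

pretrain_gpu_hours = 2.788e6   # reported H800 GPU-hours for the pre-training run
rate_per_gpu_hour = 2.00       # reported rental price per H800 GPU-hour (USD)

pretrain_cost = pretrain_gpu_hours * rate_per_gpu_hour
print(f"Pre-training GPU cost: ${pretrain_cost / 1e6:.2f}M")  # ~ $5.58M

semianalysis_hw_estimate = 500e6  # "well higher than $500M" hardware spend
print(f"Share of estimated hardware spend: {pretrain_cost / semianalysis_hw_estimate:.1%}")
```

Even against the conservative end of the hardware estimate, the pre-training run accounts for only around one percent of the spend, before salaries, research, and data costs are considered.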

DeepSeek V3 is impressive, but its advancements were expected

The AI market is relatively new, which leaves room for staggering year-on-year advancement. Regardless of DeepSeek V3’s true total cost, its low pre-training cost is an impressive, innovative achievement. However, it is worth remembering that this development wasn’t unexpected. The most shocking aspect of DeepSeek V3 is that this level of cost efficiency was achieved first in China and not in the Western world.

V3 is no doubt an impressive model, but it is worth highlighting impressive relative to what. Many have compared V3 to GPT-4o and highlight how V3 beats the performance of 4o. That is true but GPT-4o was released in May of 2024. AI moves quickly and May of 2024 is another lifetime ago in algorithmic improvements. Further we are not surprised to see less compute to achieve comparable or stronger capabilities after a given amount of time. Inference cost collapsing is a hallmark of AI improvement.

An example is small models that can be run on laptops have comparable performance to GPT-3, which required a supercomputer to train and multiple GPUs to inference. Put differently, algorithmic improvements allow for a smaller amount of compute to train and inference models of the same capability, and this pattern plays out over and over again. This time the world took notice because it was from a lab in China. But smaller models getting better is not new.

So far what we’ve witnessed with this pattern is that AI labs spend more in absolute dollars to get even more intelligence for their buck. Estimates put algorithmic progress at 4x per year, meaning that for every passing year, 4x less compute is needed to achieve the same capability. Dario, CEO of Anthropic, argues that algorithmic advancements are even faster and can yield a 10x improvement. As far as inference pricing goes for GPT-3 quality, costs have fallen 1200x.

– SemiAnalysis
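As a rough sanity check on those figures (a sketch using the rates quoted above, not an exact model), compounding a 4x yearly efficiency gain over the roughly five years since GPT-3 launched in 2020 implies about a 1,000x reduction, in the same ballpark as the 1,200x fall in GPT-3-quality inference pricing that SemiAnalysis cites.

```python
# Rough compounding of the quoted ~4x/year algorithmic-efficiency gain.
# The five-year window (GPT-3's 2020 launch to early 2025) is an assumption
# for illustration; the ~1200x pricing drop is quoted directly by SemiAnalysis.

yearly_gain = 4       # quoted algorithmic progress per year
years = 5             # approx. time since GPT-3 launched

compute_reduction = yearly_gain ** years
print(f"Implied compute reduction: {compute_reduction}x")  # 1024x, vs the ~1200x quoted
```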

A major trend in AI is that inference costs keep falling as hardware and software improve. By the end of this year, we will likely see costs plummet further, which will lead to AI being used in more places and for more things.

To be clear DeepSeek is unique in that they achieved this level of cost and capabilities first. They are unique in having released open weights, but prior Mistral and Llama models have done this in the past too. DeepSeek has achieved this level of cost but by the end of the year do not be shocked if costs fall another 5x.

– SemiAnalysis

The market impact of DeepSeek has been huge. It shatters the idea that the US alone stands as the world’s leader in AI research, and it shows how the clever marketing of certain statistics can cause serious drops in the share prices of AI giants like Nvidia.

The innovations of DeepSeek’s AI models will be copied by Western labs quickly, and the AI market is moving rapidly. Now, the race is on to see who will be the next AI frontrunner. Will more Chinese firms step up? Can OpenAI and other US companies regain lost ground? Regardless of where the next advancement comes from, one thing is certain: AI will continue to advance.

You can join the discussion on the true cost of DeepSeek’s development on the OC3D Forums.

Mark Campbell

A Northern Irish father, husband, and techie that works to turn tea and coffee into articles when he isn’t painting his extensive minis collection or using things to make other things.
