Good results with fewer resources is DeepSeek's secret sauce


DeepSeek: how a small Chinese AI company is shaking up US tech heavyweights

29th Jan 25, 10:45am by Tongliang Liu

Mojahid Mottakin/Shutterstock

By Tongliang Liu*

Chinese artificial intelligence (AI) company DeepSeek has sent shockwaves through the tech community with the release of extremely efficient AI models that can compete with cutting-edge products from US companies such as OpenAI and Anthropic.

Founded in 2023, DeepSeek has achieved its results with a fraction of the cash and computing power of its competitors.

DeepSeek’s “reasoning” R1 model, released last week, provoked excitement among researchers, shock among investors, and responses from AI heavyweights. The company followed up on January 28 with a model that can work with images as well as text.

So what has DeepSeek done, and how did it do it?

What DeepSeek did

In December 2024, DeepSeek released its V3 model. This is a very powerful “standard” large language model that performs at a similar level to OpenAI’s GPT-4o and Anthropic’s Claude 3.5.

While these models are prone to errors and sometimes make up their own facts, they can carry out tasks such as answering questions, writing essays and generating computer code. On some tests of problem-solving and mathematical reasoning, they score better than the average human.

V3 was trained at a reported cost of about US$5.58 million. This is dramatically cheaper than GPT-4, for example, which cost more than US$100 million to develop.

DeepSeek also claims to have trained V3 using around 2,000 specialised computer chips, specifically H800 GPUs made by NVIDIA. This is again far fewer than other companies, which may have used up to 16,000 of the more powerful H100 chips.
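The reported cost can be reproduced with simple arithmetic. The sketch below assumes figures from DeepSeek's own V3 technical report rather than from the text above: roughly 2.788 million H800 GPU-hours, an assumed rental price of US$2 per GPU-hour, and a cluster of 2,048 GPUs. The US$100 million figure for GPT-4 is the lower bound quoted above, so the resulting ratio is only indicative.

```python
# Back-of-envelope check of the reported V3 training cost.
# Assumptions (from DeepSeek's V3 technical report, not independently audited):
GPU_HOURS = 2_788_000            # total H800 GPU-hours for the final training run
PRICE_PER_GPU_HOUR_USD = 2.0     # assumed rental price per GPU-hour
GPU_COUNT = 2048                 # cluster size ("around 2,000" in the article)
GPT4_COST_LOWER_BOUND_USD = 100_000_000  # "more than US$100 million"


def training_cost_usd(gpu_hours: float, price_per_hour: float) -> float:
    """Cost of a training run priced purely by rented GPU time."""
    return gpu_hours * price_per_hour


def wall_clock_days(gpu_hours: float, gpu_count: int) -> float:
    """Elapsed days if all GPUs run in parallel for the whole job."""
    return gpu_hours / gpu_count / 24


cost = training_cost_usd(GPU_HOURS, PRICE_PER_GPU_HOUR_USD)
days = wall_clock_days(GPU_HOURS, GPU_COUNT)
ratio = GPT4_COST_LOWER_BOUND_USD / cost

print(f"Estimated cost: US${cost:,.0f}")
print(f"Wall-clock time: {days:.1f} days")
print(f"GPT-4 lower bound is about {ratio:.1f}x higher")
```

The figure covers only the final training run, priced as rented GPU time. It excludes earlier research and experiments, so it is not a like-for-like comparison with another company's total development budget.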

On January 20, DeepSeek released another model, called R1. This is a so-called “reasoning” model, which tries to work through complex problems step by step. These models seem to be better at many tasks that require context and have multiple interrelated parts, such as reading comprehension and strategic planning.

The R1 model is a tweaked version of V3, modified with a technique called reinforcement learning. R1 appears to work at a similar level to OpenAI’s o1, released last year.

DeepSeek also used the same technique to make “reasoning” versions of small open-source models that can run on home computers.

This release has sparked a huge surge of interest in DeepSeek, driving up the popularity of its V3-powered chatbot app and triggering a massive price crash in tech stocks as investors re-evaluate the AI industry. At the time of writing, chipmaker NVIDIA has lost around US$600 billion in value.

How DeepSeek did it

DeepSeek’s breakthroughs have been in achieving greater efficiency: getting good results with fewer resources. In particular, DeepSeek’s developers have pioneered two techniques that may be adopted by AI researchers more broadly.

The first has to do with a mathematical idea called “sparsity”. AI models have a lot of parameters that determine their responses to inputs (V3 has around 671 billion), but only a small fraction of these parameters is used for any given input.

However, predicting which parameters will be needed isn’t easy. DeepSeek used a new technique to do this, and then trained only those parameters. As a result, its models needed far less training than they would have under a conventional approach.
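To make the idea of sparsity concrete, here is a minimal sketch of top-k "mixture of experts" routing, the general family of techniques V3 belongs to. It is a toy illustration, not DeepSeek's actual routing method: the expert functions, router scores and the names `route_top_k` and `sparse_layer` are all hypothetical. A small router scores every expert for a given input, and only the highest-scoring few are evaluated.

```python
import math


def softmax(scores):
    """Turn raw router scores into probabilities."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Pick the k highest-scoring experts and renormalise their weights."""
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def sparse_layer(x, experts, router_scores, k):
    """Evaluate only the selected experts; the others are never called."""
    chosen = route_top_k(router_scores, k)
    output = sum(weight * experts[i](x) for i, weight in chosen)
    return output, [i for i, _ in chosen]


# Toy setup: 8 "experts", each just multiplies its input by a constant.
NUM_EXPERTS = 8
experts = [lambda x, scale=s: scale * x for s in range(1, NUM_EXPERTS + 1)]

# Hypothetical router scores for one input; a real router computes these
# from the input itself using learned weights.
router_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

output, active = sparse_layer(3.0, experts, router_scores, k=2)
active_fraction = len(active) / NUM_EXPERTS
print(f"Active experts: {active} ({active_fraction:.0%} of the total)")
```

In this toy case only two of eight experts do any work for the input. DeepSeek's published description of V3 reports a similar pattern at scale, with about 37 billion of its 671 billion parameters active for each token.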

The other trick has to do with how V3 stores information in computer memory. DeepSeek has found a clever way to compress the relevant data, so it is easier to store and access quickly.
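The compression idea can be sketched as follows, under stated assumptions. Language models keep a "cache" of vectors for every token they have already processed. Instead of storing two full-size vectors per token (a key and a value), the sketch stores one much smaller latent vector and rebuilds the full vectors on demand. The dimensions, the random projection matrices and the `CompressedCache` class are hypothetical; in a real model the projections are learned during training. DeepSeek calls its version of this approach multi-head latent attention.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Stand-in for a learned projection matrix."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


D_MODEL = 64    # hypothetical size of a full key or value vector
D_LATENT = 8    # hypothetical size of the compressed latent vector
NUM_TOKENS = 10

rng = random.Random(0)
W_down = random_matrix(D_LATENT, D_MODEL, rng)      # compress hidden state
W_up_key = random_matrix(D_MODEL, D_LATENT, rng)    # rebuild keys
W_up_value = random_matrix(D_MODEL, D_LATENT, rng)  # rebuild values


class CompressedCache:
    """Stores one small latent per token instead of a full key and value."""

    def __init__(self):
        self.latents = []

    def append(self, hidden):
        self.latents.append(matvec(W_down, hidden))

    def keys(self):
        return [matvec(W_up_key, latent) for latent in self.latents]

    def values(self):
        return [matvec(W_up_value, latent) for latent in self.latents]

    def stored_floats(self):
        return sum(len(latent) for latent in self.latents)


cache = CompressedCache()
for _ in range(NUM_TOKENS):
    cache.append([rng.uniform(-1, 1) for _ in range(D_MODEL)])

full_cache_floats = NUM_TOKENS * 2 * D_MODEL  # one key + one value per token
saving = full_cache_floats / cache.stored_floats()
print(f"Numbers stored: {cache.stored_floats()} instead of {full_cache_floats}")
```

The trade-off is extra computation to rebuild keys and values in exchange for a much smaller memory footprint, which matters most when a model handles long inputs.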

What it means

DeepSeek’s models and techniques have been released under the free MIT License, which means anyone can download and modify them.

While this may be bad news for some AI companies – whose profits might be eroded by the existence of freely available, powerful models – it is great news for the broader AI research community.

At present, a lot of AI research requires access to enormous amounts of computing resources. Researchers like myself who are based at universities (or anywhere except large tech companies) have had limited ability to carry out tests and experiments.

More efficient models and techniques change the situation. Experimentation and development may now be significantly easier for us.

For consumers, access to AI may also become cheaper. More AI models may be run on users’ own devices, such as laptops or phones, rather than running “in the cloud” for a subscription fee.

For researchers who already have a lot of resources, more efficiency may have less of an effect. It is unclear whether DeepSeek’s approach will help to make models with better performance overall, or simply models that are more efficient.

*Tongliang Liu, Associate Professor of Machine Learning and Director of the Sydney AI Centre, University of Sydney.

This article is republished from The Conversation under a Creative Commons license. Read the original article.


16 Comments

by murray86 | 29th Jan 25, 11:41am

There are issues with the responses though, apparently. I have heard that if you ask DeepSeek what happened in Tiananmen Square it declines to answer. The critics are suggesting that it will only provide answers about China that the CCP will approve of.

This will come down to what is 'truth'?

On the other hand that free MIT Licence provides an opportunity for the US companies to learn and copy, as well as individuals to modify the responses identified above to be more open.

by rastus1 | 29th Jan 25, 12:07pm

I have found that ChatGPT has a strong bias when questioned about who owns what, and why, and who should, etc., in the context of our little festering issue in NZ.

by nktokyo | 29th Jan 25, 1:03pm

Meh, I hadn't asked ChatGPT about Tiananmen Square either.

I think the US will try to restrict access to data next.

by Juha Saarinen | 29th Jan 25, 6:04pm

It does that, as I believe those are Chinese regulations. You can get around it, but it gets a bit silly when Winnie the Pooh mentions come under suspicion. DeepSeek-R1 run locally does it as well.

by kiwimummma | 31st Jan 25, 9:33am

Plenty of censorship and propaganda on Western AI too.

by Thinker | 29th Jan 25, 11:58am

Another classic example of why the only way to improve productivity is more competition - take notes, NZ government. I saw OpenAI's CEO come out and say they're bringing forward product releases as a result of this.

by Baywatch | 29th Jan 25, 12:43pm

Another example of China leapfrogging the West in industry after industry....

by Pa1nter | 29th Jan 25, 1:16pm

B-b-but, globalism was supposed to mean some minions somewhere else would do all the crappy lame old stuff we don't want to do, while we go on to develop newer, more exciting and better paying jobs.

They can't do both, no fair!

by OldSkoolEconomics | 29th Jan 25, 1:33pm

Yip - Auto, manufacturing, tech, services.... the west has little hope of competing now. AI was the last great hope...

Watch some videos showing their modern cities, advanced infrastructure, train speeds, hyperloop development, military might, space program successes and scientific research in most areas. They are smashing it.

We are 100% watching the balance of power shift right now. The DeepSeek news at the weekend was the final nail for the USA. OpenAI and Co can try to jump a release or two, but have no hope of keeping up in the mid term. What did the USA do? Appointed a Real Estate Populist, Right Wing, insular Grandad to save them! lol

by mark_a | 29th Jan 25, 2:06pm

We might just be stuck with only individual freedoms and human rights.

Sounds good to me.

by Baywatch | 29th Jan 25, 4:00pm

Yup ..just look at MAGA USA for how that's going...

by mark_a | 29th Jan 25, 4:04pm

Too busy looking at China today.......

They don't even vote there don't you know!

by OldSkoolEconomics | 29th Jan 25, 5:46pm

Yeah. Us FREE westerners get to choose between some great leaders 🤣.

Our democracy is so different to China's one party.

Tbh the main difference is that every 4 years our new leaders change tack on all the important stuff, so we always get short-term visions and don't complete anything. Whereas they finish what they start.

by mark_a | 29th Jan 25, 7:33pm

Two different models I agree.

Still prefer the one with democracy and human rights - noisy tho it is as you note.

No problems if you prefer single party state and a dictator.

Each to their own!

by Averageman | 29th Jan 25, 2:50pm

Open-source access to the model, allowing independence from US licensing models that increase in price every year for no benefit except "Merican" shareholders' greed, can only be a good thing.

by IT GUY | 30th Jan 25, 8:58am

IMHO these OS models are a massive jump forward. I spent an entertaining evening asking DeepSeek questions last night. It's scary good at a few things like

"What does building management mean?" and coding or math questions, but was not good at

"How do you make beer?"