DeepSeek’s AI Breakthrough
Context:
The Chinese startup DeepSeek has created a stir in the global AI industry with its models, particularly DeepSeek-R1, which reportedly come close to matching the capabilities of models from top U.S. AI companies such as OpenAI at a significantly lower cost.
More on News
- DeepSeek’s AI Assistant, powered by DeepSeek-V3, has overtaken OpenAI’s ChatGPT to become the top free app on Apple’s U.S. App Store.
- This success has led to questions about the billions being spent by U.S. AI companies and has caused tech stocks, including Nvidia, to take a hit.
DeepSeek-R1: The “Thinking” Model That Changes the Game
- Test-Time Compute (TTC): DeepSeek-R1 spends extra computation at inference time, working through a problem step-by-step in a visible chain of reasoning before giving its final answer, instead of responding immediately.
- Rivals OpenAI o1: In tasks like math, coding, and general knowledge, R1 is reported to match or exceed the performance of OpenAI’s frontier o1 model.
- 90-95% Cheaper Than OpenAI o1: Unlike closed and expensive models, R1 is powerful, free, and open-source, raising questions about the necessity of massive AI investments.
DeepSeek’s Origins
- DeepSeek is headquartered in Hangzhou and controlled by Liang Wenfeng, co-founder of the quantitative hedge fund High-Flyer.
- In March 2023, High-Flyer announced a pivot from trading to AI research, leading to DeepSeek’s founding later that year.
- While High-Flyer’s total investment in DeepSeek remains unclear, records show the fund owns AI training-related patents and operates a cluster of 10,000 A100 chips.
Cost Efficiency
- DeepSeek revealed that its DeepSeek-V3 model was trained for under $6 million using Nvidia H800 chips, a fraction of the billions that U.S. companies spend.
- The DeepSeek-R1 model is claimed to be up to 50 times cheaper to operate than OpenAI’s GPT-4, depending on the task.
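The training-cost figure can be reproduced with simple arithmetic. The sketch below assumes the inputs stated in DeepSeek's own V3 technical report (about 2.788 million H800 GPU-hours, priced at an assumed rental rate of $2 per GPU-hour); these are the company's numbers, not independently verified, and they cover only the final training run, not earlier research and experiments. The function name is illustrative.

```python
# Inputs as stated in the DeepSeek-V3 technical report (company-reported, not verified here).
GPU_HOURS = 2_788_000        # H800 GPU-hours for the final training run
PRICE_PER_GPU_HOUR = 2.00    # assumed rental price in USD per GPU-hour


def training_cost_usd(gpu_hours, price_per_hour):
    """Estimated compute cost: GPU-hours multiplied by the hourly rental price."""
    return gpu_hours * price_per_hour


cost = training_cost_usd(GPU_HOURS, PRICE_PER_GPU_HOUR)
print(f"Estimated training cost: ${cost / 1e6:.3f} million")
```

Because the estimate is a product of two assumptions, it scales linearly with either one: doubling the assumed rental price doubles the estimate.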
Why is DeepSeek-V3 So Disruptive?
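- Mixture-of-Experts (MoE) Architecture: Instead of running one monolithic network for every input, DeepSeek-V3 routes each token to a small subset of specialised "expert" sub-networks, so only a fraction of its parameters is active at any time.
- 14.8 Trillion Tokens: The model was pre-trained on a very large dataset of about 14.8 trillion tokens, improving its language comprehension and reasoning abilities.
- Multi-Head Latent Attention (MLA): An efficiency technique that compresses the attention keys and values into a smaller latent representation, reducing memory and computation costs during inference while maintaining accuracy.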
- Open Source Approach: Unlike closed-source models from OpenAI and Google, DeepSeek-V3 has open weights, allowing anyone to build on and improve it.
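To make the Mixture-of-Experts idea concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is an illustration under stated assumptions, not DeepSeek-V3's actual implementation: the experts are toy functions, the router weights are random, and names such as `moe_forward` and `make_expert` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    """Hypothetical toy expert: scales its input vector by a constant."""
    return lambda x: [scale * v for v in x]


def moe_forward(x, router_weights, experts, top_k=2):
    """Send one token vector to its top_k experts and mix their outputs."""
    # Router score per expert: dot product of the token with that expert's router row.
    scores = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    probs = softmax(scores)
    # Keep only the top_k highest-scoring experts; the others are never run.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        gate = probs[i] / norm  # renormalise gate weights over the chosen experts
        y = experts[i](x)
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen


random.seed(0)
NUM_EXPERTS, DIM = 8, 4
experts = [make_expert(scale=i + 1) for i in range(NUM_EXPERTS)]
router_weights = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [0.5, -1.0, 0.25, 2.0]
output, used = moe_forward(token, router_weights, experts, top_k=2)
print(f"Experts used for this token: {used} (2 of {NUM_EXPERTS})")
```

The point of the design is that compute per token depends on `top_k`, not on the total number of experts, which is how an MoE model can hold many parameters while running only a small share of them for each token.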
Global Impact
- Tech Market Disruption: The Nasdaq’s 3% drop signals how DeepSeek’s efficiency has unsettled investors, who are now questioning the massive AI investments made by US tech giants.
- US-China AI Rivalry Intensifies: Much like the 1957 Sputnik moment, DeepSeek’s breakthrough could escalate AI competition between Washington and Beijing. US policymakers may tighten semiconductor restrictions to curb China’s AI rise.
- Opportunities for Middle Powers Like India & Europe: India and the EU have been pushing for “Sovereign AI”, and DeepSeek’s open-source approach could be a model for nations seeking AI independence.
- DeepSeek’s efficiency suggests that smart innovation can reduce reliance on US or Chinese tech giants.
Lessons for India and Other Emerging Markets
- DeepSeek’s achievement highlights that AI progress is now less about brute force and more about smart innovation.
- India, with its strong software talent, frugal engineering mindset, and entrepreneurial ecosystem, can capitalise on this shift.
- While India cannot match the US and China in scale, it can:
  - Leverage its strong software talent and AI research ecosystem.
  - Develop AI applications tailored to Indian needs, such as healthcare and agriculture.
  - Collaborate strategically with both the US and EU while maintaining independence.
Controversies and Concerns
- Scepticism over cost claims: Some analysts doubt the $5.58 million figure for training DeepSeek-V3.
- Access to Nvidia chips: Reports suggest DeepSeek may have 50,000 Nvidia H100 chips, despite U.S. export restrictions.
- Ethical Concerns: Making such a powerful AI model freely available raises risks of misuse by rogue states, cyber criminals, and bad actors.
- Cybersecurity concerns: DeepSeek operates under strict Chinese regulations, raising questions about data privacy and government oversight.
- Governments must balance innovation with security, ensuring responsible AI use through regulatory frameworks.
The Future of AI and Geopolitical Implications
- DeepSeek’s success challenges the belief that AI requires massive resources and could change investment priorities in the AI sector.
- If China can bypass Western chip sanctions and still produce leading AI models, it could redefine global AI leadership.
- For India and other emerging economies, this is a call to action: embracing efficiency-driven AI innovation can unlock new opportunities and reshape global competition.