SaMedia—Chinese AI firm DeepSeek has unveiled what it claims to be one of the most powerful open AI models to date. The model, DeepSeek V3, was released on Wednesday under a permissive license, allowing developers to download and modify it for most applications, including commercial ones, TechCrunch reported.
DeepSeek V3 boasts impressive specifications: it generates 60 tokens per second, three times faster than its predecessor V2, while remaining API-compatible. The model weights and technical report are fully open. It uses a mixture-of-experts (MoE) architecture with 671 billion total parameters, of which 37 billion are activated per token, and was trained on a dataset of 14.8 trillion tokens.
The model significantly outperforms Meta’s Llama 3.1-405B on nearly every benchmark, according to DeepSeek. Andrej Karpathy, a prominent AI researcher and former Director of AI at Tesla, praised the model on X, noting its remarkable performance given the relatively low computational budget of 2,048 GPUs for two months, costing only $6 million. He contrasted this with other models requiring clusters of up to 100,000 GPUs.
DeepSeek V3 is available via Hugging Face under the company’s license agreement. Its mixture-of-experts architecture activates only a small subset of the model’s parameters for each input, which lets it handle tasks efficiently without sacrificing accuracy. Benchmark results indicate it outperforms leading open-source models and closely matches the performance of proprietary models from Anthropic and OpenAI.
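The core idea behind mixture-of-experts routing can be illustrated with a minimal sketch. This is a generic top-k gating scheme, not DeepSeek's actual routing implementation; the expert functions, scores, and `k` value here are hypothetical stand-ins chosen purely for illustration:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(scores, k=2):
    # Select the k highest-scoring experts and renormalize
    # their gate weights so they sum to 1.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in ranked])
    return list(zip(ranked, weights))

def moe_layer(x, experts, gate_scores, k=2):
    # Only the selected experts run; their outputs are combined
    # weighted by the gate probabilities. All other experts stay idle,
    # which is why an MoE model activates far fewer parameters per
    # token than its total parameter count.
    out = 0.0
    for idx, w in top_k_route(gate_scores, k):
        out += w * experts[idx](x)
    return out

# Toy example: 8 scalar "experts", each just scaling its input.
experts = [lambda x, c=c: c * x for c in range(8)]
gate_scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
y = moe_layer(2.0, experts, gate_scores, k=2)
```

With 8 experts and `k=2`, only a quarter of the expert parameters are touched per input, mirroring (at toy scale) how DeepSeek V3 activates 37 billion of its 671 billion parameters.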
This release marks a significant step toward closing the gap between closed and open-source AI. DeepSeek, originally an offshoot of Chinese quantitative hedge fund High-Flyer Capital Management, hopes these advancements will pave the way for artificial general intelligence (AGI): systems able to understand or learn any intellectual task that a human can.
The AI industry is watching closely as DeepSeek V3 sets new standards and demonstrates the potential of open-source AI models.