AMD Instinct GPUs Power DeepSeek-V3
The DeepSeek-V3 model is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts the Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were also part of its predecessor, DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. DeepSeek-V3 gives developers access to an advanced model, leveraging memory capabilities to process text and visual data at once, broadening access to the latest advancements. DeepSeek-V3 achieves the best performance on most benchmarks, especially on math and code tasks.
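The parameter figures above illustrate the core efficiency of the MoE design: only a small fraction of the model participates in each forward pass. A minimal sketch of that arithmetic (the helper function is illustrative, not part of any DeepSeek or ROCm API):

```python
# Sketch: per-token compute savings of a Mixture-of-Experts model such as
# DeepSeek-V3, which routes each token through a subset of its experts.
# The 671B/37B figures come from the text; the helper is illustrative.

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of parameters engaged per token in an MoE forward pass."""
    return active_params_b / total_params_b

frac = active_fraction(671, 37)
print(f"~{frac:.1%} of parameters active per token")  # ~5.5%
```

So although the model stores 671B parameters, each token incurs roughly the per-token compute of a 37B dense model.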
AMD Instinct GPU Accelerators and DeepSeek-V3
AMD Instinct GPU accelerators are transforming the landscape of multimodal AI models such as DeepSeek-V3, which require immense computational resources and memory bandwidth to process text and visual data. AMD Instinct accelerators deliver outstanding performance in these areas.
Leveraging AMD ROCm software and AMD Instinct GPU accelerators across key stages of DeepSeek-V3 development further strengthens the long-standing collaboration with AMD and its commitment to an open software approach for AI. Scalable infrastructure from AMD enables developers to build powerful visual reasoning and understanding applications.
Extensive FP8 support in ROCm can significantly improve the process of running AI models, especially on the inference side. It helps address key issues such as memory bottlenecks and the high latency associated with wider read-write formats, enabling larger models or batches to be processed within the same hardware constraints and resulting in more efficient training and inference. In addition, FP8 reduced-precision calculations can cut delays in data transmission and computation. AMD ROCm extends FP8 support across its ecosystem, enabling performance and efficiency improvements in everything from frameworks to libraries.
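The memory-bottleneck point above comes down to bytes per parameter: halving element width halves weight storage and the bandwidth needed to stream it. A minimal back-of-the-envelope sketch, assuming weight storage alone (ignoring activations, KV cache, and quantization scale overhead):

```python
# Sketch of why FP8 eases memory pressure: weight storage scales with
# parameter count times bytes per element. Figures are illustrative
# estimates, not measurements on any specific GPU.

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

params = 671e9                             # DeepSeek-V3 total parameters
fp16_gb = weight_memory_gb(params, 2)      # 16-bit weights
fp8_gb = weight_memory_gb(params, 1)       # 8-bit (FP8) weights
print(f"FP16: {fp16_gb:.0f} GB, FP8: {fp8_gb:.0f} GB")
```

Under these assumptions, moving from 16-bit to FP8 weights halves the storage (and the bandwidth to read it), which is what allows larger models or batches to fit within the same hardware constraints.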
AMD and DeepSeek Collaboration: Day 0 Support Readiness
With the release of DeepSeek-V3, AMD continues its tradition of fostering innovation through close collaboration with the DeepSeek team. This partnership ensures that developers are fully equipped to leverage the DeepSeek-V3 model on AMD Instinct GPUs right from Day 0, providing a broader choice of GPU hardware and an open software stack, ROCm, for optimized performance and scalability. AMD will continue optimizing DeepSeek-V3 performance with CK-tile-based kernels on AMD Instinct GPUs. AMD is committed to collaborating with open-source model providers to accelerate AI innovation and empower developers to create the next generation of AI experiences.
Acknowledgement
We sincerely appreciate the exceptional support and close collaboration with the DeepSeek and SGLang teams. A special thanks to AMD team members Peng Sun, Bruce Xue, Hai Xiao, David Li, Carlus Huang, Mingtao Gu, Vamsi Alla, Jason F., Vinayak Gok, Wun-guo Huang, Caroline Kang, Gilbert Lei, Soga Lin, Jingning Tang, Fan Wu, George Wang, Anshul Gupta, Shucai Xiao, Lixun Zhang, and everyone else who contributed to this effort.