A significant milestone has been reached in the collaborative optimization of domestic AI chips and large models. Recently, **Moore Threads and SiliconFlow jointly announced that they have completed deep adaptation and performance verification of the 671-billion-parameter large model DeepSeek V3 671B ("full-capacity" version) on the domestic GPU MTT S5000**. By applying FP8 (8-bit floating point) low-precision inference technology, the measured performance is impressive: **single-card Prefill (prompt pre-filling) throughput exceeds 4,000 tokens/second, and Decode (token generation) throughput surpasses 1,000 tokens/second**, bringing overall inference speed close to that of mainstream international high-end AI accelerator cards.
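To make the quoted throughput figures concrete, here is a back-of-envelope latency estimate for a single request. The prompt and output lengths below are illustrative assumptions, not numbers from the announcement:

```python
# Rough single-request latency from the quoted per-card throughputs (illustrative only).
# Prefill processes the whole prompt; Decode generates output tokens one by one.
prefill_tps = 4000   # tokens/s, Prefill throughput quoted for MTT S5000
decode_tps = 1000    # tokens/s, Decode throughput quoted for MTT S5000

prompt_tokens = 2000   # assumed prompt length
output_tokens = 500    # assumed generation length

latency_s = prompt_tokens / prefill_tps + output_tokens / decode_tps
print(f"estimated latency: {latency_s:.2f} s")  # 0.50 s prefill + 0.50 s decode = 1.00 s
```

The asymmetry in the two numbers reflects the workloads: prefill is a large, highly parallel batch over the prompt, while decode is sequential and memory-bandwidth-bound, which is why decode throughput is typically much lower.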
This achievement carries real weight. As a leading Chinese open-source large model, DeepSeek V3 671B has a massive parameter count and a heavy inference load, and deployment has previously relied on high-end GPUs such as the NVIDIA A100/H100. Running it efficiently on a **fully domestically developed hardware platform** not only validates the MTT S5000's real capabilities in large-model inference scenarios, but also marks a new stage for the domestic AI computing ecosystem, from "being able to run" to "running efficiently."
The key technical breakthrough lies in deep optimization for FP8 low-precision inference. FP8 substantially increases compute throughput while reducing memory usage and power consumption, with minimal loss of model accuracy. Moore Threads and SiliconFlow jointly completed full-stack optimization spanning the underlying drivers, operator libraries, and the inference engine, enabling the MTT S5000 to fully exploit its FP8 hardware acceleration and effectively support the high-concurrency, low-latency inference requirements of large models.
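The article does not disclose the quantization details, but the core idea of FP8 inference can be sketched with the common E4M3 format (4 exponent bits, 3 mantissa bits, maximum finite value 448, per the OCP FP8 specification). The minimal simulation below, an assumption for illustration rather than Moore Threads' actual implementation, rounds a float to its nearest E4M3-representable value to show the coarse value grid that weights and activations are mapped onto:

```python
import math

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3-representable value.

    Illustrative sketch: saturates to +/-448, handles subnormals via the
    exponent clamp, ignores NaN/inf encodings.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)          # saturate instead of overflowing
    exp = max(math.floor(math.log2(mag)), -6)  # -6 = minimum normal exponent
    step = 2.0 ** (exp - 3)              # 3 mantissa bits -> grid spacing 2^(exp-3)
    q = round(mag / step) * step         # snap to the nearest grid point
    return sign * min(q, E4M3_MAX)

# The grid is coarse: nearby FP32 values collapse onto the same FP8 code.
print(quantize_e4m3(0.3))     # -> 0.3125 (nearest E4M3 value)
print(quantize_e4m3(1000.0))  # -> 448.0  (saturated)
```

Because one FP8 value occupies a quarter of the bytes of FP32 (and half of FP16/BF16), weights and KV-cache shrink accordingly, which is where the throughput and memory gains described above come from; in practice a per-tensor or per-block scale factor is also applied so the tensor's dynamic range fits the narrow E4M3 range.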
For the industry, this means that the path of domestic substitution has become clearer: **high-performance large models no longer need to be tied to overseas high-end chips**. In the context of global uncertainty in the computing power supply chain, the combination of MTT S5000 and DeepSeek V3 provides a cost-effective and secure local AI deployment option for key sectors such as finance, government, and energy.
Although domestic GPUs still trail top international products in absolute peak performance and software-ecosystem maturity, the test data shows that **in specific high-value scenarios, domestic solutions are already practically competitive**. As software-hardware co-optimization deepens, China's push for autonomous, controllable AI infrastructure is accelerating from "usable" toward "convenient," one real-world validation at a time.