Amid the intensifying global AI chip race, startup Positron has officially unveiled its new AI inference chip, Asimov. The company claims that the chip, which is deeply optimized for large language model (LLM) inference, will deliver five times the energy efficiency (tokens per watt) and cost efficiency (tokens per dollar) of NVIDIA's next-generation Rubin architecture. The bold claim immediately drew widespread attention across the industry.
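Both headline metrics are simple ratios. A minimal sketch of the arithmetic, using made-up numbers (Positron has not published figures at this level of detail), shows what a fivefold advantage would mean in practice:

```python
# Hypothetical illustration of the two efficiency metrics behind the claim.
# All numbers are invented for the arithmetic only.

def tokens_per_watt(tokens_per_second: float, power_watts: float) -> float:
    """Energy efficiency: sustained throughput normalized by power draw."""
    return tokens_per_second / power_watts

def tokens_per_dollar(total_tokens: float, total_cost_usd: float) -> float:
    """Cost efficiency: tokens served per dollar of hardware/operating cost."""
    return total_tokens / total_cost_usd

# Assumed baseline (a stand-in for a Rubin-class accelerator):
baseline = tokens_per_watt(tokens_per_second=10_000, power_watts=1_000)

# A 5x claim means the same throughput at one fifth the power,
# or equivalently 5x the throughput at the same power:
claimed = tokens_per_watt(tokens_per_second=10_000, power_watts=200)

print(claimed / baseline)  # -> 5.0
```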
Positron's core idea is to redefine the traditional GPU architecture by subtraction. The Asimov chip drops the complex control circuitry found in conventional compute cards in favor of a leaner architecture built purely around tensor processing, minimizing energy lost outside the computation itself. This design not only lets Asimov draw less power when running models of the same scale, it also significantly reduces manufacturing and packaging costs. Given the strict power budgets in today's data centers, the Positron team argues, this kind of extreme energy efficiency will become a decisive factor for enterprises deploying AI services.
Impressive as Asimov's theoretical figures are, challenging NVIDIA's market position is no easy task. Positron is currently building a supporting compiler and development ecosystem so that developers can migrate existing PyTorch or TensorFlow models with minimal friction. The chip is planned on an advanced process node and is hardware-optimized for today's mainstream Transformer architecture, targeting high throughput and low latency on trillion-parameter models.
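Positron has not published details of its toolchain, but one plausible integration path on the PyTorch side is the custom backend hook in `torch.compile`, which hands the captured computation graph to a vendor compiler. The sketch below uses that real PyTorch mechanism; `asimov_backend` and everything inside it are hypothetical stand-ins, and the sketch simply falls back to eager execution so it runs anywhere:

```python
import torch
import torch.nn as nn

def asimov_backend(gm: torch.fx.GraphModule, example_inputs):
    """Hypothetical compiler entry point. torch.compile passes the traced
    FX graph here; a real vendor backend would lower it to the
    accelerator's instruction set. We just return eager execution."""
    return gm.forward

# Any Transformer-style module works as a demo workload.
model = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
compiled = torch.compile(model, backend=asimov_backend)

x = torch.randn(8, 16, 64)  # (batch, seq, d_model)
print(compiled(x).shape)    # -> torch.Size([8, 16, 64])
```

Whether Positron ships exactly this kind of backend is unknown; the point is that PyTorch's compiler stack already provides a standard seam where third-party silicon can plug in, which is what "seamless migration" would likely depend on.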
AIbase believes Positron's entry reflects the AI chip field's broader shift from "general-purpose compute" to "specialized inference." If Asimov delivers on its fivefold efficiency claim, it could reshape the cost structure of the large-model inference market.
Key Points:
🚀 Energy Efficiency Challenge: Asimov claims five times the tokens per watt and tokens per dollar of NVIDIA's upcoming Rubin architecture, staking its position on extreme cost efficiency.
🏗️ Architectural Simplification: By discarding the redundant circuitry of general-purpose compute, the chip uses a specialized architecture focused on tensor computation, sharply cutting energy loss and hardware cost during inference.
🌐 Targeting Large-Scale Inference: The hardware is deeply optimized for the Transformer architecture, aiming at the power bottlenecks and high operating costs of deploying trillion-parameter models.


