Thinking Machines Lab, founded by former OpenAI Chief Technology Officer Mira Murati, has announced a significant technical result: a solution to the long-standing problem of nondeterministic model output in the AI industry. In its latest research report, the lab states that it has achieved fully deterministic output from large language model inference.

The report, titled "Defeating Nondeterminism in LLM Inference," points out that even with the temperature parameter set to 0 (which makes decoding greedy and should, in principle, be repeatable), traditional large language model deployments can still produce different outputs for the same input. Through in-depth analysis, the research team traced the root cause of this phenomenon and proposed an effective solution.

The research team identified two main technical causes. The first is the non-associativity of floating-point addition: in GPU parallel computing environments, (a + b) + c and a + (b + c) can produce slightly different results, and these tiny discrepancies are amplified layer by layer as they propagate through a deep neural network.


The more critical finding is that changes in parallelization strategy are the fundamental source of the nondeterminism. Batch size, sequence length, and KV cache state vary with server load, and they influence which GPU kernels are selected and how work is split across threads. That changes the order in which floating-point reductions execute, and ultimately the output itself.
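This, too, can be mimicked without a GPU by summing the same numbers under different work splits. A hedged sketch, where the chunk sizes are arbitrary stand-ins for kernel launch configurations:

```python
import random

random.seed(0)
values = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

def chunked_sum(xs, chunk):
    # Simulate a parallel reduction: each "thread block" produces a
    # partial sum over `chunk` elements, then the partials are combined.
    # A different chunk size models a different kernel work split.
    partials = [sum(xs[i:i + chunk]) for i in range(0, len(xs), chunk)]
    return sum(partials)

s1 = chunked_sum(values, 128)
s2 = chunked_sum(values, 512)
print(s1 == s2)  # typically False: the two splits round differently
```

Both results are numerically close, but not bitwise identical, which is exactly the kind of discrepancy that batch-size-dependent kernel selection introduces.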

To address this challenge, Thinking Machines Lab proposed making the key kernels batch-invariant: every critical computation kernel must follow the same reduction order, and therefore produce bit-identical results, regardless of batch size or how sequences are split. The report details concrete optimizations for specific modules, including RMSNorm, matrix multiplication, and the attention mechanism.
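The core idea can be illustrated with the same toy reduction: fix the split ahead of time so the addition order never depends on how the batch happens to be shaped. This is a simplified sketch of the batch-invariance principle, not the lab's actual kernel code:

```python
FIXED_CHUNK = 256  # chosen once; never varied with batch size or load

def batch_invariant_sum(xs):
    # Always reduce in the same fixed-size chunks, in the same order,
    # so the rounding sequence is identical no matter what else shares
    # the batch. Real kernels apply this principle to the reductions
    # inside RMSNorm, matmul, and attention.
    partials = [sum(xs[i:i + FIXED_CHUNK])
                for i in range(0, len(xs), FIXED_CHUNK)]
    return sum(partials)

row = [((-1) ** i) * (1.0 / (i + 1)) for i in range(5_000)]
alone = batch_invariant_sum(row)
in_batch = [batch_invariant_sum(r) for r in [row, row[::-1]]][0]
print(alone == in_batch)  # True: the row's result ignores its neighbors
```

The trade-off, as the report acknowledges, is that a fixed split is not always the fastest split; determinism is bought with some performance headroom.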

To verify the effectiveness of the approach, the research team ran experiments with Qwen3-235B-A22B-Instruct-2507, a 235-billion-parameter model. Across 1,000 repeated runs of the same input, the model produced bitwise-identical outputs every time, 100% output consistency, a degree of reproducibility the report presents as new for large language model inference.
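A consistency check of this kind is straightforward to script. A hypothetical harness, where the `generate` callable and the stubs are placeholders rather than the lab's actual test code:

```python
from itertools import count

def check_determinism(generate, prompt, trials=1000):
    # Issue the same temperature-0 request repeatedly and confirm every
    # completion is byte-identical to the first one.
    reference = generate(prompt)
    return all(generate(prompt) == reference for _ in range(trials - 1))

# A deterministic stub passes; a stub with hidden state does not.
ticker = count()

ok = check_determinism(lambda p: p.upper(), "hello", trials=10)
flaky = check_determinism(lambda p: f"{p}-{next(ticker) % 2}", "hello",
                          trials=10)
print(ok, flaky)  # True False
```

The same loop, pointed at a real inference endpoint, is essentially what a 1,000-trial consistency experiment amounts to.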


Industry experts believe that this technological breakthrough holds significant importance for enterprise-level AI applications. Scenarios such as financial risk control, medical diagnosis, and legal document review, which require high accuracy and consistency, will directly benefit from this technological advancement.

Thinking Machines Lab released the findings as open research, giving AI developers worldwide a concrete technical reference. The work not only improves the predictability of model output but also helps lay the groundwork for AI systems to move from experimental tools to dependable production systems.

Thinking Machines Lab was founded in 2025 and focuses on foundational AI research. It has previously raised a $2 billion seed round and plans to launch its first product in the coming months.

This result signals a broader shift in the AI industry from pursuing sheer model scale to focusing on application quality. As deterministic inference techniques see wider adoption, the reliability and practicality of AI systems should improve significantly.

Official Research Report: https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/