The large-model race has a new entrant. AI startup MiniMax recently launched its new flagship large model, M3. According to the benchmarks in its technical report, the model performs strikingly well: on tests considered close to real software engineering scenarios, M3 scored 59%, surpassing GPT-5.5 and approaching Opus4.7. It also offers a million-Token context window and native multimodality. In stark contrast to these strong technical numbers, however, the release triggered an intense backlash in the developer community, with Chinese-language communities especially critical.

The first focus of industry skepticism is what lies behind the evaluation data. Technical details show that in its coding tests, M3 was evaluated with Claude Code, a competitor's agentic coding tool, as the evaluation framework. Running agent evaluations on existing toolchains is common practice in the industry, but MiniMax tested its model inside someone else's framework, posted high scores, and then publicly promoted comparisons against that same competitor. Many programmers have criticized the approach as "not straightforward": users cannot easily tell how much of the impressive result reflects the model's native ability and how much comes from the framework's enhancement.

Second, the sincerity of the "open source" claim has left the open-source community confused. Unlike other vendors releasing open-source models, MiniMax disclosed neither M3's size nor its weights this time. It said only that it would open-source the model within 10 days of the release; for now, users can access it only through an API. The open-source community's core value is "reproducibility and verifiability," so promoting a model as open source without providing weights leaves no one able to examine it independently in a local environment. That may be understandable from a commercial standpoint, but it has seriously hurt a developer community that values practicality and honesty.

What frustrates heavy users most is the sudden change to the billing rules of the Coding Plan. MiniMax was previously known for being "generous with volume" because it rate-limited by request count and set no monthly cap on total Tokens. Alongside the M3 release, however, the company introduced a new Token Plan that bills by total volume. The company claims the Plus plan's Token allowance is highly cost-effective, but in heavy-usage scenarios with a million-Token context, each call consumes a large number of Tokens, so the plan quota drains quickly under the new rules. Longtime users have complained en masse.
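
The arithmetic behind that complaint can be sketched in a few lines. The quota and per-call sizes below are hypothetical round numbers chosen only for illustration; they are not MiniMax's published plan figures, and the function name is likewise made up for this sketch.

```python
def calls_allowed_by_token_quota(token_quota: int,
                                 context_tokens: int,
                                 output_tokens: int) -> int:
    """Number of whole calls that fit in a total-Token quota.

    Under request-count limiting, call size does not matter.
    Under total-volume billing, every call is charged for the
    Tokens it reads plus the Tokens it writes.
    """
    tokens_per_call = context_tokens + output_tokens
    return token_quota // tokens_per_call


# Hypothetical quota of 100 million Tokens (NOT an official figure).
QUOTA = 100_000_000

# A short-context call: 8k Tokens in, 1k Tokens out.
short_calls = calls_allowed_by_token_quota(QUOTA, 8_000, 1_000)

# A full million-Token-context call: 1M Tokens in, 2k Tokens out.
long_calls = calls_allowed_by_token_quota(QUOTA, 1_000_000, 2_000)

print(short_calls)  # 11111
print(long_calls)   # 99
```

With these assumed numbers, the same quota covers more than eleven thousand short calls but fewer than a hundred full-context calls, which is why users who rely on long contexts feel the change most.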

Setting these operational controversies aside, M3's underlying architecture still has notable innovations. MiniMax developed its own sparse attention mechanism, MSA (MiniMax Sparse Attention), which applies high-precision block partitioning and sparsification to the keys and values (KV), avoiding the explosion in computation that a traditional Transformer incurs on long contexts. At the operator level, the model introduces a new way of computing aggregation that makes memory access more contiguous and runs four times faster than the open-source Flash-Sparse-Attention. As a result, at a million-Token context M3's forward-propagation and decoding speeds improve by 9 times and 15 times respectively, and per-Token computation falls to half that of the previous generation.
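
To make the general idea of block-level KV sparsification concrete, here is a minimal pure-Python sketch of generic block-sparse attention for a single query. It is not MSA itself, whose details the article does not describe: the block-scoring rule (dot product with each block's mean key) and the `block_size` and `top_k` parameters are assumptions made for illustration.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def block_sparse_attention(q, keys, values, block_size=4, top_k=2):
    """Attend one query over only the top_k highest-scoring KV blocks.

    Illustrative only: blocks are scored by q . mean(key block),
    which is an assumed heuristic, not MiniMax's method.
    Returns (output_vector, attended_positions).
    """
    n, dim = len(keys), len(q)
    # 1. Partition key/value positions into contiguous blocks.
    blocks = [range(s, min(s + block_size, n))
              for s in range(0, n, block_size)]

    # 2. Score each block cheaply using its mean key.
    def block_score(blk):
        mean_key = [sum(keys[i][d] for i in blk) / len(blk)
                    for d in range(dim)]
        return dot(q, mean_key)

    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)

    # 3. Keep only the top_k blocks; everything else is skipped.
    positions = [i for b in sorted(ranked[:top_k]) for i in blocks[b]]

    # 4. Ordinary scaled softmax attention over the kept positions.
    scale = 1.0 / math.sqrt(dim)
    logits = [dot(q, keys[i]) * scale for i in positions]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [sum(w * values[i][d] for w, i in zip(weights, positions)) / total
           for d in range(len(values[0]))]
    return out, positions


# Toy data: three blocks of four keys each; only the first aligns with q.
keys = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4 + [[-1.0, 0.0]] * 4
values = [[float(i)] for i in range(12)]
out, positions = block_sparse_attention([1.0, 0.0], keys, values, top_k=1)
print(positions)  # [0, 1, 2, 3]
print(out)        # [1.5]
```

In this sketch the exact attention step touches only `top_k * block_size` positions instead of all of them, and the kept positions are contiguous within each block. That gives a rough intuition for why block-level sparsity can reduce both computation and scattered memory access on very long contexts.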