In the recently released Chinese Precise Instruction Following Evaluation Benchmark (SuperCLUE-CPIF), Baidu's ERNIE X1.1 scored 75.51, ranking first among domestic large models. The evaluation covers 10 well-known models from China and abroad, including GPT-5(high), DeepSeek-V3.2-Exp-Thinking, Claude-Sonnet-4.5-Reasoning, and Gemini-2.5-Pro, and focuses on assessing the ability of large language models (LLMs) to execute complex instructions in Chinese.
The SuperCLUE-CPIF evaluation looks not only at the task types and the number of instructions a model can handle, but places particular emphasis on its ability to turn natural-language instructions into outputs that meet specific requirements. In this evaluation, ERNIE X1.1 performed especially well on tasks drawn from real production settings, showing clear strengths in complex writing tasks and diverse scenarios.
ERNIE X1.1 is a deep-thinking model built on ERNIE 4.5. Its upgrade adopted an iterative hybrid reinforcement learning training framework: the model improves on both general tasks and agent tasks, and continues to raise overall performance through iterative training on self-distilled data.
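Baidu has not published the details of this framework, but the general pattern of alternating self-distillation with a reward-blended RL-style update can be illustrated with a toy loop. The sketch below is purely hypothetical: the `hybrid_reward` weights, the `model_score` abstraction, and the update rule are illustrative assumptions, not Baidu's method.

```python
import random

# Generic sketch of iterating self-distillation with an RL-style update that
# blends two reward signals (e.g., general tasks and agent tasks).
# All numbers and functions here are placeholders for illustration only.

def sample_responses(model_score, n=200):
    """The current model generates candidate responses of varying quality."""
    return [model_score + random.gauss(0, 0.1) for _ in range(n)]

def hybrid_reward(quality, w_general=0.6, w_agent=0.4):
    """Blend a general-task reward with an agent-task reward (assumed weights)."""
    general = max(0.0, min(1.0, quality))
    agent = max(0.0, min(1.0, quality - 0.02))
    return w_general * general + w_agent * agent

def self_distill(responses, keep_ratio=0.2):
    """Keep only the highest-reward responses as training data for the next round."""
    ranked = sorted(responses, key=hybrid_reward, reverse=True)
    return ranked[: int(len(ranked) * keep_ratio)]

model_score = 0.5  # stand-in for overall model capability
for round_idx in range(5):
    candidates = sample_responses(model_score)
    distilled = self_distill(candidates)
    target = sum(hybrid_reward(q) for q in distilled) / len(distilled)
    model_score += 0.5 * (target - model_score)  # move toward the high-reward data
    print(f"round {round_idx + 1}: capability ~ {model_score:.3f}")
```

In this toy setup each round's training data comes from the model's own best outputs, so the capability estimate rises gradually across iterations, which is the basic intuition behind iterative training on self-distilled data.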
In practical applications, ERNIE X1.1 can flexibly combine its built-in knowledge with online search tools to retrieve exactly the information users need, understand their creative writing requirements in depth, and produce content that is well structured, logically clear, and elegantly written. For example, when handling customer service for a shared-bike platform, ERNIE X1.1 can weigh both the user's emotional state and the type of problem, resolving the issue efficiently while delivering a complete, proactive service process.
As one of the earliest Chinese companies to invest in large model research and development, Baidu continues to drive the evolution of the ERNIE large model through its full-stack, self-developed "chip - framework - model - application" system. Data shows that ERNIE X1.1 improves on its predecessor ERNIE X1 by 34.8% in factual accuracy, 12.5% in instruction following, and 9.6% in agent capabilities. This result sets a new benchmark for the development of domestic large models.