OpenAI has officially launched two new small AI models, GPT-5.4mini and GPT-5.4nano. Both are designed specifically for high-frequency, low-latency scenarios, delivering a qualitative leap in performance while remaining lightweight.

Key Highlights: The Perfect Balance of Low Latency and High Efficiency

OpenAI emphasizes that the GPT-5.4 series small models excel in scenarios that demand extremely fast responses, such as code assistance, system screenshot analysis, and real-time image reasoning:


  • GPT-5.4mini: It far exceeds its predecessors in code writing, logical reasoning, and multimodal understanding, and runs more than twice as fast. Its scores on multiple benchmarks already approach those of the much larger full-size GPT-5.4, letting it handle complex database navigation and front-end code generation tasks efficiently.

  • GPT-5.4nano: The smallest and most cost-effective version currently available, designed for text classification, data extraction, and simple assistant tasks, aimed at developers who need maximum cost efficiency (a usage sketch follows this list).
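To make the "simple assistant tasks" use case concrete, here is a minimal sketch of a text-classification call using the official OpenAI Python SDK. The model identifier string "gpt-5.4-nano" is an assumption based on the article, not a confirmed API name; check OpenAI's model list before using it.

```python
# Minimal text-classification sketch with the OpenAI Python SDK.
# Assumption: the nano model is exposed under the identifier "gpt-5.4-nano".
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify_sentiment(text: str) -> str:
    """Ask the model to label a piece of text as positive, negative, or neutral."""
    response = client.chat.completions.create(
        model="gpt-5.4-nano",  # assumed identifier; not confirmed by the article
        messages=[
            {"role": "system",
             "content": "Classify the sentiment of the user's text. "
                        "Reply with exactly one word: positive, negative, or neutral."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content.strip()

print(classify_sentiment("The new model is fast and surprisingly cheap."))
```

Tasks like this involve short prompts and one-word answers, which is exactly the high-frequency, low-cost profile nano targets.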

Specifications and Pricing

In terms of technical specifications and usage costs, both models demonstrate strong competitiveness:


  • GPT-5.4mini: Supports an ultra-long context window of 400K tokens. API pricing is $0.75 per million input tokens and $4.50 per million output tokens, and it is already available through the API, Codex, and ChatGPT.

  • GPT-5.4nano: Currently available only through the API, with highly disruptive pricing: just $0.20 per million input tokens and $1.25 per million output tokens (see the cost comparison sketch after this list).
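A quick back-of-the-envelope calculation shows what these per-million-token prices mean per request. The prices below come from the article; the token counts in the example are illustrative assumptions, not measured workloads.

```python
# Cost comparison using the published per-million-token prices.
# Token counts in the example are illustrative assumptions.

PRICES = {  # USD per 1M tokens: (input, output)
    "GPT-5.4mini": (0.75, 4.50),
    "GPT-5.4nano": (0.20, 1.25),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 2_000, 500):.6f} per request")
```

For this input/output mix, a nano request costs about $0.001025 versus roughly $0.00375 for mini, i.e. a bit over a quarter of the price, which is why nano suits high-volume classification and extraction workloads.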

The release of these two models signals a shift in AI applications from pursuing scale to pursuing practical efficiency: their ultra-fast responses provide more reliable underlying support for real-time AI interaction and for decomposing complex task flows.