New King of Long-Text Understanding? Gemini 2.5 Pro Beats o3 and Leads Fiction.Live Benchmark
In the recent Fiction.Live benchmark, Gemini 2.5 Pro excelled at understanding and reproducing complex stories and their backgrounds, outperforming OpenAI's o3 model. The test goes far beyond traditional "needle-in-a-haystack" tasks, focusing on a model's ability to handle deep semantics and context-dependent information within large contexts. According to the test data, when the context length reaches 192,000 tokens (approximately 144,000 words), the performance of the o3 model drops sharply, while