In the recent Fiction.Live benchmark, Gemini 2.5 Pro performed strongly at understanding and reproducing complex stories and their backgrounds, outperforming OpenAI's o3 model. The test goes well beyond traditional "needle-in-a-haystack" tasks. It focuses on whether a model can handle deep semantics and context-dependent information spread across a very large context. According to the test data, once the context window reaches 192,000 tokens (approximately 144,000 words), the o3 model's performance drops sharply, while