On August 11, Zhipu Technology officially launched its latest visual understanding model, GLM-4.5V. The model is built on the company's next-generation text model GLM-4.5-Air and follows the technical approach of its previous visual reasoning model, GLM-4.1V-Thinking. It has 106 billion total parameters, of which 12 billion are activated. Notably, GLM-4.5V adds a "thinking mode" switch that lets users choose whether to enable the mode, giving them more flexibility across tasks.

The model's visual capabilities are notable: it can tell McDonald's fried chicken wings from KFC's, analyzing them from multiple perspectives such as appearance, color, and texture. GLM-4.5V can also take part in image-based location guessing challenges, and in competition it surpassed 99% of human participants and ranked 66th. Zhipu also reported the model's results on 42 benchmark tests, where it scored higher than other models of similar scale on most of them.

GLM-4.5V is currently available on open-source platforms such as Hugging Face, ModelScope, and GitHub, where users can download and use it for free. An FP8 quantized version is also provided. To make the model easier to try, Zhipu launched a desktop assistant application that supports real-time screenshots and screen recording, helping users complete visual reasoning tasks such as code assistance and document interpretation.
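For readers deciding which release to download, the parameter counts above allow a rough estimate of weight storage. The sketch below is back-of-the-envelope arithmetic only. It assumes the unquantized weights use 2 bytes per parameter (BF16, an assumption not stated by Zhipu) and that FP8 uses 1 byte per parameter. It ignores activations, caches, and any quantization scale factors, so real memory needs will be higher.

```python
# Rough weight-storage estimate for GLM-4.5V from the published parameter counts.
# Assumption: 2 bytes/param for the unquantized release (BF16), 1 byte/param for FP8.

def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

TOTAL_PARAMS = 106e9   # total parameters, as reported
ACTIVE_PARAMS = 12e9   # activated parameters, as reported

bf16_gb = weight_memory_gb(TOTAL_PARAMS, 2)   # assumed BF16 baseline
fp8_gb = weight_memory_gb(TOTAL_PARAMS, 1)    # FP8 quantized version
active_share = ACTIVE_PARAMS / TOTAL_PARAMS   # fraction of parameters activated

print(f"BF16 weights: ~{bf16_gb:.0f} GB")
print(f"FP8 weights:  ~{fp8_gb:.0f} GB")
print(f"Activated share: {active_share:.1%}")
```

Under these assumptions, the FP8 version roughly halves the storage needed for the weights, while only about a ninth of the parameters are activated at a time.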

In practical testing, GLM-4.5V was able to infer locations from uploaded images. It occasionally made small errors, but its reasoning process remained detailed. When given a screenshot of a web page, it can generate a page that closely resembles the original, showing strong replication capability.

GLM-4.5V not only excels at visual understanding but also shows strong potential in agent application scenarios. As the technology continues to develop, it can be expected to bring more convenience to people's everyday lives.