Zhipu has announced and open-sourced GLM-4.5V, the best-performing open-source visual reasoning model at the 100B-parameter scale worldwide and another important step on the company's path toward Artificial General Intelligence (AGI). The model is available simultaneously on the ModelScope community and Hugging Face. With 106B total parameters and 12B activated parameters, it marks a new milestone in multimodal reasoning technology.
GLM-4.5V is built on Zhipu's new-generation flagship text foundation model, GLM-4.5-Air, and continues the technical approach of GLM-4.1V-Thinking. Across 41 public multimodal benchmarks, it achieved state-of-the-art (SOTA) results among open-source models of comparable size, covering image, video, and document understanding as well as GUI agent tasks. Beyond benchmark scores, the model also emphasizes performance and usability in real-world scenarios.
Through efficient hybrid training, GLM-4.5V can process diverse types of visual content and perform full-scenario visual reasoning, including image reasoning, video understanding, GUI tasks, complex chart and long-document parsing, and visual grounding. A newly added "thinking mode" switch lets users choose between fast responses and deep reasoning, balancing efficiency against quality (see the sketch below).
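As a rough illustration of that switch, the following sketch toggles deep reasoning on and off through Zhipu's Python SDK. The model name, the multimodal message format, and especially the `thinking` field are assumptions based on the published GLM-4.5-series interface and should be checked against the current BigModel.cn documentation.

```python
from zhipuai import ZhipuAI  # official SDK for the BigModel.cn open platform

client = ZhipuAI(api_key="YOUR_API_KEY")  # placeholder key

# Deep reasoning: let the model think step by step before answering.
# The `thinking` field name and values are assumed from the GLM-4.5-series API;
# verify against the current BigModel.cn docs before use.
response = client.chat.completions.create(
    model="glm-4.5v",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            {"type": "text", "text": "What trend does this chart show?"},
        ],
    }],
    thinking={"type": "enabled"},  # switch to {"type": "disabled"} for a fast, direct reply
)
print(response.choices[0].message.content)
```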
To help developers experience GLM-4.5V's capabilities, Zhipu Qingyan has also open-sourced a desktop assistant application. It captures screenshots and records the screen in real time, then relies on GLM-4.5V to handle a variety of visual reasoning tasks, such as coding assistance, video content analysis, game walkthroughs, and document interpretation, acting as an on-screen companion for both work and entertainment.
The GLM-4.5V API is now available on Zhipu's open platform, BigModel.cn, with a free resource package of 20 million tokens for all new and existing users. The model maintains high accuracy while balancing inference speed and deployment cost, offering a cost-effective multimodal AI solution for enterprises and developers. API pricing starts at 2 yuan per million input tokens and 6 yuan per million output tokens, with a response speed of 60-80 tokens/s; a rough cost estimate is sketched below.
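The sketch below turns the published prices into a simple monthly cost estimate. How the 20M-token free package is split between input and output is not specified in the announcement, so the proportional allocation here is an illustrative assumption.

```python
# Pricing from the announcement: 2 yuan per million input tokens,
# 6 yuan per million output tokens, plus a 20M-token free package.
INPUT_PRICE_PER_M = 2.0     # yuan per million input tokens
OUTPUT_PRICE_PER_M = 6.0    # yuan per million output tokens
FREE_PACKAGE_TOKENS = 20_000_000

def estimated_cost(input_tokens: int, output_tokens: int) -> float:
    """Rough cost in yuan; assumes billable tokens keep the same
    input/output ratio as the overall usage (an illustrative assumption)."""
    total = input_tokens + output_tokens
    billable = max(total - FREE_PACKAGE_TOKENS, 0)
    in_share = input_tokens / total if total else 0.0
    return (billable * in_share / 1e6) * INPUT_PRICE_PER_M \
         + (billable * (1 - in_share) / 1e6) * OUTPUT_PRICE_PER_M

# e.g. 30M input tokens and 5M output tokens in a month
print(f"{estimated_cost(30_000_000, 5_000_000):.2f} yuan")
```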
In addition, GLM-4.5V performs strongly in visual grounding, front-end page replication, image recognition and reasoning, deep interpretation of complex documents, and GUI agent tasks. For example, it can accurately identify and localize target objects, replicate web pages, infer background information from subtle visual clues in images, read and interpret complex documents dozens of pages long, and handle question answering and icon localization in GUI environments.
Architecturally, GLM-4.5V consists of a visual encoder, an MLP adapter, and a language decoder. It supports a 64K multimodal context, accepts image and video inputs, and uses 3D convolution to improve video processing efficiency. A bicubic interpolation mechanism strengthens its handling of high-resolution images and extreme aspect ratios. It also introduces three-dimensional rotary position embedding (3D-RoPE), significantly improving perception of and reasoning about 3D spatial relationships in multimodal inputs (illustrated below).
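To make the 3D-RoPE idea concrete, here is a minimal NumPy sketch of rotary position encoding extended to three axes (frame, patch row, patch column). The even split of the head dimension, the base frequency, and the axis ordering are assumptions for illustration only, not GLM-4.5V's exact implementation.

```python
import numpy as np

def rope_angles(dim, base=10000.0):
    """Inverse frequencies for one axis: theta_i = base^(-2i/dim)."""
    return base ** (-np.arange(0, dim, 2) / dim)

def apply_rope_1d(x, pos, inv_freq):
    """Rotate consecutive feature pairs of x by angle pos * inv_freq (standard RoPE)."""
    angles = np.outer(pos, inv_freq)          # (tokens, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def apply_rope_3d(x, t, h, w):
    """Split the head dimension into three chunks and rotate each chunk by one
    coordinate: frame index t, patch row h, patch column w."""
    d = x.shape[-1] // 3
    chunks = []
    for i, pos in enumerate((t, h, w)):
        chunk = x[..., i * d:(i + 1) * d]
        chunks.append(apply_rope_1d(chunk, pos, rope_angles(d)))
    return np.concatenate(chunks, axis=-1)

# Toy example: 4 visual tokens from a 2x2 patch grid of a single video frame.
q = np.random.randn(4, 48)       # (tokens, head_dim), head_dim divisible by 6
t = np.zeros(4)                  # all tokens come from frame 0
h = np.array([0, 0, 1, 1])       # patch row index
w = np.array([0, 1, 0, 1])       # patch column index
print(apply_rope_3d(q, t, h, w).shape)   # (4, 48)
```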
GitHub: https://github.com/zai-org/GLM-V
Hugging Face: https://huggingface.co/collections/zai-org/glm-45v-68999032ddf8ecf7dcdbc102
ModelScope Community: https://modelscope.cn/collections/GLM-45V-8b471c8f97154e