Five days after the release of DeepSeek-V4, which had a significant impact on the industry, DeepSeek officially launched a grayscale test of its multimodal image recognition feature, marking the entry of its multimodal capabilities into a practical deployment phase. The update adds an "Image Recognition Mode" entry to the input bar of the mobile and web versions, prominently labeled "Image Understanding Function in Internal Testing," completing a crucial shift from pure text and code to visual interaction.
Test data show that DeepSeek performs exceptionally well in basic visual understanding and scene description, generating highly accurate descriptions when identifying complex figures, environmental composition, and photographic detail. With "Thinking Mode" enabled, the model demonstrates deep logical reasoning, accurately deducing the artistic style and historical background of cultural relics from their visual characteristics. Its ability to extract text from images and judge scenes has also reached mainstream industry standards.
However, there is still room for improvement under extreme visual challenges. Tests show that the module's recognition rate is limited when processing interference images, such as fragmented or color-inverted pictures. In element-counting and complex graphical logic reasoning tasks, the model makes self-play-style reasoning attempts, but its accuracy and response efficiency still fall short. Moreover, its coverage of very recent product information remains constrained by the update cycle of its knowledge base.
Industry analysis indicates that this feature currently functions more like a visual understanding module attached to the main model, intended to validate the multimodal pipeline through grayscale testing. As DeepSeek rapidly iterates its visual patches, the competitive focus of domestic large models in native multimodality is shifting from "parameter scale" to "omni-scenario perception." This internal test not only fills a core functional gap for DeepSeek but also signals that its native multimodal flagship features are entering the final preparation stage.
