AI Daily: Alibaba Launches Multimodal Model Qwen3-Omni; Google Unveils AP2 Protocol; Baidu Launches Qianfan-VL Model

Welcome to "AI Daily," your daily guide to the world of artificial intelligence. Each day we bring developers the latest news from the AI field, covering technology trends and innovative AI product applications.

Fresh AI products. Click to learn more: https://app.aibase.com/zh

1. Alibaba Cloud Launches the World's First All-Modal AI Model Qwen3-Omni, Achieving Unified Processing of Text, Images, Audio, and Video

Alibaba Cloud has released Qwen3-Omni, the world's first native end-to-end all-modal AI model that supports unified processing of text, images, audio, and video. The model demonstrates advanced cross-modal performance in multiple fields and is open-source, meeting the multilingual needs of global users.

AiBase Highlights:
🌟 Qwen3-Omni is the world's first native end-to-end all-modal AI model, supporting unified processing of text, images, audio, and video.
🌐 The model supports 119 languages for text and 19 languages for speech input, meeting the multilingual needs of global users.
🖼️ The newly released Qwen-Image-Edit-2509 supports multi-image editing, significantly improving the consistency and effectiveness of editing.
Details: https://github.com/QwenLM/Qwen3-Omni Hugging Face: https://huggingface.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe

2. Say Goodbye to Image Editing Worries! Alibaba's Qwen-Image Multi-Image Editing Function Creates Professional Advertising Movies in One Click

Alibaba's AI image editing tool Qwen-Image has received a major feature upgrade: new multi-image editing functions, ControlNet keypoint map support, and application scenarios expanded to meme creation, giving the e-commerce and digital marketing industries more efficient solutions.

AiBase Highlights:
🖼️ New multi-image editing function supports flexible combinations of person + person, person + product, and person + scene.
⚙️ Introduced ControlNet keypoint map function, improving the accuracy of human posture control.
🛒 Expanded application scenarios, supporting meme creation, assisting the e-commerce and marketing industries.
Details: https://chat.qwen.ai/?inputFeature=image_edit

3. Baidu Launches Qianfan-VL Model, Multiple Sizes Meet Different Scenarios

Baidu Intelligent Cloud's Qianfan team launched the new visual understanding model Qianfan-VL, which includes three sizes: 3B, 8B, and 70B, and is deeply optimized for enterprise-level multimodal applications. Qianfan-VL performs well in OCR, educational scenarios, and math problem-solving, and shows excellent general capabilities and outstanding performance in specific tasks in benchmark tests.

AiBase Highlights:
🧠 Multiple size models meet different scenario needs.
📊 8B and 70B models have thinking and reasoning capabilities.
📄 Excellent performance in OCR and document understanding.
Details: https://baidubce.github.io/Qianfan-VL/

4. Google Launches AP2 Protocol, Partnering with PayPal to Open a New Era of AI Payments

Google's AP2 protocol provides a secure framework for AI payments. An authorization token mechanism ensures that transactions are legitimate and secure, and Google is collaborating with PayPal to promote the application of AI in the payment field.

AiBase Highlights:
🛒 AP2 protocol provides a secure authorization mechanism for AI payments, ensuring transaction legitimacy.
🤝 Google collaborates with PayPal to promote the practical application of AI in the payment field.
🔒 Authorization token system clarifies responsibility division, enhancing transaction transparency.
Details: https://github.com/google-agentic-commerce/AP2
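To make the idea of an authorization token concrete, here is a minimal sketch of how a signed token could bind a payment to a specific agent and spending limit. This is not AP2's actual API or token format: the function names (`issue_token`, `authorize`), the payload fields, and the shared-secret HMAC signature are all assumptions made for illustration, and a real deployment would more likely use asymmetric signatures.

```python
import hashlib
import hmac
import json

# Hypothetical shared secret, for illustration only.
SECRET = b"demo-secret"


def issue_token(user: str, agent: str, max_amount: float) -> dict:
    """Sign a payload recording who authorized which agent, and up to what amount."""
    payload = {"user": user, "agent": agent, "max_amount": max_amount}
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}


def authorize(token: dict, agent: str, amount: float) -> bool:
    """Accept a payment only if the token is untampered and covers the request."""
    body = json.dumps(token["payload"], sort_keys=True).encode()
    expected = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return False  # payload was altered after signing
    payload = token["payload"]
    return payload["agent"] == agent and amount <= payload["max_amount"]
```

Because the signed payload names the user, the agent, and the limit, a later dispute can be traced back to exactly what was authorized, which is the kind of responsibility division the highlights describe.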

5. Apple Expands Image Generation Platform: Image Playground Will Introduce More Third-Party AI Models

Apple made significant updates to Image Playground in macOS Tahoe 26, iPadOS 26, and iOS 26, introducing ChatGPT as an image generation model and planning to support more third-party models, such as Google's Gemini 2.5 Flash Image.

AiBase Highlights:
🍎 Apple expands Image Playground to support more third-party AI models, including OpenAI's ChatGPT and Google's Gemini 2.5 Flash Image.
⚙️ Added "estimated delay" metrics and "provide trademark identifier," indicating that Apple is optimizing the model selection mechanism.
🔒 Apple may prefer to collaborate with external partners rather than directly support open-source models to ensure the security of the image generation tool.

6. One-Click Transformation into a Learning Machine! Baidu Search Launches AI Companion Learning

Baidu has launched AI Companion Learning, which uses AI technology to turn ordinary phones into learning machines. It offers students targeted practice, speaking training, and other functions, helping to promote educational equity and wider access to learning resources.

AiBase Highlights:
📚 AI Companion Learning uses AI technology to turn ordinary phones into learning machines, enhancing educational equity.
🗣️ Provides AI speaking, essay correction, and other tools to help students with personalized learning.
🌍 The Baidu Education team uses Wenxin 4.5 technology to promote the integration and popularization of educational resources.

7. DingTalk AI Table Assistant Officially Launched: One Sentence Generates Tables, Building an Enterprise-Level AI Application Platform

DingTalk launched the AI Table Assistant, upgrading the AI table into an application creation platform for the AI era. Users only need to update to the latest version to try the new feature. The AI Table Assistant takes natural language descriptions of ideas and automatically generates tables, automated workflows, and data analysis dashboards, greatly lowering the barrier to use.

AiBase Highlights:
✨ AI Table Assistant supports natural language description of ideas, automatically generating tables, automated workflows, and data analysis dashboards.
🚀 Introducing Field Agent, adding 30 agents, supporting multi-modal AI capabilities such as AI video understanding and digital humans.
🌐 Cross-platform workflow support, adding support for workflows from platforms like BaiLian and Coze, achieving cross-platform data aggregation and analysis.

8. DeepSeek-V3.1-Terminus Launches: Performance Fully Upgraded, Deep Reasoning Capabilities Significantly Enhanced

DeepSeek released the DeepSeek-V3.1-Terminus model and made it open source. Compared with previous versions, the model fixes issues such as language inconsistency and abnormal characters, and it improves the performance of programming and search agents. Benchmark data shows performance gains ranging from 0.2% to 36.5%, with the model excelling in high-difficulty knowledge, multimodal, and deep reasoning tasks.

AiBase Highlights:
🧠 DeepSeek-V3.1-Terminus performance is upgraded across the board, with improvements ranging from 0.2% to 36.5%.
🚀 Improved performance of programming and search agents, and fixed the language inconsistency issue of the previous version.
🔍 Outstanding performance in HLE testing, demonstrating strong deep reasoning and multimodal processing capabilities.
Details: https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Terminus https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.1-Terminus

9. Kimi Agent Membership Launches with Surprises! Donations Convert into 9 Months of VIP, and 49 Yuan Unlocks AI Deep Research

Kimi launched a new Agent membership service that offers additional benefits to early donors, and it showed brand creativity with membership tiers named after musical tempo terms. The deep research function is based on a proprietary model, providing professional insights and advancing the evolution of AI assistants into intelligent agents.

AiBase Highlights:
✨ Kimi launched an Agent membership service; users who donated earlier receive extra membership time.
🎵 Membership tiers are named after classical music tempo terms, blending art and technology.
🔍 Deep research function based on proprietary model, providing multidimensional perspectives and cognitive discoveries.

10. The World's First General Embodied Intelligence Model Open Sourced! Zhiyuan Robot GO-1 Shocks the Market

Zhiyuan Robot announced that its GO-1 general embodied base model is fully open-sourced, making it the world's first embodied intelligence model using the ViLLA architecture, capable of understanding and executing complex tasks. This move will promote the application and research of embodied intelligence, reduce technical barriers, attract more developers to participate in the ecosystem, and promote cross-domain innovation and cooperation.

AiBase Highlights:
🤖 GO-1 is the world's first embodied intelligence model using the ViLLA architecture, combining vision, language, and latent action capabilities.
💡 Open-sourcing GO-1 will promote the application and research of embodied intelligence, reducing technical barriers.
🌐 Zhiyuan Robot hopes to attract more developers to participate in the embodied intelligence ecosystem, promoting cross-domain innovation and cooperation.

站长之家

This article is from AIbase Daily
