AI Daily: Xiaomi Open-Sources Its First Native End-to-End Speech Large Model; Tongyi Wanxiang Wan2.2-Animate Officially Open-Sourced; Suno v5 to Launch Soon

站长之家

Published in AI News · 13 min read · Sep 19, 2025

Welcome to the "AI Daily" column, your daily guide to the world of artificial intelligence. Each day we present the latest developments in the AI field with a focus on developers, helping you follow technology trends and learn about innovative AI product applications.

Click to learn more about new AI products: https://app.aibase.com/zh

1. Xiaomi Open-Sources Its First Native End-to-End Speech Large Model, Xiaomi-MiMo-Audio

Xiaomi announced that it has open-sourced Xiaomi-MiMo-Audio, its first native end-to-end speech large model, marking a significant step forward in speech technology. Built on a new pre-training architecture and more than 100 million hours of training data, the model shows strong few-shot generalization and outperforms several closed-source models on multiple evaluation benchmarks.

【AiBase Highlights:】
🧠 Achieves few-shot generalization through In-Context Learning in the speech domain for the first time.
🚀 Outperforms closed-source models from Google and OpenAI on the MMAU audio-understanding benchmark and on the Big Bench Audio S2T task.
🔧 Open-sources the complete speech pre-training solution, including the tokenizer, model architecture, training methods, and evaluation system.
More details: https://huggingface.co/XiaomiMiMo/MiMo-Audio-7B-Instruct

2. Tongyi Wanxiang's New Action Generation Model Wan2.2-Animate Officially Open-Sourced

Wan2.2-Animate, the new action generation model from the Tongyi Wanxiang team, significantly improves character consistency and generation quality. It supports two modes, action imitation and role-playing, and can be applied widely in short-video creation and animation production.

【AiBase Highlights:】
🎭 Given a character image and a reference video, the model transfers the actions in the video to the character in the image.
🎭 In role-playing mode, the model replaces the character in the video with the character from the image.
🖼️ The model includes a separate lighting-fusion LoRA to keep the lighting on the character consistent with the scene.
More details: https://github.com/Wan-Video/Wan2.2

3. Suno v5 Music Model is About to Launch, Bringing a "Transformative" Upgrade to AI Music Creation

Suno's v5 music model is about to be released. It is seen as a milestone in AI music creation and is expected to further blur the line between human composition and machine-generated music.

【AiBase Highlights:】
🎧 Suno v5 music model is about to launch, attracting global attention.
💡 v5 will introduce more advanced semantic control and multimodal input features.
📈 After the release of v4.5, plays of user-generated works reached the hundreds of millions.

4. Shengshu Technology Secures Billions in Funding, Leading the Trend of AI Commercialization through Video Generation

Shengshu Technology has made significant progress in multimodal AI, securing billions in funding and achieving commercial success with its Vidu video large model. Video generation technology is expected to develop further and affect multiple industries, but it must also address issues such as copyright and misinformation.

【AiBase Highlights:】
🎥 Shengshu Technology completed a Series A funding round worth billions, marking a breakthrough in the multimodal AI field.
💼 The Vidu video large model has reached $20 million in annual revenue and is used widely in commercial applications.
🌐 Video generation technology will change how digital content is produced worldwide, but it faces challenges such as copyright governance.

5. OpenAI Fixes ChatGPT Vulnerability, Preventing Theft of User Gmail Data

Cybersecurity company Radware discovered a serious vulnerability in ChatGPT's "Deep Research" feature that hackers could exploit to steal users' Gmail data. The vulnerability allowed attackers to trick ChatGPT into sending sensitive information to malicious websites while it processed a user's Gmail queries. OpenAI fixed the vulnerability quickly and emphasized that model security is its top priority.

【AiBase Highlights:】
📧 The ChatGPT vulnerability allowed hackers to steal users' Gmail data through specially crafted emails.
🔒 OpenAI quickly fixed the vulnerability and reaffirmed its commitment to protecting user information.
🛡️ Conventional security measures struggle to detect such attacks, so users need to remain vigilant.

6. Google Integrates Gemini into Chrome Browser, Enhancing Intelligent Search Experience

Google has integrated Gemini into the Chrome browser to improve the user experience and respond to competitive pressure. Gemini can work across tabs and schedule tasks, and it is deeply integrated with multiple Google applications. Enterprise users will also get data protection and agent features.

【AiBase Highlights:】
🌐 Google integrates Gemini into Chrome, enhancing the intelligent search experience for users.
📅 Gemini helps users understand web content, work across tabs, and schedule tasks.
🔒 Enterprise users will also get the data protection and agent features that Gemini provides.

7. Luma AI Launches Ray3: Revolutionizing Video Generation with "Reasoning" Capabilities, Supporting 16-bit Color Depth

Luma AI's Ray3 video generation model brings HDR output and strong "reasoning" capabilities to video creation, along with high-precision visual control and integration with professional workflows.

【AiBase Highlights:】
🎥 Ray3 supports the generation of videos with 10-bit, 12-bit, and even 16-bit color depth, and can be exported in EXR file format for use in professional workflows.
🧠 Ray3 has "reasoning" capabilities that let it understand complex instructions and assess the quality of its own output, so it can iterate on and refine a video.
🖌️ Users can control video content by drawing sketches, providing unprecedented creative freedom.
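
To put the bit depths listed above in perspective, the short sketch below computes how many distinct levels an unsigned integer encoding provides per color channel at each depth. This is generic arithmetic rather than anything specific to Ray3, and the function name `levels_per_channel` is a hypothetical one chosen for illustration. EXR files commonly store floating-point samples, so the integer view is a simplification.

```python
def levels_per_channel(bit_depth: int) -> int:
    """Return how many distinct values an unsigned integer channel of this depth can hold."""
    if bit_depth <= 0:
        raise ValueError("bit depth must be positive")
    return 2 ** bit_depth


# Compare the depths named in the highlights against ordinary 8-bit video.
for depth in (8, 10, 12, 16):
    levels = levels_per_channel(depth)
    ratio = levels // levels_per_channel(8)
    print(f"{depth}-bit: {levels} levels per channel ({ratio}x the 8-bit count)")
```

Each extra bit doubles the number of levels, which is why moving from 8-bit to 16-bit gives far finer tonal gradation for grading and compositing.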

8. French AI Company Mistral Launches Open-Source Reasoning Model Magistral Small 1.2

French company Mistral AI launched its latest open-source reasoning model, Magistral Small 1.2, which has 24B parameters and is released under the Apache 2.0 open-source license. The new version supports a context of up to 128k tokens and introduces [THINK] special tokens, which enhance the model's expressiveness and flexibility. Magistral Small 1.2 also adds a vision encoder and is compatible with various frameworks, making it more convenient for developers.

【AiBase Highlights:】
🧠 Magistral Small 1.2 is an open-source reasoning model with 24B parameters, released under the Apache 2.0 license.
🔍 The new version introduces [THINK] special tokens, enhancing the model's expressiveness and flexibility.
🖼️ Adds a vision encoder, giving it an advantage on tasks that combine images and text.
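
One way an application might separate a reasoning trace from the final answer is sketched below. It assumes the reasoning is wrapped between `[THINK]` and a closing `[/THINK]` marker in the decoded text. The closing marker, the sample string, and the helper name `split_reasoning` are assumptions made for illustration, not details confirmed by the announcement.

```python
import re
from typing import Tuple

# Assumed format: "[THINK] ...reasoning... [/THINK] final answer"
THINK_PATTERN = re.compile(r"\[THINK\](.*?)\[/THINK\]", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    If no complete [THINK]...[/THINK] span is found, the whole text
    is treated as the answer and the reasoning is empty.
    """
    match = THINK_PATTERN.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first reasoning span is kept as the answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "[THINK]2 + 2 is 4.[/THINK]The answer is 4."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the parser tolerant of output without any marker means the same code path can handle responses where the model skipped its reasoning block.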

9. Notion Launches Its First AI Agent! Automatically Generate Meeting Notes, Competitor Analysis, Process Hundreds of Pages of Documents in 20 Minutes

Notion has launched its first AI agent, which can automatically generate meeting notes, analysis reports, competitor evaluations, and more, using all of a user's Notion pages and databases as context. The agent can create or update pages and databases, and its operations can be triggered from external platforms. Personalization is a highlight: users can set up an archive page for the agent that guides how it cites sources and the style of its output.

【AiBase Highlights:】
🧠 The AI agent can automatically generate meeting notes, analysis reports, and competitor evaluations.
🔄 Supports triggering AI agent operations from external platforms such as Slack, email, and Google Drive.
📝 Users can customize the AI agent's archive page to guide its behavior and output style.

10. Tencent Hunyuan 3D Studio Makes a Stunning Debut: 3D Creation Speeds Up from Days to Minutes

The release of Tencent Hunyuan 3D Studio marks a revolutionary improvement in 3D creation efficiency, providing a powerful AI workstation for designers, game developers, and modelers, significantly shortening the 3D asset production cycle.

【AiBase Highlights:】
🧠 Native 3D segmentation algorithm achieves automatic splitting of model parts, supporting independent editing of character accessories and clothing.
🎨 AI semantic UV unfolding technology generates UV maps that meet artistic standards in 1-2 minutes, improving work efficiency.
🔧 Intelligent material editing supports generating high-quality PBR texture materials through text or image input, achieving precise material control.
More details: https://3d.hunyuan.tencent.com/studio

This article is from AIbase Daily
