Welcome to the "AI Daily" column, your daily guide to the world of artificial intelligence. Each day we bring developers the latest news from the AI field to help you follow technology trends and innovative AI product applications.
Click to learn more about new AI products: https://app.aibase.com/zh
1. Xiaomi Open-Sources Its First Native End-to-End Large Speech Model, Xiaomi-MiMo-Audio
Xiaomi has open-sourced Xiaomi-MiMo-Audio, its first native end-to-end large speech model, which it describes as a significant breakthrough in speech technology. The model is built on a new pre-training architecture and more than 100 million hours of training data. It shows strong few-shot generalization and outperforms several closed-source models on multiple evaluation benchmarks.
【AiBase Highlights:】
🧠 Achieves few-shot generalization based on in-context learning for the first time in the speech domain.
🚀 Outperforms closed-source models from Google and OpenAI on the audio understanding benchmark MMAU and on the Big Bench Audio S2T task.
🔧 Open-sources the complete speech pre-training solution, including the tokenizer, model architecture, training methods, and evaluation system.
More details: https://huggingface.co/XiaomiMiMo/MiMo-Audio-7B-Instruct
2. Tongyi Wanxiang's New Motion Generation Model Wan2.2-Animate Officially Open-Sourced
Wan2.2-Animate, the new motion generation model from the Tongyi Wanxiang team, significantly improves character consistency and generation quality. It supports two modes, motion imitation and role-playing, and can be applied to short video creation and animation production.
【AiBase Highlights:】
🎭 In motion imitation mode, given a character image and a reference video, the model transfers the motion in the video to the character in the image.
🎭 In role-playing mode, the model replaces the character in the video with the character from the image.
🖼️ The model includes a separate lighting fusion LoRA to keep lighting effects consistent.
More details: https://github.com/Wan-Video/Wan2.2
3. Suno v5 Music Model is About to Launch, Bringing a "Transformative" Upgrade to AI Music Creation
Suno's v5 music model is about to be released. It is seen as a milestone in AI music creation and is expected to further blur the line between human composition and machine-generated music.
【AiBase Highlights:】
🎧 Suno v5 music model is about to launch, attracting global attention.
💡 v5 will introduce more advanced semantic control and multimodal input features.
📈 After the release of v4.5, plays of user-generated works reached into the hundreds of millions.
4. Shengshu Technology Secures Billions in Funding, Leading the Trend of AI Commercialization through Video Generation
Shengshu Technology has made significant progress in multimodal AI, securing billions in funding and achieving commercial success with its large video model Vidu. Video generation technology is expected to develop further and affect multiple industries, but it must also address issues such as copyright and misinformation.
【AiBase Highlights:】
🎥 Shengshu Technology completed a Series A funding round worth billions, marking a breakthrough in the multimodal AI field.
💼 The Vidu large video model has reached $20 million in annual revenue and is widely used commercially.
🌐 Video generation technology is set to change how digital content is produced worldwide, but it faces challenges such as copyright governance.
5. OpenAI Fixes ChatGPT Vulnerability, Preventing Theft of User Gmail Data
Cybersecurity company Radware discovered a serious vulnerability in ChatGPT's "Deep Research" feature that attackers could exploit to steal users' Gmail data. The vulnerability allowed attackers to trick ChatGPT into sending sensitive information to malicious websites while it processed a user's Gmail queries. OpenAI quickly fixed the vulnerability and emphasized that model security is its top priority.
【AiBase Highlights:】
📧 The ChatGPT vulnerability allowed attackers to steal users' Gmail data through specially crafted emails.
🔒 OpenAI quickly fixed the vulnerability and reaffirmed its commitment to user information security.
🛡️ Conventional security measures struggle to detect such attacks, so users need to remain vigilant.
6. Google Integrates Gemini into Chrome Browser, Enhancing Intelligent Search Experience
Google has integrated Gemini into the Chrome browser to improve the user experience and respond to competitive pressure. Gemini supports working across tabs and scheduling tasks, and is deeply integrated with multiple Google applications. Enterprise users will also benefit from data protection and agent capabilities.
【AiBase Highlights:】
🌐 Google integrates Gemini into Chrome, enhancing the intelligent search experience for users.
📅 Gemini helps users understand web content, work across tabs, and schedule tasks.
🔒 Enterprise users will also get the data protection and agent capabilities that Gemini provides.
7. Luma AI Launches Ray3: Revolutionizing Video Generation with "Reasoning" Capabilities, Supporting 16-bit Color Depth
Luma AI's Ray3 video generation model has brought revolutionary changes to video creation with its HDR capabilities and powerful "reasoning" functions, while supporting high-precision visual control and professional workflow integration.
【AiBase Highlights:】
🎥 Ray3 supports the generation of videos with 10-bit, 12-bit, and even 16-bit color depth, and can be exported in EXR file format for use in professional workflows.
🧠 Ray3 has "reasoning" capabilities that let it understand complex instructions and assess the quality of its own output, so it can iterate on and refine the generated video.
🖌️ Users can control video content by drawing sketches, providing unprecedented creative freedom.
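To put the color depths mentioned above in perspective, each extra bit doubles the number of distinct values a channel can hold. The sketch below is illustrative arithmetic only. It assumes integer-coded channels, whereas EXR files commonly store floating-point samples, so the counts are an aid to intuition rather than a description of Ray3's actual output. The helper name `levels_per_channel` is hypothetical, and the 8-bit row is added only as a familiar baseline.

```python
def levels_per_channel(bits: int) -> int:
    """Number of distinct integer code values an n-bit channel can hold."""
    if bits <= 0:
        raise ValueError("bit depth must be positive")
    return 2 ** bits


# 8-bit is an added baseline for comparison; 10, 12, and 16 come from the item above.
for bits in (8, 10, 12, 16):
    print(f"{bits}-bit: {levels_per_channel(bits):,} levels per channel")
```

Under this assumption, a 16-bit channel distinguishes 64 times as many levels as a 10-bit one, which is why higher bit depths leave more headroom for color grading.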
8. French AI Company Mistral Launches Open-Source Reasoning Model Magistral Small 1.2
French company Mistral AI has launched its latest open-source reasoning model, Magistral Small 1.2, which has 24B parameters and is released under the Apache 2.0 license. The new version supports a context length of up to 128k and introduces [THINK] special tokens, which improve the model's expressiveness and flexibility. Magistral Small 1.2 also adds a vision encoder and is compatible with various frameworks, making it more convenient for developers.
【AiBase Highlights:】
🧠 Magistral Small 1.2 is an open-source reasoning model with 24B parameters, released under the Apache 2.0 license.
🔍 The new version introduces [THINK] special tokens, enhancing the model's expressiveness and flexibility.
🖼️ Adds a vision encoder, giving it an edge in tasks that combine images and text.
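For developers working with the model's raw text output, special tokens like [THINK] make it possible to separate the reasoning trace from the final answer. The following is a minimal sketch, assuming the reasoning is wrapped between `[THINK]` and a closing `[/THINK]` token. The closing token is an assumption here, since the item above only names `[THINK]`, and `split_reasoning` is a hypothetical helper rather than part of any Mistral library.

```python
import re

# Assumed delimiters: "[THINK]" opens the reasoning trace, "[/THINK]" closes it.
THINK_BLOCK = re.compile(r"\[THINK\](.*?)\[/THINK\]", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from raw model output."""
    match = THINK_BLOCK.search(text)
    if match is None:
        # No reasoning block found: treat the whole output as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    # The answer is everything outside the first reasoning block.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer


sample = "[THINK]2 + 2 is 4.[/THINK]The answer is 4."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the trace separate lets an application log or hide the reasoning while showing only the answer to the end user.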
9. Notion Launches Its First AI Agent! Automatically Generate Meeting Notes, Competitor Analysis, Process Hundreds of Pages of Documents in 20 Minutes
Notion has launched its first AI agent, which can automatically generate meeting notes, analysis reports, competitor evaluations, and more, using all of the user's Notion pages and databases as context. The agent can create or update pages and databases, and it can be triggered from external platforms. Personalization is a highlight: users can set up a profile page for the agent that guides how it cites sources and the style of its output.
【AiBase Highlights:】
🧠 The AI agent can automatically generate meeting notes, analysis reports, and competitor evaluations.
🔄 Supports triggering the AI agent from external platforms such as Slack, email, and Google Drive.
📝 Users can customize the AI agent's profile page to guide its behavior and output style.
10. Tencent Hunyuan 3D Studio Makes a Stunning Debut: 3D Creation Speeds Up from Days to Minutes
The release of Tencent Hunyuan 3D Studio marks a revolutionary improvement in 3D creation efficiency, providing a powerful AI workstation for designers, game developers, and modelers, significantly shortening the 3D asset production cycle.
【AiBase Highlights:】
🧠 A native 3D segmentation algorithm automatically splits models into parts, so character accessories and clothing can be edited independently.
🎨 AI semantic UV unwrapping generates UV maps that meet artistic standards in 1-2 minutes, improving work efficiency.
🔧 Intelligent material editing generates high-quality PBR texture materials from text or image input, enabling precise material control.
More details: https://3d.hunyuan.tencent.com/studio