Welcome to the "AI Daily" section, your daily guide to the world of artificial intelligence. Each day we bring you the latest developments in AI, with a focus on developers, to help you follow technical trends and discover innovative AI products and applications.

Discover fresh AI products: https://app.aibase.com/zh

1. StepFun Releases End-to-End Large Speech Model Step-Audio 2 mini

StepFun has released Step-Audio 2 mini, which it bills as the strongest open-source end-to-end large speech model. It achieved SOTA results on multiple international benchmarks, with strong performance in audio understanding, speech recognition, cross-lingual translation, and dialogue. Instead of the traditional three-stage ASR+LLM+TTS pipeline, the model maps raw audio input directly to spoken responses, and it jointly optimizes chain-of-thought reasoning with reinforcement learning to better understand paralinguistic information and respond more naturally.

AiBase Highlights:

🔥 Step-Audio 2 mini achieved SOTA results on multiple international benchmarks, outperforming open-source models such as Qwen-Omni and Kimi-Audio.

🧠 The model uses a truly end-to-end multimodal architecture that replaces the traditional three-stage ASR+LLM+TTS pipeline, making audio processing simpler and lower-latency.

💡 Chain-of-thought reasoning is jointly optimized with reinforcement learning, improving the model's understanding of, and natural responses to, paralinguistic information such as emotion, tone, and music.

More details: https://github.com/stepfun-ai/Step-Audio2

2. AI Content Labeling Rules Take Effect September 1! Unlabeled Content Is Illegal, and 34 Million Content Creators Scramble to Respond

The "Measures for Labeling Artificial Intelligence-Generated Synthetic Content" become mandatory on September 1, marking a new stage of institutionalized, standardized AI content governance in China. The new rules require all AI-generated content to carry both explicit and implicit labels, improving transparency and curbing the spread of false information.

AiBase Highlights:

✅ Explicit labeling requires AI-generated text, images, video, and audio to be clearly marked as such, so AI content can no longer pass unnoticed.

🔍 Implicit labeling embeds digital fingerprints in file metadata, making content traceable and strengthening regulatory oversight.

⚖️ Violations carry serious consequences, including throttled traffic, orders to rectify, content removal, and legal risk, pushing the AI industry toward standardized development.
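
The difference between the two label types can be illustrated with a minimal Python sketch. It assumes a piece of AI-generated text, prepends a visible notice (the explicit label), and builds a machine-readable metadata record containing a SHA-256 fingerprint (the implicit label). The notice wording and the field names (`ai_generated`, `provider`, `content_id`, `sha256`) are hypothetical placeholders, not the format prescribed by the regulation or its accompanying standards.

```python
import hashlib
import json
import uuid

# Hypothetical visible notice; the wording required in practice is set by the rules and standards.
EXPLICIT_NOTICE = "[AI-generated content]"


def label_text(content: str, provider: str) -> tuple[str, str]:
    """Return (visibly labeled text, JSON metadata record) for AI-generated text.

    All metadata field names are illustrative assumptions, not an official schema.
    """
    # Explicit label: a notice the reader can see.
    labeled = f"{EXPLICIT_NOTICE} {content}"
    # Implicit label: machine-readable metadata with a fingerprint of the content.
    record = {
        "ai_generated": True,
        "provider": provider,             # hypothetical: service that generated the content
        "content_id": str(uuid.uuid4()),  # hypothetical: unique ID to support tracing
        "sha256": hashlib.sha256(content.encode("utf-8")).hexdigest(),
    }
    return labeled, json.dumps(record, ensure_ascii=False)


def matches_fingerprint(content: str, metadata_json: str) -> bool:
    """Check whether content still matches the fingerprint stored in its metadata."""
    record = json.loads(metadata_json)
    return hashlib.sha256(content.encode("utf-8")).hexdigest() == record["sha256"]


text, meta = label_text("Autumn travel tips for Hangzhou", provider="example-model")
print(text)  # [AI-generated content] Autumn travel tips for Hangzhou
print(matches_fingerprint("Autumn travel tips for Hangzhou", meta))  # True
```

In a real pipeline the implicit record would be written into the file itself (for example an image or video metadata field) rather than returned as a separate string, and a plain hash only detects modification; it does not survive re-encoding.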

3. Meituan Open-Sources Large Model LongCat to Empower Developers and Accelerate AI Application Deployment

Meituan's open-source large model LongCat uses an innovative mixture-of-experts architecture for efficient computation and performs well across multiple benchmarks, giving developers a capable new tool.

AiBase Highlights:

🧠 LongCat-Flash has 560 billion total parameters in a mixture-of-experts (MoE) architecture and dynamically activates only a subset of them for each token, improving computational efficiency.

🚀 Supports inference at over 100 tokens per second, with low latency and high scalability.

📊 Performs well on tasks such as MMLU and mathematical reasoning, suggesting strong potential for practical applications.
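
Activating only part of the parameters can be illustrated with a toy top-k mixture-of-experts layer in pure Python. The "experts" here are hypothetical scalar functions standing in for feed-forward sub-networks, and the router scores are given as inputs; this is a generic top-k gating sketch, not LongCat-Flash's actual routing scheme.

```python
import math
from typing import Callable, List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: float, router_logits: List[float],
                experts: List[Callable[[float], float]], k: int = 2) -> tuple[float, List[int]]:
    """Route x to the top-k experts and mix their outputs.

    Only the k selected experts are evaluated; the others are skipped entirely,
    which is where the compute savings of MoE come from.
    """
    # Indices of the k highest-scoring experts (ties keep their original order).
    top = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalize gate weights over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    out = sum(w * experts[i](x) for w, i in zip(weights, top))
    return out, top


# Hypothetical experts: simple functions standing in for sub-networks.
experts = [lambda v: v * 2, lambda v: v + 10, lambda v: -v, lambda v: v * v]
y, used = moe_forward(3.0, router_logits=[0.1, 2.0, -1.0, 2.0], experts=experts, k=2)
print(used, y)  # [1, 3] 11.0
```

With four experts and k=2, each input pays for only two expert evaluations; scaling the same idea to hundreds of experts is what lets a model hold far more parameters than it uses per token.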

More details: https://longcat.chat/

4. Shanghai AI Lab Releases Multimodal Large Model ShuSheng·WanXiang InternVL3.5

Shanghai AI Lab has released the multimodal large model InternVL3.5, with across-the-board upgrades in reasoning, deployment efficiency, and general capabilities driven by cascaded reinforcement learning, dynamic visual resolution routing, and a decoupled deployment architecture. The model performed strongly on multiple benchmarks, reportedly surpassing mainstream models such as GPT-5 and Claude-3.7-Sonnet.

AiBase Highlights:

✨ InternVL3.5 adopts a cascaded reinforcement learning framework, significantly improving reasoning performance.

🖼️ The model dynamically routes between visual resolutions to speed up responses.

🚀 Models are offered at several parameter scales to suit different resource budgets.
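
The idea behind dynamic visual resolution routing can be sketched as a per-patch decision: patches judged detail-rich keep more visual tokens, while simpler ones are compressed harder, shrinking the total the language model must process. The detail scores, threshold, and token budgets below are hypothetical placeholders, not the router or values used in InternVL3.5.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Patch:
    name: str
    detail_score: float  # hypothetical router output in [0, 1]; higher means more visual detail


# Hypothetical token budgets for the two compression levels.
HIGH_RES_TOKENS = 256
LOW_RES_TOKENS = 64


def route_patches(patches: List[Patch], threshold: float = 0.5) -> Dict[str, int]:
    """Assign each patch a visual-token budget based on its detail score."""
    return {
        p.name: HIGH_RES_TOKENS if p.detail_score >= threshold else LOW_RES_TOKENS
        for p in patches
    }


patches = [Patch("chart", 0.9), Patch("sky", 0.1), Patch("caption", 0.7), Patch("wall", 0.2)]
budget = route_patches(patches)
total = sum(budget.values())
print(budget, total)  # {'chart': 256, 'sky': 64, 'caption': 256, 'wall': 64} 640
```

Here the image costs 640 visual tokens instead of the 1,024 it would cost if every patch kept the high-resolution budget, which is the kind of saving that translates into faster responses.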

More details: https://github.com/OpenGVLab/InternVL

5. Tencent ARC Open-Sources AudioStory: Generating Long-Form Audio with Large Language Models

Tencent ARC's AudioStory combines large language models with audio generation to produce structured, temporally coherent long-form narrative audio. The model shows strong instruction following and audio quality and suits scenarios such as video dubbing and long-form audio generation.

AiBase Highlights:

🎧 AudioStory is an LLM-based model for generating long-form narrative audio and can handle a variety of audio tasks.

📊 The model follows instructions well and produces coherent audio narratives for a better listening experience.

🛠️ The team has released inference code and several demos that showcase the model's strengths in video dubbing and long-form audio generation.

More details: https://github.com/TencentARC/AudioStory

6. OpenAI Stuns with GPT-realtime! The Voice AI Revolution Is Here, and Human and Machine Voices Are Hard to Tell Apart

OpenAI's GPT-realtime voice model makes significant gains in naturalness and emotional expression, closely mimicking human intonation, emotional shifts, and changes in speaking rate. Beyond handling multiple modalities, it can adjust its speaking style in real time to suit different scenarios, a major step forward for AI voice interaction.

AiBase Highlights:

🚀 GPT-realtime delivers an unprecedentedly natural voice interaction experience, faithfully reproducing the details of human speech.

🧠 The model is multimodal, analyzing image and voice input together to produce its responses.

💡 Supports switching among multiple voice styles to meet personalized voice interaction needs in different scenarios.

7. Meta and UCSD Launch DeepConf: 99.9% Reasoning Accuracy with 85% Less Compute

Meta and the University of California, San Diego (UCSD) have jointly introduced DeepConf, which reached 99.9% accuracy on a high-difficulty reasoning benchmark (AIME 2025) while cutting compute, measured in generated tokens, by up to 84.7%. The method uses the model's own confidence signals to filter out low-quality reasoning traces during or after generation, improving both efficiency and accuracy.

AiBase Highlights:

🔍 DeepConf technology achieves 99.9% accuracy in high-difficulty reasoning tasks.

💡 Compute consumption drops by up to 84.7%, substantially lowering operating costs.

🚀 Through its "confidence" mechanism, the model can decide which reasoning paths to keep and which to stop early, improving reasoning efficiency.
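
The confidence mechanism can be sketched as filtering plus weighted voting: sample several reasoning traces, score each by how confident the model was while generating it, discard the least confident ones, and take a confidence-weighted majority vote over the rest. The paper derives token confidence from the model's top-k log-probabilities; this simplified sketch assumes those per-token values are already given, scores each trace by its lowest sliding-window average (loosely following the paper's "group confidence"), and uses a hypothetical window size, keep ratio, and toy data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def lowest_window_confidence(token_conf: List[float], window: int = 3) -> float:
    """Trace score: the minimum average confidence over any sliding window of tokens."""
    if len(token_conf) <= window:
        return sum(token_conf) / len(token_conf)
    return min(
        sum(token_conf[i:i + window]) / window
        for i in range(len(token_conf) - window + 1)
    )


def confident_vote(traces: List[Tuple[str, List[float]]], keep_ratio: float = 0.5) -> str:
    """Keep the most confident traces, then return the confidence-weighted majority answer."""
    scored = [(lowest_window_confidence(conf), answer) for answer, conf in traces]
    scored.sort(reverse=True)  # most confident first
    keep = max(1, int(len(scored) * keep_ratio))
    votes: Dict[str, float] = defaultdict(float)
    for score, answer in scored[:keep]:
        votes[answer] += score  # each kept trace votes with weight equal to its confidence
    return max(votes, key=votes.get)


# Hypothetical traces: (final answer, per-token confidence values).
traces = [
    ("42", [0.9, 0.8, 0.9, 0.85]),
    ("42", [0.7, 0.9, 0.8, 0.9]),
    ("17", [0.9, 0.2, 0.1, 0.3]),
    ("17", [0.3, 0.2, 0.4, 0.2]),
    ("17", [0.4, 0.3, 0.2, 0.1]),
]
print(confident_vote(traces))  # 42
```

A plain majority vote over all five traces would return "17"; filtering out the low-confidence traces flips the result. The compute savings come from the online variant, which stops generating a trace as soon as its confidence drops too low instead of finishing it.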

More details: https://arxiv.org/abs/2508.15260

8. Musk Says xAI Codebase Was Stolen as Former Employee Joins OpenAI!

Musk said that xAI's codebase had been stolen: xAI has accused former employee Xuechen Li of stealing trade secrets before joining OpenAI, drawing wide attention across the tech industry.

AiBase Highlights:

💻 Former employee Xuechen Li is accused of stealing xAI's trade secrets before joining OpenAI.

🔒 xAI has asked the court to bar Li from working at OpenAI and to order the return of the allegedly stolen data.

🚀 Li cashed out nearly $7 million before leaving; xAI claims the stolen material could save OpenAI hundreds of millions of dollars in R&D costs.

9. Alibaba Qwen Team Releases Next-Generation GUI Automation Framework Mobile-Agent-v3 and GUI-Owl

Alibaba's Qwen team has launched two new tools, Mobile-Agent-v3 and GUI-Owl, aimed at the challenges of graphical user interface (GUI) automation. By combining multimodal models with multi-agent collaboration, they improve task understanding and execution and can complete tasks across platforms, a significant advance for Alibaba in general GUI automation.

AiBase Highlights:

🧠 GUI-Owl is a multimodal agent model from Alibaba that integrates perception, reasoning, and execution and adapts to complex GUI environments.

🤖 The Mobile-Agent-v3 framework coordinates multiple agents and updates its plan dynamically to execute tasks more efficiently.

📊 Both perform well on GUI automation benchmarks, a significant breakthrough for Alibaba in the automation field.
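
Multi-agent collaboration with dynamic plan updates can be illustrated with a toy loop: a planner holds a list of subtasks, an executor attempts the next one, and a reflector checks the outcome and, on failure, has the planner insert recovery steps before retrying. The role split, the simulated phone, and the recovery rule are hypothetical simplifications, not Mobile-Agent-v3's actual agents or prompts.

```python
from typing import Callable, List


def run_task(plan: List[str], execute: Callable[[str], bool],
             revise: Callable[[str], List[str]], max_steps: int = 10) -> List[str]:
    """Toy plan-execute-reflect loop; returns the steps that completed successfully.

    plan:    initial subtasks (the planner's output)
    execute: the executor plus reflector; True if the step's outcome looks correct
    revise:  the planner's fix-up; replacement steps for a failed step
    """
    done: List[str] = []
    steps = list(plan)
    for _ in range(max_steps):
        if not steps:
            break
        step = steps.pop(0)
        if execute(step):
            done.append(step)
        else:
            # Failure flagged: the planner updates the remaining plan.
            steps = revise(step) + steps
    return done


# Hypothetical simulated phone: a popup blocks the first attempt to open Settings.
state = {"popup": True}


def execute(step: str) -> bool:
    if step == "dismiss_popup":
        state["popup"] = False
        return True
    if step == "open_settings":
        return not state["popup"]
    return True


def revise(failed_step: str) -> List[str]:
    # Insert a recovery step, then retry the step that failed.
    return ["dismiss_popup", failed_step]


print(run_task(["open_settings", "toggle_wifi"], execute, revise))
# ['dismiss_popup', 'open_settings', 'toggle_wifi']
```

The `max_steps` cap keeps a stubborn failure from looping forever, a safeguard any real GUI agent also needs.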

More details: https://arxiv.org/abs/2508.15144

10. Microsoft Launches Copilot Labs, First Experimental Tool "Copilot Audio Expression" Goes Live

Microsoft has launched Copilot Labs, a new hub for experimental AI features that invites users to take part in AI innovation and development. Its first tool, "Copilot Audio Expression," turns written text into natural, fluent voice narration, supports emotive and storytelling modes, and gives users a high degree of control.

AiBase Highlights:

🌟 Copilot Labs is a platform that invites users to take part in AI innovation, reflecting Microsoft's further exploration of AI.

🔊 "Copilot Audio Expression" is the first experimental tool, converting text into natural voice and supporting emotional and storytelling modes.

🌐 The tool is free to use worldwide, though some features require signing in with a Microsoft account and a Copilot Pro subscription.

More details: https://copilot.microsoft.com/labs/experiments/audio-expression

11. Xiaohongshu Automation Tool xiaohongshu-mcp Goes Live! AI-Assisted Content Creation Frees Your Hands!

xiaohongshu-mcp is an open-source tool built on MCP (Model Context Protocol) that automates login, content publishing, and data retrieval on the Xiaohongshu platform. By integrating with AI clients, it streamlines the workflow, is easy to extend, and suits both content creators and developers.

AiBase Highlights:

🔐 Smart login with persistent sessions: after logging in once by scanning a QR code, you stay logged in for subsequent operations.

🖼️ Image-and-text publishing first, more to come: the tool currently automates publishing of image-and-text posts, with video publishing and data analysis planned for the future.

🛠️ Developer-friendly and open: written in Go with a clear code structure, it is easy to extend and can be deployed by cloning it from GitHub.
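
MCP clients talk to servers such as xiaohongshu-mcp using JSON-RPC 2.0 messages, and a tool invocation is a `tools/call` request carrying the tool name and its arguments. The sketch below only builds such a request. The tool name `publish_content` and its argument fields are hypothetical placeholders; the tools the server actually exposes are listed in its repository and can be discovered with a `tools/list` request.

```python
import itertools
import json
from typing import Any, Dict

# MCP requires that a request ID is not reused by the sender within a session.
_ids = itertools.count(1)


def tool_call_request(tool: str, arguments: Dict[str, Any]) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message, ensure_ascii=False)


# Hypothetical tool name and arguments, for illustration only.
req = tool_call_request(
    "publish_content",
    {"title": "Weekend hiking notes", "content": "Route and gear list", "images": ["/path/to/photo.jpg"]},
)
print(req)
```

The server answers with a JSON-RPC response carrying the same `id`, whose `result` holds the tool's output; an AI client performs this exchange automatically when the model decides to use a tool.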

More details: https://github.com/xpzouying/xiaohongshu-mcp