AI Daily: Alibaba Launches New Qwen3-4B Model; Xiaohongshu Releases Open-Source Model Dots.vlm1; MiniMax Speech 2.5 Voice Generation Model Goes Live

站长之家

Published in AI News · 14 min read · Aug 7, 2025

Welcome to "AI Daily," your daily guide to the world of artificial intelligence. Each day we round up the latest developments in the AI field, with a focus on developers, to help you follow technology trends and discover innovative AI product applications.

Hot AI products. Click for more: https://top.aibase.com/

1. Alibaba launches new Qwen3-4B model: compact and powerful, runs AI on phones!

The Qwen3-4B series models released by Alibaba's Tongyi Qianwen team mark an important advance for small language models and open a new technical path for mobile AI applications. The models pair strong performance with efficient resource use, which suits practical deployment scenarios.

AiBase Summary:
🧠 The Qwen3-4B series balances performance and size, making it suitable for running on mobile devices.
📊 Qwen3-4B-Instruct-2507 outperforms the closed-source small model GPT-4.1-nano and approaches the capabilities of the large-scale model Qwen3-30B-A3B.
🧮 Qwen3-4B-Thinking-2507 scores high in mathematical reasoning evaluations, demonstrating strong logical reasoning ability.

2. Xiaohongshu releases open-source multimodal large model dots.vlm1, leading the industry with NaViT visual encoder

Xiaohongshu's Hi Lab has released dots.vlm1, an open-source multimodal large model built on the NaViT visual encoder and the DeepSeek V3 large language model. It performs especially well on chart reasoning and STEM math reasoning, setting a new high for open-source multimodal models.

AiBase Summary:
🧠 Native self-developed NaViT visual encoder supports dynamic resolution, enhancing generalization ability.
📊 Built a large-scale, finely cleaned training set, improving image-text alignment quality.
🚀 Performs strongly in multimodal evaluations, approaching the closed-source models Gemini 2.5 Pro and Seed1.5-VL.

3. MiniMax Speech 2.5 voice generation model launched: stronger multilingual expressiveness

MiniMax has launched Speech 2.5, its next-generation speech generation model, with significant improvements in multilingual expressiveness, voice replication, and language coverage. The model keeps its leading Chinese-language performance while significantly improving quality in English and other languages, opening up new opportunities across multiple industries.

AiBase Summary:
🧠 Speech 2.5 makes notable progress in multilingual expressiveness.
🎙️ Voice replication reaches industry-leading precision, preserving regional accents.
🌐 Language coverage expands to 40 languages, including several newly added ones, aiding global content creation.

4. Midjourney introduces HD video mode, tailored for professionals to create high-quality videos

Midjourney introduced a new HD video mode, offering professional users higher resolution and better quality video generation tools. This mode significantly improves resolution and clarity, but the cost also increases accordingly. This feature further strengthens Midjourney's competitiveness in the AI video generation field.

AiBase Summary:
🎥 HD video mode provides higher pixel resolution, meeting the demand for high-quality images from professional users.
💰 HD mode costs approximately 3.2 times that of SD mode, but offers better visual effects.
🚀 Midjourney continuously optimizes its technology, competing fiercely with competitors such as OpenAI's Sora and Runway's Gen-4.

5. Cursor 1.4 officially released: focused on asynchronous, long-running tasks, accelerating automation of large codebases

Cursor 1.4 extends the product's lead among AI-driven development tools. This version improves the handling of asynchronous and long-running tasks, optimizes indexing and search for large codebases, and pushes AI coding tools further toward full automation.

AiBase Summary:
🚀 Asynchronous task processing capabilities have been significantly improved, supporting background Agent operation and task queue management.
🔍 Precisely optimized for large codebases, improving code completion and query efficiency.
🔄 Promotes the transition of AI coding tools toward full automation, enhancing Agent autonomy and collaboration features.
Details link: https://cursor.com/en/changelog

6. Google denies AI search function affecting website traffic, but data shows a surge in zero-click searches

Google has rejected claims that its AI search features have reduced website traffic, stating that organic click-through rates remain stable and that the quality of clicks has improved. However, data shows a significant increase in the proportion of zero-click searches, indicating a shift in user behavior.

AiBase Summary:
🟢 Google claims that the AI search function has not significantly affected website traffic, but the proportion of zero-click searches has increased.
🟡 Google emphasizes that the quality of clicks has improved, but has not provided specific data to support its conclusion.
🔴 User trends are shifting to other platforms, such as Reddit and TikTok, causing changes in Google's traffic.

7. MiniCPM-V4.0 open-sourced, hailed as "GPT-4V on your phone"

MiniCPM-V4.0, a lightweight multimodal large model, combines strong performance with an optimized design, excelling in tasks such as image and video understanding and multi-turn dialogue. Its efficient operation on mobile devices opens up new possibilities for AI applications.

AiBase Summary:
🔥 MiniCPM-V4.0 is built on SigLIP2-400M and MiniCPM4-3B, with only 4.1B parameters, yet it demonstrates strong image and video understanding capabilities.
🚀 Tested on an iPhone 16 Pro Max, the first-response delay is under 2 seconds and decoding speed exceeds 17 tokens/second, with high concurrency processing capability.
🌐 Provides rich ecosystem support, compatible with mainstream frameworks, and offers iOS apps and detailed tutorials, lowering the barrier for developers.
Details link: https://github.com/OpenBMB/MiniCPM-o
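To put the reported on-device figures in perspective, here is a rough back-of-the-envelope sketch. It assumes total response time is approximately the first-response delay plus the reply length divided by the decoding speed. The function name and the 200-token reply length are illustrative assumptions, not part of any MiniCPM tooling; the 2-second and 17 tokens/second defaults are the bounds quoted above, so the result is a pessimistic estimate under those figures.

```python
def estimate_response_seconds(num_tokens: int,
                              first_response_s: float = 2.0,
                              tokens_per_s: float = 17.0) -> float:
    """Estimate wall-clock time for a reply of num_tokens tokens.

    Simplified model (an assumption, not a measured benchmark):
        total = first-response delay + num_tokens / decoding speed
    """
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return first_response_s + num_tokens / tokens_per_s


if __name__ == "__main__":
    # Hypothetical 200-token reply, using the quoted bounds as defaults.
    print(f"{estimate_response_seconds(200):.1f} s")
```

Under these assumptions, a 200-token reply would take a little under 14 seconds; real timings depend on prompt length, image inputs, and device load.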

8. AMD and Qualcomm announce hardware support for gpt-oss series open models

AMD and Qualcomm jointly announced support for OpenAI's gpt-oss series models, marking an important advancement in the integration of edge computing and AI. The Ryzen AI Max+ 395 becomes the first consumer-grade AI PC processor to run gpt-oss-120b, while Qualcomm's Snapdragon platform demonstrates the strong reasoning capabilities of gpt-oss-20b.

AiBase Summary:
🧠 AMD and Qualcomm announced support for OpenAI's gpt-oss series models, promoting the integration of edge computing and AI.
🚀 The Ryzen AI Max+ 395 becomes the world's first consumer-grade AI PC processor to run gpt-oss-120b.
📱 Qualcomm's Snapdragon platform demonstrates the strong reasoning capabilities of gpt-oss-20b, giving developers easy access to the model.

9. Mianbi Intelligent's new multimodal model MiniCPM-V 4.0 open-sourced

Mianbi Intelligent's MiniCPM-V4.0 multimodal model has improved markedly in both parameter count and performance. It achieves state-of-the-art results on multiple evaluation benchmarks and also runs stably on mobile devices such as smartphones. Its model structure design enables a faster initial response time and lower VRAM usage, and the team has also open-sourced deployment tools to help developers achieve lightweight deployment.

AiBase Summary:
✨ MiniCPM-V4.0 achieves significant improvements in multimodal capabilities with 4B parameters, reaching state-of-the-art levels in its category.
📱 Runs stably and smoothly on mobile devices, suitable for local deployment and real-time tasks.
🚀 Optimized model structure brings faster initial response time and lower VRAM usage, improving overall performance.
Details link: https://github.com/OpenBMB/MiniCPM-o

10. Tencent open-sources WeKnora! Unlock intelligent analysis of complex documents as knowledge management enters the AI era

Tencent's open-sourced WeKnora is a document understanding and retrieval tool based on large language models, capable of processing multimodal documents and providing efficient structured content extraction and intelligent interaction functions. Its modular design and strong semantic processing capabilities bring technological innovations to multiple industries.

AiBase Summary:
🧠 WeKnora supports multimodal document parsing, extracting structured content from formats such as PDF, Word, and images.
💬 Based on large language models, it provides intelligent interaction functions, supporting multi-turn dialogues and natural language queries.
📦 Modular architecture design makes it easy to flexibly configure and expand, adapting to different industry needs.
Details link: https://github.com/Tencent/WeKnora

11. Big news! Detailed information about OpenAI's flagship model GPT-5 seems to have been leaked in advance on GitHub

The article reveals the performance leap of GPT-5, its multi-version layout, and its potential impact, showcasing OpenAI's further breakthroughs in the field of large language models.

AiBase Summary:
🚀 GPT-5 is described as OpenAI's most advanced large language model, with strong reasoning capabilities and code quality.
🧩 GPT-5 will launch multiple versions to meet the needs of different users and scenarios.
🌐 The authenticity of the leaked information has attracted widespread attention, and developers are looking forward to official confirmation of GPT-5's technical details.

12. FlowSpeech: The world's first written-to-speech TTS

FlowSpeech is an innovative AI text-to-speech tool that can convert written text into natural and fluent spoken expressions. It solves the shortcomings of traditional TTS tools in tone variation and emotional expression through context-awareness and multimodal support technologies, providing users with a more realistic conversational speech experience.

AiBase Summary:
🌍 FlowSpeech focuses on converting written language into spoken language, enhancing the naturalness of speech synthesis.
💡 An intelligent content filtering function automatically identifies and trims content unsuitable for reading aloud, improving speech quality.
🚀 The development team plans to launch personalized voice customization services, expanding the application boundaries.
Details link: https://listenhub.ai/zh?tab=flowspeech

This article is from AIbase Daily
