Welcome to the "AI Daily" section, your daily guide to the world of artificial intelligence. Each day we round up the latest AI news for developers, helping you follow technology trends and discover innovative AI product applications.

Hot AI products. Click for more: https://top.aibase.com/

1. Alibaba launches new Qwen3-4B model: compact and powerful, runs AI on phones!

The Qwen3-4B series models launched by Alibaba's Tongyi Qianwen team mark an important step forward for small language models and open a new technical path for mobile AI applications. The models perform well while using resources efficiently, meeting the needs of practical application scenarios.

AiBase Summary:

🧠 The Qwen3-4B series model achieves a balance between performance and size, suitable for running on mobile devices.

📊 Qwen3-4B-Instruct-2507 outperforms the closed-source small model GPT-4.1-nano and approaches the capabilities of the much larger Qwen3-30B-A3B.

🧮 Qwen3-4B-Thinking-2507 scores high in mathematical reasoning evaluations, demonstrating strong logical reasoning ability.
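
To give a rough sense of why a model of this size can fit on a phone, here is a back-of-the-envelope sketch of weight memory at different precisions. It assumes "4B" means a nominal 4e9 parameters and counts weights only (no activations or KV cache); the function name and the chosen bit widths are illustrative assumptions, not figures from the Qwen team.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


# Assumption: "4B" is treated as a nominal 4e9 parameters.
NOMINAL_PARAMS = 4e9

if __name__ == "__main__":
    # Hypothetical precisions; actual on-device formats may differ.
    for bits in (16, 8, 4):
        size = weight_memory_gib(NOMINAL_PARAMS, bits)
        print(f"{bits}-bit weights: ~{size:.1f} GiB")
```

Under these assumptions, 4-bit weights come in under 2 GiB, while 16-bit weights need roughly 7.5 GiB.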

2. Xiaohongshu releases open-source multimodal large model dots.vlm1, leading the industry with NaViT visual encoder

Xiaohongshu Hi Lab released the open-source multimodal large model dots.vlm1, which is built on the NaViT visual encoder and the DeepSeek V3 large language model. It performs especially well in chart reasoning and STEM math reasoning, setting a new bar for open-source multimodal models.

AiBase Summary:

🧠 In-house NaViT visual encoder supports dynamic resolution, enhancing generalization ability.

📊 Trained on a large-scale, carefully cleaned dataset, improving image-text alignment quality.

🚀 Performs excellently in multimodal evaluations, approaching the closed-source models Gemini 2.5 Pro and Seed-VL1.5.

3. MiniMax Speech 2.5 voice generation model launched: stronger multilingual expressiveness

MiniMax launched its next-generation speech generation model, Speech 2.5, with significant improvements in multilingual expressiveness, voice cloning, and language coverage. The model retains its world-leading performance in Chinese while significantly improving English and other languages, bringing convenience and innovation opportunities to multiple industries.

AiBase Summary:

🧠 Speech 2.5 makes notable progress in multilingual expressiveness.

🎙️ Voice cloning reaches industry-leading precision, preserving regional accents.

🌐 Language coverage expands to 40 languages, including several newly added ones, aiding global content creation.

4. Midjourney introduces HD video mode, tailored for professionals to create high-quality videos

Midjourney introduced a new HD video mode, offering professional users higher resolution and better quality video generation tools. This mode significantly improves resolution and clarity, but the cost also increases accordingly. This feature further strengthens Midjourney's competitiveness in the AI video generation field.

AiBase Summary:

🎥 HD video mode provides higher pixel resolution, meeting professional users' demand for high-quality visuals.

💰 HD mode costs approximately 3.2 times that of SD mode, but offers better visual effects.

🚀 Midjourney continuously optimizes its technology, competing fiercely with competitors such as OpenAI's Sora and Runway's Gen-4.
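
The cost trade-off in the summary above can be made concrete with a small budgeting sketch. It uses only the approximate 3.2x multiplier quoted in the summary; the cost units, the function names, and the example budget are hypothetical and do not reflect Midjourney's actual pricing.

```python
HD_COST_MULTIPLIER = 3.2  # approximate HD-to-SD cost ratio quoted in the summary


def estimate_hd_cost(sd_cost: float, multiplier: float = HD_COST_MULTIPLIER) -> float:
    """Estimate the cost of one HD video from the cost of one SD video."""
    if sd_cost < 0:
        raise ValueError("sd_cost must be non-negative")
    return sd_cost * multiplier


def videos_within_budget(budget: float, sd_cost: float, hd: bool = False) -> int:
    """Count how many whole videos fit in a budget (arbitrary cost units)."""
    unit_cost = estimate_hd_cost(sd_cost) if hd else sd_cost
    if unit_cost <= 0:
        raise ValueError("unit cost must be positive")
    return int(budget // unit_cost)


if __name__ == "__main__":
    # Hypothetical numbers: a budget of 100 units, 1 unit per SD video.
    print(videos_within_budget(100, 1.0, hd=False))
    print(videos_within_budget(100, 1.0, hd=True))
```

With these made-up numbers, the same budget covers 100 SD videos but only 31 HD videos, which is the practical meaning of the multiplier.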

5. Cursor 1.4 officially released: focused on asynchronous, long-running tasks, accelerating automation of large codebases

The release of Cursor 1.4 extends its lead among AI-driven development tools. This version improves handling of asynchronous and long-running tasks, optimizes indexing and search for large codebases, and pushes AI coding tools further toward full automation.

AiBase Summary:

🚀 Asynchronous task processing is significantly improved, with support for background agents and task queue management.

🔍 Optimized for large codebases, improving code completion and query efficiency.

🔄 Moves AI coding tools toward full automation, enhancing agent autonomy and collaboration features.

Details link: https://cursor.com/en/changelog

6. Google denies that its AI search function is hurting website traffic, but data shows a surge in zero-click searches

Google rejected claims that its AI search function has reduced website traffic, stating that organic click-through rates remain stable and the quality of clicks has improved. However, data shows a significant increase in the proportion of zero-click searches, indicating a shift in user behavior.

AiBase Summary:

🟢 Google claims that the AI search function has not significantly affected website traffic, but the proportion of zero-click searches has increased.

🟡 Google emphasizes that the quality of clicks has improved, but has not provided specific data to support its conclusion.

🔴 User trends are shifting to other platforms, such as Reddit and TikTok, causing changes in Google's traffic.

7. MiniCPM-V4.0 open-sourced, hailed as "GPT-4V on your phone"

MiniCPM-V4.0, a lightweight multimodal large model, combines strong performance with an optimized design and excels at image understanding, video understanding, and multi-turn dialogue. Its efficient operation on mobile devices opens up new possibilities for AI applications.

AiBase Summary:

🔥 MiniCPM-V4.0 is built on SigLIP2-400M and MiniCPM4-3B, with only 4.1B parameters, yet it demonstrates strong image and video understanding capabilities.

🚀 Tested on an iPhone 16 Pro Max, first-response latency is under 2 seconds and decoding speed exceeds 17 tokens/second, with high-concurrency processing capability.

🌐 Provides rich ecosystem support, compatible with mainstream frameworks, and offers iOS apps and detailed tutorials, lowering the barrier for developers.

Details link: https://github.com/OpenBMB/MiniCPM-o
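
To translate the reported on-device figures into an end-to-end feel, the sketch below computes a rough upper bound on the time to produce a reply. It assumes a simple model (first-response latency plus tokens divided by decoding speed) and plugs in the bounds quoted in the summary above; the function name and the 170-token example are illustrative assumptions, and real latency depends on prompt length, image input, and device state.

```python
MAX_FIRST_RESPONSE_S = 2.0      # reported first-response latency is below this
MIN_DECODE_TOKENS_PER_S = 17.0  # reported decoding speed is above this


def worst_case_reply_seconds(
    num_tokens: int,
    first_response_s: float = MAX_FIRST_RESPONSE_S,
    tokens_per_s: float = MIN_DECODE_TOKENS_PER_S,
) -> float:
    """Upper-bound estimate: first-response latency plus decode time."""
    if num_tokens < 0:
        raise ValueError("num_tokens must be non-negative")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return first_response_s + num_tokens / tokens_per_s


if __name__ == "__main__":
    # Hypothetical reply length of 170 tokens.
    print(f"{worst_case_reply_seconds(170):.1f} s")
```

Under these assumptions, a 170-token reply would finish in at most about 12 seconds.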

8. AMD and Qualcomm announce hardware support for gpt-oss series open models

AMD and Qualcomm jointly announced support for OpenAI's gpt-oss series models, marking an important advancement in the integration of edge computing and AI. The Ryzen AI Max+ 395 processor becomes the first consumer-grade AI PC processor to run gpt-oss-120b, while Qualcomm's Snapdragon platform demonstrates the strong reasoning capabilities of gpt-oss-20b.

AiBase Summary:

🧠 AMD and Qualcomm announced support for OpenAI's gpt-oss series models, promoting the integration of edge computing and AI.

🚀 The Ryzen AI Max+ 395 processor becomes the first consumer-grade AI PC processor in the world to run gpt-oss-120b.

📱 Qualcomm's Snapdragon platform demonstrates the strong reasoning capabilities of gpt-oss-20b, allowing developers to easily access the model.

9. Mianbi Intelligent's new multimodal model MiniCPM-V 4.0 open-sourced

Mianbi Intelligent's MiniCPM-V4.0 multimodal model achieves significant performance improvements at a compact parameter count. It not only achieves state-of-the-art results in multiple evaluation benchmarks but also runs stably on mobile devices such as smartphones. Its model structure design enables faster initial response time and lower VRAM usage, and the team has also open-sourced deployment tools to help developers achieve lightweight deployment.

AiBase Summary:

✨ MiniCPM-V4.0 achieves significant improvements in multimodal capabilities with 4B parameters, reaching state-of-the-art levels in its category.

📱 Runs stably and smoothly on mobile devices, suitable for local deployment and real-time tasks.

🚀 Optimized model structure brings faster initial response time and lower VRAM usage, improving overall performance.

Details link: https://github.com/OpenBMB/MiniCPM-o

10. Tencent open-sources WeKnora! Unlocking intelligent analysis of complex documents as knowledge management enters the AI era

Tencent's open-sourced WeKnora is a document understanding and retrieval tool based on large language models, capable of processing multimodal documents and providing efficient structured content extraction and intelligent interaction functions. Its modular design and strong semantic processing capabilities bring technological innovations to multiple industries.

AiBase Summary:

🧠 WeKnora supports multimodal document parsing, extracting structured content from formats such as PDF, Word, and images.

💬 Based on large language models, it provides intelligent interaction functions, supporting multi-turn dialogues and natural language queries.

📦 Modular architecture design makes it easy to flexibly configure and expand, adapting to different industry needs.

Details link: https://github.com/Tencent/WeKnora

11. Big news! Detailed information about OpenAI's flagship model GPT-5 seems to have been leaked in advance on GitHub

The article describes GPT-5's leap in performance, its multi-version lineup, and its potential impact, pointing to further advances by OpenAI in the field of large language models.

AiBase Summary:

🚀 GPT-5 is described as OpenAI's most advanced large language model, with strong reasoning capabilities and code quality.

🧩 GPT-5 will launch multiple versions to meet the needs of different users and scenarios.

🌐 The authenticity of the leaked information has attracted widespread attention, and developers are looking forward to official confirmation of GPT-5's technical details.

12. FlowSpeech: The world's first written-to-speech TTS

FlowSpeech is an innovative AI text-to-speech tool that can convert written text into natural and fluent spoken expressions. It solves the shortcomings of traditional TTS tools in tone variation and emotional expression through context-awareness and multimodal support technologies, providing users with a more realistic conversational speech experience.

AiBase Summary:

🌍 FlowSpeech focuses on converting written language into spoken language, enhancing the naturalness of speech synthesis.

💡 An intelligent content filtering function automatically identifies and trims content unsuitable for reading aloud, improving speech quality.

🚀 The development team plans to launch personalized voice customization services, expanding the application boundaries.

Details link: https://listenhub.ai/zh?tab=flowspeech