AI Daily: Wan 2.2-S2V Model to be Released; ByteDance Tests 3D Model Generator; Microsoft Open-Sources VibeVoice-1.5B Model

站长之家

Published in AI News · 13 min read · Aug 26, 2025

Welcome to the "AI Daily" column! This is your guide to exploring the world of artificial intelligence every day. Each day, we bring you the latest updates in the AI field, focusing on developers to help you understand technology trends and innovative AI product applications.

Click to learn more about new AI products: https://app.aibase.com/zh

1. Alibaba Tongyi Wanxiang announces Wan 2.2-S2V model: Unlocking synchronized AI video and audio generation

The Tongyi Wanxiang team at Alibaba announced its latest AI model, Wan 2.2-S2V, on the social media platform X. The model generates video and audio in sync, tightly integrating the two. This marks an important advance in multimodal AI generation and gives content creators more efficient and expressive tools.

【AiBase Summary:】
🔥 The Wan 2.2-S2V model generates video and audio in sync, going beyond the limits of traditional video generation models.
🎵 The model can generate AI videos that include singing audio, showcasing what multimodal AI generation can do.
🚀 The model may redefine standards in AI video generation and advance immersive, realistic content creation.

2. ByteDance tests a new 3D model generation tool "3D Model Generator"

The Doubao team at ByteDance is developing a new 3D model generation tool called "3D Model Generator," which aims to give users controllable generation capabilities backed by a large model. The tool supports generation from images alone and from a combination of images and model files, lowering the barrier to 3D modeling, which is especially significant for game development.

【AiBase Summary:】
🖼️ Supports generating 3D models from images, lowering the barrier to 3D modeling.
⚙️ Offers a generation mode that combines images and model files, adding creative flexibility.
🚀 Expected to open to the public, expanding Doubao's functionality to serve a broader range of user needs.

3. Mobile devices can run it! Mianbi Intelligence launches MiniCPM-V4.5: 410 million parameters outperform GPT-4.1-mini

Mianbi Intelligence, in collaboration with the NLP Lab at Tsinghua University, launched MiniCPM-V4.5, a multimodal large model built for edge devices that combines strong performance with efficient deployment. The model performs well on multiple benchmarks and supports multilingual input, video, and high-resolution image processing, helping bring AI to a wider range of devices.

【AiBase Summary:】
🌟 MiniCPM-V4.5 achieves high performance with 410 million parameters, surpassing models like GPT-4.1-mini.
🖼️ Supports multi-image understanding, video understanding, and high-resolution image processing, with OCR performance ahead of mainstream models.
📱 Deploys efficiently on edge devices, suits mobile and offline scenarios, and lowers the barrier to development.
Details link: https://huggingface.co/openbmb/MiniCPM-V-4_5

4. Apple introduces a new AI training method: replacing manual ratings with checklists significantly improves model performance

Apple's research team proposed a training method called Reinforcement Learning from Checklist Feedback (RLCF), which replaces traditional human thumbs-up/thumbs-down ratings with specific checklists, significantly improving the ability of large language models to follow complex instructions. The method performs well on multiple evaluation benchmarks, with the clearest gains on complex multi-step tasks.

【AiBase Summary:】
🍎 The RLCF method replaces manual ratings with checklists, improving the model's ability to execute complex instructions.
📊 Performance improves significantly on FollowBench, InFoBench, and other tests, by up to 8.2%.
⚙️ Uses a large model to generate checklists that guide the optimization of smaller models, at the cost of substantial computing resources.

5. Microsoft open-sources VibeVoice-1.5B model: Breakthrough in 90-minute long speech synthesis

Microsoft open-sourced its latest audio model, VibeVoice-1.5B, which brings several advances in speech synthesis: synthesis of speech up to 90 minutes long, support for up to four speakers, and a 3200x audio compression rate. Its dual tokenizer architecture addresses the mismatch between voice characteristics and semantics.

【AiBase Summary:】
🔊 The VibeVoice-1.5B model can synthesize up to 90 minutes of speech in a single pass, with support for up to four speakers.
💾 The model achieves a 3200x audio compression rate while maintaining high-fidelity speech quality.
🤖 It uses a dual tokenizer architecture to address the mismatch between voice characteristics and semantics.
Details link: https://huggingface.co/microsoft/VibeVoice-1.5B
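
To give a rough sense of what a 3200x compression rate means in practice, the sketch below works through the arithmetic, reading the rate as a reduction in sequence length. The 24 kHz input sample rate is an assumption that this article does not state, and the constant and function names are purely illustrative.

```python
# Back-of-the-envelope arithmetic for a 3200x audio compression rate.
# ASSUMPTION: 24 kHz input audio (not stated in the article).
ASSUMED_SAMPLE_RATE_HZ = 24_000
COMPRESSION_RATIO = 3200   # figure quoted in the article
MAX_MINUTES = 90           # figure quoted in the article


def frames_per_second(sample_rate_hz: int, ratio: int) -> float:
    """Compressed frames produced per second of audio."""
    return sample_rate_hz / ratio


def total_frames(minutes: int, fps: float) -> int:
    """Compressed frames needed to cover `minutes` of speech."""
    return int(minutes * 60 * fps)


fps = frames_per_second(ASSUMED_SAMPLE_RATE_HZ, COMPRESSION_RATIO)
frames = total_frames(MAX_MINUTES, fps)
print(f"{fps} frames per second, {frames} frames for {MAX_MINUTES} minutes")
```

Under that assumption, each second of audio shrinks to a handful of frames, which is what makes a 90-minute sequence short enough to handle in one pass.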

6. Google Imagen 4 officially launches on the Gemini API and Google AI Studio

Google released its new text-to-image generation model, Imagen 4, which is available to users through the Gemini API and Google AI Studio. The model comes in three versions optimized for different needs, improving image quality, speed, and cost-effectiveness for fields such as art creation and advertising design.

【AiBase Summary:】
🌟 The standard version of Imagen 4 improves overall image quality and is especially strong at rendering text accurately.
⚡ The Imagen 4 Fast version is optimized for fast generation and large-scale processing, significantly increasing speed at a cost of $0.02 per generation.
🖼️ The Imagen 4 Ultra version renders finer image detail and follows text prompts more accurately, producing more consistent results.
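
For budgeting purposes, the per-generation price quoted above scales linearly with batch size. The following is a minimal sketch of that arithmetic only; `batch_cost` is a hypothetical helper, not part of any Google SDK, and it assumes one image per generation with no volume discounts or other fees.

```python
from decimal import Decimal

# Imagen 4 Fast price per generation, as quoted in the article.
PRICE_PER_GENERATION_USD = Decimal("0.02")


def batch_cost(num_generations: int,
               price: Decimal = PRICE_PER_GENERATION_USD) -> Decimal:
    """Estimated cost in USD for a batch, assuming flat per-generation pricing."""
    if num_generations < 0:
        raise ValueError("num_generations must be non-negative")
    return price * num_generations


# Example: estimate the cost of a 1,000-image batch.
print(f"1,000 generations: ${batch_cost(1000)}")
```

`Decimal` is used instead of `float` so that currency amounts do not pick up binary rounding error.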

7. Key AI talent leaves ByteDance: Visual Research Director Feng Jiashi officially resigns

Feng Jiashi led the visual foundation research team for ByteDance's Seed large models, and his resignation has had some impact on the company's AI research plans. He has a strong academic background and extensive experience in computer vision, and he made significant contributions after joining ByteDance.

【AiBase Summary:】
🔥 Feng Jiashi led the Seed large model visual foundation research team at ByteDance, and his resignation has drawn widespread attention.
💡 Feng Jiashi studied at the University of Science and Technology of China, the Institute of Automation of the Chinese Academy of Sciences, and the National University of Singapore.
🚀 During his time at ByteDance, Feng Jiashi led research on multimodal foundation models and generative models, making important contributions to the company's technological innovation.

8. NVIDIA launches Jetson Thor robot computing platform

NVIDIA launched the new Jetson Thor robot computing platform, built on the Blackwell GPU architecture, with AI computing power of 2070 TFLOPS, 7.5 times that of the previous generation. The platform has 128GB of memory, can run multiple AI models at once, and integrates the NVIDIA Isaac simulation platform, giving developers a unified development environment.

【AiBase Summary:】
🚀 Jetson Thor uses the Blackwell GPU architecture, with AI computing power reaching 2070 TFLOPS, 7.5 times that of the previous generation.
🧠 Its 128GB of memory supports multitasking and efficient operation in complex scenarios.
🌐 Integrates the NVIDIA Isaac simulation platform, providing a unified development environment from cloud to edge.
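
The two figures quoted for Jetson Thor, 2070 TFLOPS and a 7.5x improvement, together imply a baseline for the previous generation. The sketch below simply divides one by the other; the article does not name the earlier device or the precision at which either number was measured, so the result is only an implied figure, and the variable names are illustrative.

```python
# Figures quoted in the article.
THOR_AI_COMPUTE = 2070.0      # stated as TFLOPS
IMPROVEMENT_FACTOR = 7.5      # versus the previous generation


def implied_baseline(current: float, factor: float) -> float:
    """Previous-generation figure implied by a current value and a speedup factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return current / factor


previous = implied_baseline(THOR_AI_COMPUTE, IMPROVEMENT_FACTOR)
print(f"Implied previous-generation figure: {previous:.0f} (same unit as quoted)")
```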

9. Genspark launches AI Designer: One-click generation of brand solutions, reshaping the AI design landscape

Genspark AI Designer is an AI design tool that generates complete brand design solutions with one click, covering logos, packaging, and website design. It greatly lowers the barrier to design and has drawn widespread attention from the global design and tech communities.

【AiBase Summary:】
🎨 Genspark AI Designer supports multimodal input and can generate vector icons, 3D renderings, and animated videos as design assets.
🌐 The tool completes complex design tasks from natural language instructions, delivering end-to-end creative solutions for brand logos, packaging, and websites.
💡 AI Designer reshapes the brand design process, offering efficient and cost-effective solutions for creators and enterprises.
Details link: https://www.genspark.ai/ai_designer

10. Doubao officially launches a minor protection mode

Doubao launched a minor protection mode to help parents manage their children's usage. The mode disables certain features, such as recommended videos and third-party web browsing, but keeps translation and in-depth research available.

【AiBase Summary:】
🔒 Parents can activate the minor protection mode with a password, restricting access to certain content.
📺 Recommended videos and third-party web browsing are disabled by default in this mode.
🌐 Translation and in-depth research remain available, so learning and exploration are not affected.



This article is from AIbase Daily
