Welcome to the "AI Daily" column, your daily guide to the world of artificial intelligence. Each day we bring you the latest AI news with a focus on developers, helping you follow technology trends and discover innovative AI product applications.
Click to learn more about new AI products: https://app.aibase.com/zh
1. Alibaba's Tongyi Wanxiang announces the Wan2.2-S2V model: unlocking synchronized AI video and audio generation
The Tongyi Wanxiang team at Alibaba announced its latest AI model, Wan2.2-S2V, on the social media platform X. The model generates video and audio in sync, tightly integrating the two. This marks an important advance in multimodal AI generation and gives content creators more efficient and expressive tools.
【AiBase Summary:】
🔥 The Wan2.2-S2V model generates video and audio in sync, overcoming a limitation of traditional video generation models.
🎵 The model can generate AI videos that include singing audio, showcasing the potential of multimodal AI generation.
🚀 The model may redefine standards in AI video generation and promote more immersive, realistic content creation.
2. ByteDance tests a new 3D model generation tool, "3D Model Generator"
ByteDance's Doubao team is developing a new 3D model generation tool called "3D Model Generator," which aims to give users controllable generation powered by large models. The tool supports generation from images as well as generation from a combination of images and model files, lowering the barrier to 3D modeling, which is especially significant for game development.
【AiBase Summary:】
🖼️ Supports generating 3D models based on images, lowering the barrier to 3D modeling.
⚙️ Provides a generation method combining images and model files, enhancing creative flexibility.
🚀 Expected to open to the public, expanding Doubao's functionality to serve a broader range of user needs.
3. Runs on mobile devices! ModelBest (Mianbi Intelligence) launches MiniCPM-V 4.5: 8 billion parameters outperform GPT-4.1-mini
ModelBest (Mianbi Intelligence), in collaboration with the NLP Lab at Tsinghua University, launched MiniCPM-V 4.5, a multimodal large model for edge devices that combines strong performance with efficient deployment. The model scores well on multiple benchmarks and supports multilingual, video, and high-resolution image processing, making it suitable for edge devices and helping bring AI technology to more users.
【AiBase Summary:】
🌟 MiniCPM-V 4.5 achieves high performance with 8 billion parameters, surpassing models like GPT-4.1-mini.
🖼️ Supports multi-image, video understanding, and high-resolution image processing, with OCR performance leading mainstream models.
📱 Deploys efficiently on edge devices, making it suitable for mobile and offline scenarios and lowering the barrier to development.
Details link: https://huggingface.co/openbmb/MiniCPM-V-4_5
4. Apple introduces a new AI training method: replacing human ratings with checklists significantly improves model performance
Apple's research team proposed a training method called Reinforcement Learning from Checklist Feedback (RLCF), which replaces traditional human like/dislike ratings with instruction-specific checklists and significantly improves the ability of large language models to follow complex instructions. The method performs well on multiple evaluation benchmarks, with especially strong results on complex multi-step tasks.
【AiBase Summary:】
🍎 The RLCF method replaces human ratings with checklists, improving the model's ability to execute complex instructions.
📊 Gains are significant on FollowBench, InFoBench, and other benchmarks, reaching up to 8.2%.
⚙️ Uses a large model to generate checklists that guide the optimization of smaller models, but requires substantial computing resources.
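To make the checklist idea concrete, here is a minimal sketch of how per-item checklist scores could be folded into a single reward. It is an illustration, not Apple's implementation: the `ChecklistItem` type, the weights, the weighted-average aggregation, and the keyword-based `keyword_judge` (standing in for a large-model judge) are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChecklistItem:
    requirement: str  # one checkable requirement derived from the instruction
    weight: float     # hypothetical importance weight (assumption)


def checklist_reward(
    response: str,
    checklist: List[ChecklistItem],
    judge: Callable[[str, str], float],
) -> float:
    """Return the weighted average of per-item scores, each in [0, 1]."""
    total_weight = sum(item.weight for item in checklist)
    if total_weight == 0:
        return 0.0
    weighted = sum(
        item.weight * judge(response, item.requirement) for item in checklist
    )
    return weighted / total_weight


def keyword_judge(response: str, requirement: str) -> float:
    """Toy judge: 1.0 if the requirement keyword appears in the response.

    In a real pipeline a large model would grade each item instead.
    """
    return 1.0 if requirement.lower() in response.lower() else 0.0


checklist = [
    ChecklistItem("spanish", weight=2.0),
    ChecklistItem("bullet", weight=1.0),
]
reward = checklist_reward("Here is the answer in Spanish.", checklist, keyword_judge)
print(f"reward = {reward:.3f}")
```

A scalar like `reward` could then be used as the training signal in place of a human rating; how the actual method weights and scores items is not described in this digest.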
5. Microsoft open-sources the VibeVoice-1.5B model: a breakthrough in 90-minute long-form speech synthesis
Microsoft open-sourced its latest audio model, VibeVoice-1.5B, which brings several major advances in speech synthesis, including synthesis of speech up to 90 minutes long, support for up to four speakers, and a 3200x audio compression rate. Its dual tokenizer architecture effectively addresses the mismatch between voice and semantics.
【AiBase Summary:】
🔊 The VibeVoice-1.5B model can synthesize up to 90 minutes of speech in a single pass, with support for up to four speakers.
💾 The model achieves a 3200x audio compression rate while maintaining high-fidelity speech quality.
🤖 It uses a dual tokenizer architecture to solve the problem of mismatch between voice and semantics.
Details link: https://huggingface.co/microsoft/VibeVoice-1.5B
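To see why a 3200x compression rate matters for 90-minute synthesis, the following back-of-the-envelope sketch estimates how many compressed frames the model would have to handle. Two assumptions are not stated in the text above: a 24 kHz input sample rate, and reading the compression rate as "raw audio samples per compressed frame."

```python
# Assumption: 24 kHz input audio (not stated in the digest).
SAMPLE_RATE_HZ = 24_000
# From the text: 3200x audio compression rate.
COMPRESSION_RATE = 3200
# From the text: up to 90 minutes of speech.
DURATION_MINUTES = 90


def frames_per_second(sample_rate_hz: int, compression_rate: int) -> float:
    """Compressed frames produced per second of audio."""
    return sample_rate_hz / compression_rate


def total_frames(duration_minutes: int, fps: float) -> int:
    """Compressed frames needed to represent the whole recording."""
    return round(duration_minutes * 60 * fps)


fps = frames_per_second(SAMPLE_RATE_HZ, COMPRESSION_RATE)
frames = total_frames(DURATION_MINUTES, fps)
raw_samples = DURATION_MINUTES * 60 * SAMPLE_RATE_HZ

print(f"{fps} frames per second")
print(f"{frames} compressed frames vs {raw_samples} raw samples")
```

Under these assumptions, 90 minutes of audio shrinks from about 129.6 million raw samples to roughly 40,500 frames, a sequence length that a language-model-style backbone can plausibly process in one pass.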
6. Google Imagen 4 officially launches on the Gemini API and Google AI Studio
Google released its new text-to-image generation model, Imagen 4, which is available to users through the Gemini API and Google AI Studio. The model comes in three versions optimized for different needs, improving image quality, speed, and cost-effectiveness, and offers a powerful tool for fields such as art creation and advertising design.
【AiBase Summary:】
🌟 The standard version of Imagen 4 improves overall image generation quality and is especially strong in text rendering accuracy.
⚡ The Imagen 4 Fast version is optimized for rapid image generation and large-scale processing tasks, significantly increasing speed at a cost of $0.02 per generated image.
🖼️ The Imagen 4 Ultra version generates finer image detail and follows text prompts more accurately, making the results more consistent with what the user asked for.
7. Key AI talent leaves ByteDance: Visual Research Director Feng Jiashi officially resigns
Feng Jiashi was the core leader of the visual foundation research team for ByteDance's Seed large model, and his resignation has had some impact on the company's AI research plans. He has a strong academic background and extensive experience in computer vision, and he made significant contributions after joining ByteDance.
【AiBase Summary:】
🔥 Feng Jiashi led the visual foundation research team for ByteDance's Seed large model, and his resignation has drawn widespread attention.
💡 Feng Jiashi studied at the University of Science and Technology of China, the Institute of Automation of the Chinese Academy of Sciences, and the National University of Singapore.
🚀 During his time at ByteDance, Feng Jiashi led research on multimodal foundation models and generative models, making important contributions to the company's technological innovation.
8. NVIDIA launches Jetson Thor robot computing platform
NVIDIA launched the new Jetson Thor robot computing platform, built on the Blackwell GPU architecture, with AI computing power reaching 2070 TFLOPS, 7.5 times that of the previous generation. The platform is equipped with 128GB of memory, supports running multiple AI models, and integrates the NVIDIA Isaac simulation platform, giving developers a unified development environment.
【AiBase Summary:】
🚀 Jetson Thor uses the Blackwell GPU architecture, with AI computing power reaching 2070 TFLOPS, showing significant performance improvements.
🧠 Equipped with 128GB of memory, it supports multitasking and efficient operation in complex scenarios.
🌐 Integrates the NVIDIA Isaac simulation platform, providing a unified development environment from cloud to edge.
9. Genspark launches AI Designer: one-click generation of brand solutions, redefining the landscape of AI design
Genspark AI Designer is an AI design tool that generates complete brand design solutions with one click, covering logos, packaging, website design, and more. It greatly lowers the barrier to design and has attracted widespread attention from the global design and tech communities.
【AiBase Summary:】
🎨 Genspark AI Designer supports multimodal input and can generate vector icons, 3D renderings, and animated videos as design assets.
🌐 The tool completes complex design tasks from natural language instructions, delivering end-to-end creative solutions for brand logos, packaging, and websites.
💡 AI Designer redefines the brand design process, offering efficient and cost-effective solutions for creators and enterprises.
Details link: https://www.genspark.ai/ai_designer
10. Doubao officially launches a protection mode for minors
Doubao launched a protection mode for minors to help parents manage how their children use the app. The mode disables certain features, such as recommended videos and third-party web browsing, but retains the translation and in-depth research functions.
【AiBase Summary:】
🔒 Parents can activate the protection mode for minors with a password, restricting access to certain content.
📺 Recommended videos and third-party web browsing are disabled by default in this mode.
🌐 Translation and in-depth research functions remain available, so learning and exploration are not affected.