AI Daily: Baidu Launches the Huixiang Platform and MuseSteamer; Alibaba Unveils OmniAvatar, an Audio-Driven Full-Body Digital Human Model

Welcome to the "AI Daily" column, your daily guide to the world of artificial intelligence! Each day we bring you the latest developments in AI, with a focus on developers, to help you keep up with technology trends and innovative AI products and applications.

Discover the latest AI products: https://top.aibase.com/

1. Open-source end-to-end large speech model Step-Audio-AQAA: understands audio and generates natural speech directly

Step-Audio-AQAA is an open-source, end-to-end large speech model that generates natural, fluent speech directly from raw audio input, significantly improving the user experience of human-computer interaction. The model consists of three parts: a dual-codebook audio tokenizer, a backbone LLM, and a neural vocoder. Together they efficiently process the complex information carried in speech, laying a solid foundation for future intelligent speech applications.

【AiBase Summary:】
🔊 Step-Audio-AQAA can directly generate natural speech from audio input, enhancing the human-computer interaction experience.
📊 The model architecture consists of three modules: a dual-codebook audio tokenizer, a backbone LLM, and a neural vocoder, which can efficiently capture complex information in speech.
🎤 The release of Step-Audio-AQAA marks an important advancement in speech interaction technology, providing new ideas for future intelligent speech applications.
Details link: https://huggingface.co/stepfun-ai/Step-Audio-AQAA
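For readers new to audio-in, audio-out models, the three-stage data flow described above (tokenizer, backbone LLM, vocoder) can be illustrated with a toy Python sketch. Everything in it is hypothetical: the codebooks, the function names, and the "backbone" transform are stand-ins chosen for readability, not Step-Audio-AQAA's actual code or API. The real model uses learned codebooks, a large language model that generates new response tokens, and a neural vocoder.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy codebooks; a real tokenizer learns much larger ones.
CODEBOOK_A = [-0.5, 0.0, 0.5]
CODEBOOK_B = [-1.0, -0.25, 0.25, 1.0]

Token = Tuple[int, int]  # one index from each codebook


def nearest(codebook: Sequence[float], x: float) -> int:
    """Return the index of the codebook entry closest to x."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - x))


def tokenize(audio: Sequence[float]) -> List[Token]:
    """Toy dual-codebook tokenizer: each sample is quantized against both codebooks."""
    return [(nearest(CODEBOOK_A, s), nearest(CODEBOOK_B, s)) for s in audio]


def backbone(tokens: List[Token]) -> List[Token]:
    """Stand-in for the backbone LLM: an arbitrary deterministic transform
    (reversal) in place of real response-token generation."""
    return list(reversed(tokens))


def vocoder(tokens: List[Token]) -> List[float]:
    """Toy vocoder: map each token pair back to a sample by averaging the entries."""
    return [(CODEBOOK_A[a] + CODEBOOK_B[b]) / 2 for a, b in tokens]


def audio_in_audio_out(audio: Sequence[float]) -> List[float]:
    """Audio goes in, discrete tokens flow through the middle, audio comes out."""
    return vocoder(backbone(tokenize(audio)))


if __name__ == "__main__":
    print(audio_in_audio_out([0.1, -0.9]))
```

Calling `audio_in_audio_out([0.1, -0.9])` returns `[-0.75, 0.125]`. The only point of the sketch is the shape of the pipeline: no text stage sits between input and output, and the LLM operates purely on discrete audio tokens.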

2. Baidu launches the "Huixiang" platform and MuseSteamer: AI video generation that turns a single image into professional-grade video!

Baidu has launched the "Huixiang" platform and MuseSteamer, which use generative AI and multimodal technologies to provide comprehensive video generation solutions for search and advertising scenarios. MuseSteamer offers strong controllability and high cost-effectiveness: users can generate professional-grade video content simply by uploading a single image, greatly simplifying the video production process.

【AiBase Summary:】
🎥 MuseSteamer supports integrated audio and video generation, delivering cinematic-quality results.
🔄 Supports generating continuous 10-second videos, improving creative efficiency.
🖼️ Users only need to upload a single image to generate professional-grade video content.
Details link: https://huixiang.baidu.com/

3. Zhejiang University and Alibaba jointly launch OmniAvatar, an audio-driven full-body digital human model

The OmniAvatar model, jointly developed by Zhejiang University and Alibaba, marks a significant advance in audio-driven digital human technology. It generates natural, smooth full-body digital human videos and performs especially well in singing scenarios. The model also supports fine-grained control over generated details through text prompts, giving it potential across many scenarios and opening up new possibilities in marketing, education, and entertainment.

【AiBase Summary:】
🎧 Audio-driven technology generates full-body digital human videos.
🎨 Text prompts provide control over generated details, adding flexibility.
🌐 As an open-source project, it leaves broad room for commercial applications.

4. Baidu Search undergoes its largest update in ten years: AI Smart Box, Hundred Views, and AI Assistant are fully upgraded

Baidu Search has undergone its largest update in ten years, introducing new features such as the Smart Box, Hundred Views, and AI Assistant that significantly improve both the search experience and users' content creation capabilities.

【AiBase Summary:】
🧠 The Smart Box accepts inputs of up to 1,000 characters and offers enhanced multimodal interaction.
🎥 The upgraded Hundred Views feature supports mixed-content output and AI agent services.
📽️ AI Assistant adds video calls, strengthening its creation and search capabilities.

5. xAI Console Adds Grok4 and Grok4Code References, Signaling the Imminent Launch of Its Next-Generation AI Models

xAI has added references to Grok4 and Grok4Code in its developer console, suggesting that its next generation of AI models is about to launch. Grok4 is described as a top-tier general-purpose model, while Grok4Code is optimized for programming. The appearance of both models in the console suggests that their public release has entered the final preparation stage.

【AiBase Summary:】
🧠 Grok4, as xAI's flagship model, focuses on improving natural language processing, mathematical reasoning, and comprehensive reasoning capabilities.
💻 Grok4Code is dedicated to programming optimization and is planned to be seamlessly integrated with code editors to improve development efficiency.
🌐 xAI plans to offer Grok4 through its API and to expand it with multimodal capabilities later, lowering integration barriers for developers.

6. Gemini Live Receives a Major Upgrade! Seamless Integration with Google Applications, Smart Life Within Reach

Gemini Live's upgrade integrates it deeply with the Google ecosystem, making interactions smarter while keeping privacy protection in mind and showcasing its potential as an intelligent assistant.

【AiBase Summary:】
📱 Gemini Live integrates deeply with Google Maps, Calendar, and other applications, improving cross-application operation efficiency.
🧠 Supports multimodal interaction, such as scanning information to automatically create tasks or calendar events, making it more practical.
🔒 Google emphasizes privacy protection, letting users manage permissions themselves to keep their data secure.

7. Gemini Live Will Be Fully Integrated With Google Applications, Making AI Assistants Smarter!

Gemini Live is undergoing a major upgrade that adds deep integration with Google apps such as Google Maps and Google Calendar, as well as music apps such as Spotify and YouTube Music. It also introduces camera-based features and smarter interaction methods, such as card-style interfaces and features similar to Circle to Search. Google also emphasizes privacy protection to keep user data secure.

【AiBase Summary:】
📲 Gemini Live now offers extended support for Google Maps, Google Calendar, and other applications, improving interaction efficiency.
🖼️ New camera-based features let users scan items such as concert posters or handwritten lists and have the corresponding actions carried out automatically.
🔒 Google emphasizes privacy protection: users can disconnect apps and opt out of having their chats used for training in the settings.

8. Anthropic's Annualized Revenue Reaches $4 Billion, Nearly Quadrupling Since the Start of the Year, as Competition with Cursor Intensifies

AI unicorn Anthropic has reached annualized revenue of $4 billion, nearly four times its level at the start of the year. Meanwhile, Cursor, which relies on Anthropic's models, is expanding aggressively by bringing in executives and rolling out new features, intensifying the competition between the two. The rapid progress of AI has driven demand for programming tools, and companies are racing for market share.

【AiBase Summary:】
🤖 Anthropic's annualized revenue has reached $4 billion, nearly quadrupling since the start of the year.
🔄 Cursor is strengthening its market position by hiring executives from Anthropic.
📈 The rapid development of AI technology continues to increase demand for programming tools.