Welcome to the 【AI Daily】 column! This is your guide to exploring the world of artificial intelligence every day. Here we present the latest developments in the AI field, focusing on developers and helping you gain insight into technical trends and understand innovative AI product applications.
Fresh AI products click to learn more: https://top.aibase.com/
1. Baidu PaddleOCR 3.0 open-source release: OCR accuracy leaps by 13%
The Baidu Paddle team has released version 3.0 of PaddleOCR, enhancing text recognition accuracy, multi-language support, handwriting recognition, and document parsing capabilities. It also adds support for domestic hardware and introduces core functions such as PP-OCRv5, PP-StructureV3, and PP-ChatOCRv4.
【AiBase Summary:】
🚀 The full-scenario text recognition model PP-OCRv5 supports five types of text recognition with an overall accuracy increase of 13%, enabling seamless deployment.
📚 The document parsing solution PP-StructureV3 strengthens page detection and table recognition capabilities, performing excellently in high-precision parsing across multiple scenarios.
🤖 The intelligent document understanding solution PP-ChatOCRv4 combines with the Wenxin large model, improving key information extraction accuracy by 15% and supporting complex document processing.
Details link: https://github.com/PaddlePaddle/PaddleOCR
2. Kunlun Wonder Super Intelligence Agent Released! AI Office Revolution Comes, Deep Research Outperforms OpenAI, Cost Only 40%!
The Kunlun Wonder super-intelligent agent is an AI Office smart agent based on self-developed Deep Research technology. Its strong multimodal content generation capability and cost advantage of only 40% of OpenAI have sparked global AI community discussions.
【AiBase Summary:】
✨ The Kunlun Wonder super-intelligent agent adopts a multi-agent architecture, including five expert agents and one general agent, supporting the generation of various office contents in one stop.
🚀 Its core technology Deep Research model has low cost and high efficiency, surpassing OpenAI Deep Research with 82.42 points in the GAIA benchmark test.
🌐 The open-source framework and low-cost deployment strategy make Kunlun an ideal choice for small and medium-sized enterprises and individual developers.
Details link: https://mcp.so/server/skywork-super-agents/Skywork-ai
3. OpenAI Core APIs Support MCP, Simplifying Intelligent Agent Development Process
OpenAI's Responses API has added MCP support, significantly reducing the difficulty of integrating AI models with external tools. It also launched several feature upgrades, including image generation, code interpreter, and optimized file search capabilities.
【AiBase Summary:】
✨ OpenAI Responses API supports MCP protocol, allowing developers to connect external tools with minimal code.
🌟 New features include image generation, code interpreter, and optimized file search capabilities, improving development efficiency.
🌐 MCP has become a de facto standard for AI intelligent agent development, promoting cross-platform collaboration and flexibility.
4. xAI Launches Web Search API: Live Search, Empowering AI to Obtain Content in Real Time
xAI officially launched the Live Search API. This function allows developers to use the Grok model to real-time search information from various data sources, greatly enhancing the dynamic information processing capability of AI applications. This API is currently in free public testing, providing developers with powerful tools to simplify search logic and data integration.
【AiBase Summary:】
🌟 Live Search API supports autonomous search decisions; Grok can automatically determine whether to search based on the dialogue context without human intervention.
🌐 Provides diverse data sources, including X platform, web pages, news, and RSS feeds, ensuring comprehensive and real-time updated information.
🔧 Highly flexible and efficient integration, supporting multiple SDKs, allowing developers to easily adjust base URLs and API keys for quick access.
Details link: https://docs.x.ai/docs/guides/live-search
5. Google Sparkify Experimental Product Goes Online: Turn Questions into Animated Shorts in Seconds, Understand Complex Knowledge Instantly
Google's Sparkify uses Gemini and Veo models to convert complex knowledge points into intuitive animated short videos, suitable for education, science popularization, and content creation fields.
【AiBase Summary:】
✨ Complex knowledge points are presented intuitively through animated short videos, improving understanding efficiency.
🎥 High-quality animated video production using Gemini2.5 and Veo2 models.
🌍 Supports multi-language expansion, covering more regions and populations in the future.
Details link: https://sparkify.withgoogle.com/explore
6. Mistral Returns to the Open Source Community: Releases Devstral, an Extremely Efficient Code AI Model
Mistral AI has released the brand-new open-source language model Devstral, which is lightweight and designed specifically for proxy AI software development. It boasts excellent performance and supports local operation, showcasing the power of open-source community cooperation.
【AiBase Summary:】
Devstral has 24 million parameters and is released under the Apache2.0 license, allowing free deployment and commercialization.
Performance is outstanding, surpassing most closed-source models in SWE-Bench verification, suitable for local and private deployment scenarios.
As the latest progress in the Codestral series, Devstral supports cross-file context understanding, suitable for complex software development tasks.
7. Video Ocean Releases 2K/4K HDR Video Generation Tool, Price Breakthrough Sparks Craze Across the Web
On May 21st, Lucheng Technology launched the new AI video generation tool Video Ocean, supporting rapid generation of high-quality videos, providing various effects and functions, at an affordable price that is completely free, sparking a wave of creativity.
【AiBase Summary:】
✨ Supports generating 2K/4K HDR high-quality videos within 5-10 seconds, suitable for various scene creations.
🎥 Provides a massive number of templates and effects, such as Laugh, Cakeify, etc., making it easy for beginners to create professional-level videos.
💰 Prices are only 1/10 of those of Keling 2.0, completely free, attracting positive feedback from various user groups.
8. Google Launches New Tool SynthID Detector, Assisting in Identifying AI-Generated Content
Google has launched a new tool called SynthID Detector, aimed at helping users detect whether the content was generated by its AI tools. This tool can identify AI-generated content and highlight parts marked with SynthID watermarks, currently being offered to early testers.
【AiBase Summary:】
🌟 SynthID Detector is a new tool used to identify AI-generated content, supporting images, text, audio, and video.
🔍 This tool can automatically scan uploaded content, searching for and highlighting SynthID watermarks.
🚀 Currently available only to early testers, it will gradually be rolled out to more users in the future.
Details link: https://blog.google/technology/ai/google-synthid-ai-content-detector/
9. Rapid Rise of Google's AI Note-taking Tool NotebookLM
In the past six months, Google's AI-assisted knowledge management tool NotebookLM has seen a 56% monthly visit growth rate. It has gained significant attention due to its innovative features like 'audio overview', multi-language support, and diverse application scenarios.
【AiBase Summary:】
🚀 NotebookLM's monthly visit volume increased by 56%, becoming a dark horse in the AI application field.
🌐 Supports 50+ languages for podcast content generation, breaking language barriers and enhancing user experience.
📚 Suitable for students, researchers, and content creators, efficiently used in both academic and entertainment fields.
10. SiliconFlow Upgrades DeepSeek-R1 and Other Inference Model APIs, Supporting 128K Context Length
SiliconFlow has upgraded its inference model APIs, significantly increasing the maximum context length to 128K, enhancing the model's reasoning ability and output quality. It also introduced independent control over thought chains and reply content length, allowing developers to adjust model performance more flexibly.
【AiBase Summary:】
🚀 Supports a maximum context length of 128K, significantly enhancing the depth of thinking and output completeness of the model.
🔍 Introduces independent control over thought chains and reply content length, strengthening developers' precise control over model behavior.
⚠️ When the length limit is reached, the model output will be truncated and the reason marked, ensuring transparency of use.
Details link: https://docs.siliconflow.cn/cn/userguide/capabilities/reasoning
11. Google DeepMind Releases New AI Music Generation Model Lyria2, Supporting Real-Time Creation
Lyria2 is the latest music generation model released by Google DeepMind. It features high-fidelity sound quality, real-time interaction capabilities, and multi-style adaptability, bringing revolutionary changes to music creation.
【AiBase Summary:】
🎶 High-fidelity sound quality: Can generate 48kHz stereo audio, precisely capturing musical details, suitable for professional music production and commercial projects.
⚡ Real-time interaction: Lyria RealTime feature allows users to instantly adjust music styles, rhythms, etc., inspiring creative inspiration.
🌐 Multi-modal support: Integrated into the Music AI Sandbox toolkit, supporting text, sheet music, or audio fragment input, covering various music styles.
Details link: https://deepmind.google/models/lyria/
12. Multimodal Large Model MMaDA: Let AI Learn "Interdimensional Thinking," a Versatile All-Rounder Has Arrived!
I just read about MMaDA. This multimodal large model jointly developed by several top universities and enterprises has achieved seamless switching and deep reasoning between text, images, and other modalities thanks to its unique unified diffusion architecture, hybrid long-chain thinking fine-tuning, and unified reinforcement learning algorithm, outperforming existing models like GPT-4.
【AiBase Summary:】
🌟 Unified Diffusion Architecture: Breaks through the barriers of traditional multimodal models, achieving seamless handling of text, image, and other data types.
📚 Hybrid Long-Chain Thinking Fine-Tuning: Through cross-modal reasoning alignment, enables AI to have deep thinking capabilities.
🏆 Unified Reinforcement Learning Algorithm UniGRPO: Balances reasoning and generation tasks, comprehensively improving AI performance.
Details link: https://github.com/Gen-Verse/MMaDA
13. Microsoft Releases Web Intelligent Agent Magentic-UI, Specifically Designed to Solve Complex Web Tasks
I really appreciate the design philosophy of Magentic-UI. It puts people first, emphasizing transparency and controllability, making me feel secure when using AI assistants. This tool not only enhances work efficiency but also provides developers with a powerful open-source platform.
【AiBase Summary:】
🌐 Magentic-UI is an AI intelligent agent research prototype centered on humans, assisting users in completing complex tasks in real time through web browsers.
🔄 It introduces collaborative planning and behavioral protection features, ensuring that users maintain control during automation while ensuring safety and flexibility.
💡 Works through multi-agent collaboration, supporting plan learning, optimizing the automation efficiency of future tasks from historical ones.
Details link: https://github.com/microsoft/Magentic-UI
14. Framer Releases New AI Features, Wireframer Builds Websites in Seconds, Workshop Generates Interactive Components, Vectors 2.0 and A/B Testing Ignite Design Craze!
Framer launched a suite of new AI features at I/O2025, including Wireframer, Workshop, Advanced Analytics, and Vectors2.0. These features, driven by AI, significantly reduce the cost and complexity of website creation through web layout generation, interactive component design, vector drawing upgrades, and advanced analytics tools.
【AiBase Summary:】