Welcome to the 【AI Daily】 column! Here is your guide to exploring the world of artificial intelligence every day. We present the latest content in the AI field for developers and help you gain insights into technological trends and innovative AI product applications.
Discover fresh AI products: https://top.aibase.com/
1. DeepSeek R1-0528 makes a stunning debut: Free 128K context, performance rivals OpenAI o3!
DeepSeek has released R1-0528, which supports a 128K context window and significantly improves reasoning and code generation while remaining free to use.
[AiBase summary:]
🌟 Supports a 128K context window and significantly improves text recall accuracy, making it suitable for complex tasks.
💻 Optimized code generation and writing deliver fast, accurate output comparable to top-tier models.
💰 A free-access strategy lowers barriers to use and challenges traditional AI business models.
For more details: https://huggingface.co/deepseek-ai/DeepSeek-R1-0528
2. ByteDance releases image Agent "Little Lark AI", an all-in-one tool for viral content creation
ByteDance has launched a new image Agent called 'Little Lark AI', which can quickly generate high-quality videos and images from simple instructions, lowering the technical barrier to content creation.
[AiBase summary:]
🌟 With a single instruction, 'Little Lark AI' proactively reasons about the request and generates viral-ready videos and images, delivering 'inspiration at first sight'.
📚 Built on ByteDance's in-house 'Lark' large model, it combines deep learning and multimodal technology to provide strong image generation and video editing capabilities.
📱 Currently available on Android, with an iOS version expected in June, bringing AI creation to a broader range of scenarios.
3. Keling 2.1 officially launches: price drops by 65%, performance significantly improved
Keling 2.1 has officially launched with prices cut by 65%, significantly improving cost-effectiveness. It adds three new quality editions to meet different user needs, and it generates better results faster than the previous version, making it well suited to short videos and advertising production.
[AiBase summary:]
🌟 Keling 2.1's price drops by 65%, significantly improving cost-effectiveness.
⚡ Adds three editions (Standard, High-Quality, and Master) to meet different user needs.
📈 Generates better results faster than the previous version, making it suitable for short videos and advertising production.
4. Opera Neon, the world's first AI agentic browser, launches, leading the Web4.0 era with smart chat and task automation
Opera Neon, billed as the world's first agentic browser, redefines the web experience through AI-driven smart chat, task automation, and content creation.
[AiBase summary:]
🌐 Opera Neon is the world's first "fully agentic" browser; it can proactively perform tasks such as searching, filling in forms, and shopping, improving user efficiency.
💬 The built-in AI assistant, Neon Chat, supports multilingual interaction, extracts information from web pages, and provides context-aware answers, making interactions more natural.
💻 Neon Make generates games, websites, and more from simple commands, offering an end-to-end path from idea to finished product.
Details link: https://www.operaneon.com/
5. Meta releases Multi-SpatialMLLM: Leading a spatial understanding revolution in multimodal AI
Meta and The Chinese University of Hong Kong jointly developed the Multi-SpatialMLLM model, which enhances the spatial understanding of multimodal large language models by integrating depth perception, visual correspondence, and dynamic perception components, and performs strongly on multiple benchmarks.
[AiBase summary:]
🌟 Multi-SpatialMLLM uses three components to move beyond the limitations of single-frame image analysis, enhancing spatial understanding.
📊 The model is trained on the MultiSPA dataset across five tasks, significantly improving multi-frame spatial reasoning.
🏆 Across multiple benchmarks, Multi-SpatialMLLM achieves significantly higher accuracy than traditional models.
6. Tongyi Lab and Peking University release ZeroSearch: Activating LLM retrieval capabilities, reducing costs by 88%
ZeroSearch is a framework that activates the retrieval capabilities of large language models through a simulated search engine, reducing training costs by up to 88% while improving the clarity of model reasoning and the efficiency of answer extraction.
[AiBase summary:]
✨ ZeroSearch uses a large language model to generate retrieved documents in place of real searches, significantly reducing training costs and interference from noisy results.
🔍 The framework adopts structured training templates and "simulated fine-tuning" strategies to improve document quality and model generalization capabilities.
🚀 Experiments show that ZeroSearch outperforms traditional methods, especially with larger models, advancing the development of intelligent retrieval.
Details link: https://arxiv.org/pdf/2505.04588
7. ByteDance launches new AI video editing app "Short Video Editor", making it easy to record everyday moments
ByteDance has launched a new app called "Short Video Editor" that focuses on AI video editing, lowers the barrier to creation, and lets users easily produce high-quality videos.
[AiBase summary:]
🎥 ByteDance launches "Short Video Editor" application, helping users easily produce high-quality videos.
🤖 Integrated AI lowers the barrier to video creation and encourages users to share.
💡 Powered by Volcano Engine's Doubao large model, it improves video processing efficiency.
8. MotionPro makes a splash! An AI video generation revolution is coming, with precise control at 40ms per frame set to transform the film and gaming industries
MotionPro is a precision motion controller designed specifically for image-to-video generation, achieving fine-grained control through regional trajectories and motion masks, bringing flexibility and precision to video generation.
[AiBase summary:]
✨ MotionPro addresses the coarse motion control of traditional image-to-video (I2V) generation through regional trajectories and motion masks, achieving more natural and finer-grained effects.
🎥 Controls object and camera motion simultaneously without requiring a specific dataset, supporting precise generation of complex shots and object trajectories.
🌐 Open-source support includes an optimized training framework and data construction tools, helping developers get started quickly and driving industry progress.
Details link: https://huggingface.co/papers/2505.20287
9. Musk's xAI reaches $300 million agreement with Telegram to deploy the Grok AI chatbot
Telegram is partnering with xAI, which will pay $300 million to deploy the Grok AI chatbot on Telegram, enhancing the user experience and increasing Telegram's revenue.
[AiBase summary:]
Telegram partners with xAI, which pays $300 million to deploy the Grok AI chatbot.
Grok AI will enhance Telegram users' communication experience by providing intelligent chat services.
The partnership diversifies Telegram's revenue model and advances AI adoption in social media.
10. OpenAI CFO reveals: Restructuring paves the way for a possible future IPO
OpenAI is restructuring to prepare for a potential IPO, though the timing depends on market conditions. Microsoft has invested over $13 billion, and OpenAI is converting into a public benefit corporation that balances shareholder returns with social responsibility.
[AiBase summary:]
🌟 OpenAI is restructuring its organization to pave the way for a future IPO, but the timing requires appropriate market conditions.
💰 Microsoft's investment exceeds $13 billion, and OpenAI is converting into a public benefit corporation that balances shareholder interests and social responsibility.
📈 Stability is key; an IPO requires full preparation from the company and a favorable market window.
11. Pixel Cake's 'Fangtang' large model passes registration, becoming the first registered large image model in China's imaging industry
Pixel Cake's independently developed 'Fangtang' large model has completed registration with the Cyberspace Administration of China, becoming the first large image model in China's imaging industry to receive official qualification, a milestone for both technology and compliance. It is expected to advance the industry in the advertising and film sectors.
[AiBase summary:]
🌟 The Fangtang large model has completed registration with the Cyberspace Administration of China, becoming the first large image model with official qualification in China's imaging industry.
🚀 Its in-house development demonstrates Pixel Cake's strength and innovation in AI and advances image generation technology.
🔒 Compliance with national policy requirements helps ensure a safe and reliable user environment and sets a new industry benchmark.
12. Open Source + Low Cost! Paper2Poster turns academic papers into posters instantly
Paper2Poster is a tool that automates the conversion of academic papers into multimodal posters, significantly improving academic dissemination efficiency while reducing costs.
[AiBase summary:]
🌟 Core function: Automatically converts PDF papers into well-structured, visually appealing academic posters, far more efficiently than traditional manual methods.
💰 Open source and low cost: Generating a poster costs just $0.005, and being open source lowers the barrier to using academic tools.
📊 Innovative evaluation mechanism: Releases a dataset of 100 paper-poster pairs, promoting standardized evaluation in multimodal content generation.
Details link: https://arxiv.org/abs/2505.21497
13. Resemble AI open-sources TTS model Chatterbox, with performance rivaling and surpassing ElevenLabs
Chatterbox is an open-source TTS model with strong performance and features including real-time synthesis, zero-shot voice cloning, and emotion exaggeration control, and it has drawn wide industry attention.
[AiBase summary:]
🌟 Chatterbox is built on a 0.5B-parameter LLaMA architecture trained on over 500,000 hours of data; in blind tests, 63.75% of listeners preferred its realism and fluency.
⚡ Supports real-time synthesis with latency below 200 ms, zero-shot voice cloning, and emotion exaggeration control, giving developers a high degree of flexibility.
🔒 Being open source lowers the barrier to adoption, while built-in watermarking ensures content traceability, reflecting a dual strategy of openness and commercialization.
Details link: https://github.com/resemble-ai/chatterbox