Welcome to the 【AI Daily】 column! This is your guide to exploring the world of artificial intelligence every day. Each day, we present the latest hot topics in the AI field, focusing on developers and helping you gain insight into technical trends and understand innovative AI product applications.
Fresh AI products click to learn more: https://top.aibase.com/
1. Kuaishou Launches AI Image Tool Poify, Focusing on E-commerce Market
Kuaishou has recently launched the AI image tool Poify, which focuses on image processing for the e-commerce market with the aim of improving merchants' efficiency and cost-effectiveness in product display. The core functions of Poify include text-to-image and image-to-image generation, especially suitable for e-commerce needs, offering AI model try-ons and background replacement capabilities, helping merchants reduce costs while enhancing visual attractiveness.
【AiBase Summary:】
🛍️ Poify focuses on the e-commerce sector, providing efficient AI image solutions that meet various merchant needs.
📸 Merchants can easily generate high-quality product display images through features like AI model try-ons, reducing traditional shooting costs.
🚀 Kuaishou aims to seize the opportunity at the intersection of e-commerce and AI with Poify, driving further industry development.
2. ByteDance Releases Open Source Code Model Seed-Coder with 8B Parameters, Leading a New Trend in Programming
The Seed team from ByteDance has released the new open-source code model Seed-Coder, which quickly gained attention in the industry due to its 8B parameters and excellent code generation and reasoning capabilities. Seed-Coder excels in multiple benchmark tests, showcasing strong programming potential. Its innovative data processing methods and efficient training strategies not only improve code generation quality but also provide new ideas for future AI-driven data processing.
【AiBase Summary:】
💻 Seed-Coder is an open-source code model with 8B parameters and supports 32K context, focusing on code generation and software engineering tasks.
🔍 Automatically curating and filtering code data using small language models significantly reduces manual intervention, enhancing data screening efficiency.
🏆 In multiple benchmark tests, Seed-Coder demonstrates outstanding code repair and generation capabilities, becoming a leading lightweight programming model.
Details link: https://github.com/ByteDance-Seed/Seed-Coder
3. Top Ten IPs for 2025 Unveiled, Including DeepSeek App
The 2025 World IP Economy Development Conference and Global IP Licensing Expo was successfully held in Guangzhou, attracting the attention of many experts and industry insiders. The expo selected the top ten IPs from 2,368 participating works. After expert review and online voting, ten outstanding works were finally determined. Among them, "Nezha's Magic Sea Monster" stood out with its excellent storyline and exquisite production, becoming one of the top ten IPs.
【AiBase Summary:】
🎉 This expo attracted 2,368 participating IPs and selected the top ten after expert review and online voting.
🌟 "Nezha's Magic Sea Monster" became one of the top ten IPs due to its excellent storyline and production quality.
🎭 Works such as DeepSeek App and the musical drama "Summoned by Dunhuang" showcased the diversity of Chinese cultural creativity.
4. Claude AI API Introduces New Web Search Functionality
Anthropic's newly released Claude AI API introduces web search functionality, enabling real-time access to network information. This innovation significantly enhances Claude's accuracy when answering questions and puts pressure on traditional search engines. Developers can use this feature to build more precise intelligent agents applicable across financial, legal, developer tools, and productivity domains.
【AiBase Summary:】
🌐 Claude AI API introduces web search functionality, enabling real-time access to network information.
💼 Provides four application scenarios including finance, law, developer tools, and productivity.
📈 This new feature facilitates the creation of precise intelligent agents by developers, enhancing competitiveness.
5. Apple Releases FastVLM Model, an Extremely Fast Visual-Language Model That Can Run on iPhone
Apple officially launched FastVLM, a visual-language model optimized for high-resolution image processing with extremely fast encoding speed and excellent performance, particularly suitable for running on mobile devices. The core of FastVLM is its innovative FastViTHD encoder, which significantly improves efficiency through dynamic resolution adjustment and hierarchical token compression technologies.
【AiBase Summary:】
🚀 FastVLM achieves an 85x increase in encoding speed through the FastViTHD encoder, optimizing high-resolution image processing.
📈 In multi-modal tasks, FastVLM performs exceptionally well, particularly standing out in SeedBench and TextVQA benchmarks.
🌐 The open-source nature of FastVLM will attract developers to participate, driving Apple's technological innovation and ecosystem construction in the visual-language model field.
Details link: https://github.com/apple/ml-fastvlm/
6. Tencent Releases New AI Framework PrimitiveAnything: Revolutionizing 3D Shape Generation!
PrimitiveAnything is a revolutionary framework jointly developed by Tencent and Tsinghua University, aiming to redefine the abstraction and generation of 3D shapes. By decomposing complex shapes into primitive components, the framework not only enhances geometric accuracy but also boosts learning efficiency. Its auto-regressive generation method and large-scale HumanPrim dataset verify the framework's superior performance in reconstruction accuracy and consistency with human abstract patterns, demonstrating strong generalization capabilities, particularly suitable for efficient interactive 3D applications.
【AiBase Summary:】
🛠️ The PrimitiveAnything framework generates variable-length primitive component sequences via decoder transformers, enhancing 3D shape generation's geometric accuracy and learning efficiency.
📊 The research team built a large-scale HumanPrim dataset to validate the framework's superior performance in reconstruction accuracy and consistency with human abstract patterns.
💻 The framework supports generating 3D content from text or image inputs, allowing users to easily edit results for high modeling quality and storage savings.
Details link: https://huggingface.co/spaces/hyz317/PrimitiveAnything
7. First Intelligent Document Processing Benchmark Released: Gemini Leads but Lacks Strengths; Multimodal AI Faces Real Challenges
On May 11, the intelligent document processing field reached an important milestone with the official launch of the first unified benchmark test for vision-language models, IDP Leaderboard. This benchmark comprehensively analyzes the performance of mainstream models across multiple core tasks based on evaluations of 9,229 documents and 16 datasets. Although Gemini2.5Flash excels in overall strength, it unexpectedly underperforms in OCR and classification tasks, revealing trade-offs between multimodal reasoning capabilities and basic text recognition functionalities.
【AiBase Summary:】
📈 IDP Leaderboard evaluates the performance of mainstream models across six core tasks based on 16 datasets and 9,229 documents.
🤖 Gemini2.5Flash leads in overall strength but sees unexpected declines in OCR and classification tasks compared to its predecessor, highlighting balance issues in model iterations.
📝 Long document processing and table extraction remain shortfalls for vision-language models; the best models have yet to break the 70% mark in these tasks.
Details link: https://github.com/nanonets/idp-leaderboard
8. Google Breaks Boundaries Again: Gemini 2.5 Pro Achieves 6-Hour Video Understanding, AI Visual Capabilities Enter a New Era
Google's Gemini 2.5 Pro model achieved a major breakthrough in video understanding, supporting up to 6 hours of video analysis and a context window of up to 2 million tokens. By parsing YouTube links via API, the model performed excellently in the VideoMME benchmark test, with an accuracy rate nearing the industry's top level. Its applications span education, creative industries, and business analysis, showcasing a new era in AI visual capabilities.
【AiBase Summary:】
🎥 Gemini 2.5 Pro supports up to 6 hours of video analysis with a context window of 2 million tokens, achieving the first API-based YouTube link parsing.
📊 In the VideoMME benchmark test, the model's accuracy reached 84.7%, just 0.5% behind the industry's top level.
💡 This model can be applied in education, creative industries, and business analysis, automatically generating reports and interactive learning applications to enhance user experience.
9. User Questioning Style Affects AI Model Accuracy, Concise Answers Easily Lead to Misinformation
Recent studies show that when users request brief answers, many language models are more likely to generate incorrect or misleading information. This research reveals the negative impact of concise requests on model accuracy, particularly when users use confident wording, which significantly reduces the model's correction ability. This phenomenon varies significantly across different models, with smaller models being more affected.
【AiBase Summary:】
📉 Brief requests lead to a decline in language model accuracy, with fantasy resistance potentially decreasing by up to 20%.
🗣️ Users' tone and wording affect the model's correction ability; the flattery effect may make models less willing to challenge misinformation.
🔍 Different models perform differently under realistic conditions, with smaller models being more susceptible to brief and confident wording.
10. Global First AI Smart Browser Fellou Launched: One-Click Research, Posting, Emailing, Efficiency Soars Fivefold!
The release of Fellou marks a significant transformation in browsers, as it becomes the world's first browser with AI-powered automation capabilities. Not only can it perform traditional searches and browsing, but it can also think, plan, and execute complex tasks, greatly boosting user productivity. Through deep research mode and workflow automation, Fellou provides powerful support for researchers, marketers, and developers, particularly showcasing immense potential in cross-platform collaboration and data processing.
【AiBase Summary:】
🔍 Deep research mode automatically generates complete reports by parallel searching multiple platforms in the background, rivaling the efficiency of an intern team.
⚙️ Deep workflow mode allows users to automate complex tasks through natural language instructions, boosting efficiency and supporting cross-platform operations.
🔒 Privacy protection-wise, Fellou promises not to track user behavior; all data processing occurs locally, ensuring user data security.
Details link: https://fellou.ai
11. NVIDIA AI Introduces Audio-SDS, Revolutionizing Sound Effect Generation and Multitask Audio Processing
NVIDIA's Audio-SDS technology extends Score Distillation Sampling (SDS) to the audio domain, significantly improving sound effect generation and sound source separation capabilities. The technology supports multitask audio processing, enabling users to generate customized sound effects via text prompts, reducing development costs and time. The open-source release of Audio-SDS provides new possibilities for the creative industry and smart devices, marking an important milestone in AI audio processing.
【AiBase Summary:】
🎶 Audio-SDS leverages SDS technology extended to the audio domain, enabling multitask processing suitable for sound effect generation and sound source separation.
📝 Users can customize sound design through text conditions, meeting creative and industrial needs to enhance user experience.
🚀 The open-source strategy promotes AI technology popularization, providing low-cost audio processing solutions for developers and small and medium-sized enterprises.
Details link: https://research.nvidia.com/labs/toronto-ai/Audio-SDS/
12. Kimi Joins Xiaohongshu, AI Large Models Transition from 'Traffic Wars' to Content Deepening
Kimi's cooperation with Xiaohongshu marks a new attempt for AI large models on content platforms. Although the current entrance has yet to deeply integrate with other Xiaohongshu functions, this cooperation shows Kimi's transformation strategy amid traffic anxiety. In the future, Kimi may enhance user stickiness by combining content with communities, although current functionalities remain cautious. Further cooperation between the two parties still requires observation.
【AiBase Summary:】
📈 Kimi cooperates with Xiaohongshu, launching the Kimi intelligent assistant account where users can generate notes with one click.
💰 Kimi's traffic budget was reduced to 150 million yuan in the first quarter of 2025, indicating its transition from quantity-driven growth to a focus on content and community strategies.
🔍 Kimi also collaborates with Caixin Media to introduce financial data, exploring the direction of trustworthy replies, further reaching content communities.