Welcome to "AI Daily", your daily guide to the world of artificial intelligence. Each day we round up the latest developments in AI with a focus on developers, helping you follow technical trends and discover innovative AI product applications.

Discover new AI products: https://app.aibase.com/zh

1. Google Gemini 3.0 Pro begins limited release: Enhanced reasoning capabilities, official release possible at the end of this month

Google's DeepMind team has begun rolling out the Gemini 3.0 Pro model to a subset of users. The model offers improved reasoning and multimodal processing, and its official release is planned for the end of October.

【AiBase summary:】

🧠 Gemini 3.0 Pro introduces the Deep Think reasoning architecture, enhancing the ability to handle multi-step complex tasks.

🌐 Supports various input formats such as text, images, audio, and video, and can generate complete front-end code.

🚀 Google plans to launch a lightweight Flash variant to meet the needs of mobile devices and edge computing.

2. Baidu launches the globally leading document parsing model PaddleOCR-VL, reshaping the OCR technology landscape!

Baidu's PaddleOCR-VL model performs well in the field of document parsing, becoming a new benchmark in OCR technology thanks to its lightweight efficiency, multilingual support, and high-precision recognition capabilities.

【AiBase summary:】

🌍 Supports 109 languages, suitable for various document processing tasks.

⚙️ The core model has only 0.9B parameters, combining efficient computation with accurate recognition.

🚀 The inference speed is significantly improved, outperforming other mainstream models.

3. AiShi Technology completes a Series B+ financing round worth 100 million yuan: ARR exceeds 40 million USD, users exceed 100 million

AiShi Technology has made significant progress in AI video generation. It has completed a Series B+ financing round worth 100 million yuan, and its annual recurring revenue (ARR) has passed 40 million USD with more than 100 million registered users. Its product strategy and technical innovation give it a strong competitive position in the market.

【AiBase summary:】

🚀 AiShi Technology completed a Series B+ financing round worth 100 million yuan, a sign of recognition and support from the capital market.

📈 Annual recurring revenue (ARR) exceeded 40 million USD, and registered users exceeded 100 million.

💡 Technical innovation continues: PixVerse V5 improves generation efficiency and video quality and introduces an Agent creation assistant.

4. Anthropic launches 'skills' feature for Claude, enhancing AI work efficiency

Anthropic has launched a new feature called 'skills' for Claude, aiming to make the AI more practical in work scenarios. A skill packages instructions, scripts, and resources in a folder, allowing Claude to handle specific tasks more efficiently, such as working with Excel documents or following brand guidelines. Users can also create custom skills and use them across multiple platforms. The feature echoes OpenAI's AgentKit and marks a step forward in the practical application of AI in the industry.

【AiBase summary:】

🌟 Anthropic launched the 'skills' feature for Claude, enhancing the practicality of AI in work scenarios.

🛠️ Users can create custom skills to better adapt Claude to specific work environments.

🚀 This move aligns with OpenAI's new features like AgentKit, indicating that the AI industry continues to move toward practical applications.
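
To make the "skill as a folder" idea concrete, here is a minimal sketch that scaffolds one such folder holding instructions, a script, and a resources directory. The file and folder names (`SKILL.md`, `scripts/`, `resources/`), the metadata fields, and the `scaffold_skill` helper are assumptions for illustration, not a confirmed specification; consult Anthropic's documentation for the actual layout.

```python
from pathlib import Path
import tempfile


def scaffold_skill(root: Path, name: str, description: str) -> Path:
    """Create a hypothetical skill folder: instructions, scripts, resources."""
    skill_dir = root / name
    (skill_dir / "scripts").mkdir(parents=True, exist_ok=True)
    (skill_dir / "resources").mkdir(exist_ok=True)

    # Instructions file; the metadata fields here are illustrative assumptions.
    instructions = (
        "---\n"
        f"name: {name}\n"
        f"description: {description}\n"
        "---\n\n"
        "Describe here, step by step, how the task should be carried out.\n"
    )
    (skill_dir / "SKILL.md").write_text(instructions, encoding="utf-8")

    # Placeholder helper script that the instructions could refer to.
    (skill_dir / "scripts" / "example.py").write_text(
        'print("hello from a skill script")\n', encoding="utf-8"
    )
    return skill_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skill = scaffold_skill(
            Path(tmp), "brand-guidelines", "Apply company brand rules to documents"
        )
        # List everything that was created, relative to the skill folder.
        for path in sorted(skill.rglob("*")):
            print(path.relative_to(skill))
```

Keeping instructions, scripts, and resources together in one folder is what would let the same skill be copied between platforms unchanged.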

5. Pinterest launches an AI content restriction tool: Users can choose to see fewer AI-generated images

Responding to user dissatisfaction, Pinterest has launched a content control tool that lets users limit the proportion of AI-generated content in their feeds. The platform is trying to balance AI innovation with user experience by introducing AI modification tags and giving users optional settings.

【AiBase summary:】

🖼️ Users can adjust the share of AI-generated images shown in their feeds.

🤖 Pinterest introduced AI modification tags to identify AI-generated content.

🌐 Pinterest seeks a compromise between AI technology and user experience.

6. Fully open-source LLaVA-OneVision-1.5, a multimodal model surpassing Qwen2.5-VL, makes its debut

LLaVA-OneVision-1.5 is an open-source multimodal model capable of processing various inputs such as images and videos, and it performs well in multiple benchmark tests, surpassing the Qwen2.5-VL model.

【AiBase summary:】

🧠 LLaVA-OneVision-1.5 is a new multimodal model that can process various input forms such as images and videos.

📈 The training process is divided into three stages, aiming to efficiently improve the model's visual and language understanding capabilities.

🏆 In benchmark tests, LLaVA-OneVision-1.5 performed excellently, surpassing the Qwen2.5-VL model.

Details: https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5 and https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct

7. OpenAI video generation model Sora 2 goes live on Microsoft Azure: Pricing $0.1 per second, enters public preview phase

Microsoft announced that OpenAI's Sora 2 video generation model is now available on the international version of Azure AI Foundry in public preview, a step toward the commercialization of generative AI video tools.

【AiBase summary:】

🎥 Sora 2 is a multimodal video generation model that accepts text, image, and video input to generate new video content.

💰 Pricing is $0.1 per second, billed by the duration of generated video, which suits bulk use by enterprise customers.

🌐 Sora 2 is available only on the international version of Azure AI Foundry, and users in China cannot access it directly for now.
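
As a rough illustration of duration-based billing, the sketch below estimates generation cost from the $0.1-per-second rate quoted above. The function name and the batch-size parameter are hypothetical, and an actual Azure invoice may include factors this summary does not cover.

```python
from decimal import Decimal

# Rate quoted in the summary above: $0.1 per second of generated video.
PRICE_PER_SECOND_USD = Decimal("0.10")


def estimate_cost(clip_seconds, clip_count=1):
    """Estimate the cost in USD of generating clip_count clips of clip_seconds each."""
    if clip_seconds < 0 or clip_count < 0:
        raise ValueError("clip_seconds and clip_count must be non-negative")
    # Convert via str so float inputs such as 2.5 do not carry binary rounding noise.
    return PRICE_PER_SECOND_USD * Decimal(str(clip_seconds)) * clip_count


if __name__ == "__main__":
    print(estimate_cost(10))     # one 10-second clip: 1.00
    print(estimate_cost(8, 50))  # fifty 8-second clips: 40.00
```

Using `Decimal` instead of floats keeps the estimate exact to the cent, which matters when many short clips are summed.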

8. Travel search engine Kayak launches "AI mode" for easier travel planning and booking

Kayak has launched a new "AI mode" that helps users research, plan, and book trips through an integrated chatbot. The feature is built on ChatGPT technology, returns more context-aware search results, and accepts open-ended questions for travel advice.

【AiBase summary:】

🌍 Kayak launched "AI mode," which lets users plan and book trips conveniently through a chatbot.

🗣️ Users can ask for travel advice and compare travel services, with answers powered by ChatGPT technology.

📅 "AI mode" initially supports English only; more languages and platforms, along with voice requests, are planned for later.