Welcome to the [AI Daily] column! This is your daily guide to the world of artificial intelligence. Every day, we bring you the hottest content in the AI field with a focus on developers, helping you keep up with technical trends and innovative AI product applications.

Discover the latest AI products. Click to learn more: https://top.aibase.com/

1. Baidu Announces ERNIE Speed and ERNIE Lite Models Are Now Completely Free

Baidu Cloud has announced that ERNIE Speed and ERNIE Lite, two of its leading models, are now free to use. ERNIE Speed is a high-performance large language model well suited as a base for fine-tuning on scenario-specific problems, while ERNIE Lite is a lightweight large language model suited to inference on AI accelerator cards with limited compute.

[AiBase Summary:]

🚀 ERNIE Speed, Baidu's latest self-developed high-performance large language model released in 2024, excels in general capabilities.

💡 ERNIE Lite, Baidu's self-developed lightweight large language model, balances strong model quality with inference performance.

💻 ERNIE Speed and ERNIE Lite are now completely free, effective immediately.

2. Alibaba Cloud Announces a 97% Price Drop for API Input on Its GPT-4-Level Qwen-Long Model

Alibaba Cloud has announced a steep cut in the API input price of Qwen-Long, its GPT-4-level model, making text processing far more affordable for users and the model more competitive.

[AiBase Summary:]

🚀 API input price reduced to 0.0005 yuan per thousand tokens, a 97% decrease, with users able to purchase 2 million tokens for just 1 yuan.

💡 The model supports text input of up to 10 million tokens, with a price about 1/400 that of GPT-4, making it one of the most competitive globally.

🌍 The Tongyi model serves more than 90,000 enterprises through Alibaba Cloud and more than 2.2 million enterprises through DingTalk, and is widely used by small and medium-sized enterprises and developers in China and abroad.
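
The figures above are easy to sanity-check. The following is a minimal Python sketch, not an official pricing calculator: the function names are made up for illustration, and the only input taken from the article is the price of 0.0005 yuan per thousand input tokens.

```python
from decimal import Decimal

# Price quoted in the article: 0.0005 yuan per 1,000 input tokens.
PRICE_PER_1K_TOKENS_YUAN = Decimal("0.0005")


def tokens_per_yuan(price_per_1k: Decimal) -> int:
    """Return how many input tokens 1 yuan buys at the given price (hypothetical helper)."""
    return int(Decimal(1000) / price_per_1k)


def input_cost_yuan(num_tokens: int, price_per_1k: Decimal) -> Decimal:
    """Return the input cost in yuan for num_tokens tokens (hypothetical helper)."""
    return Decimal(num_tokens) / Decimal(1000) * price_per_1k


print(tokens_per_yuan(PRICE_PER_1K_TOKENS_YUAN))  # 2000000
```

Under the same assumption that the listed price applies uniformly, `input_cost_yuan(10_000_000, PRICE_PER_1K_TOKENS_YUAN)` works out to 5 yuan for a request that fills the full 10-million-token input.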

3. ByteDance Releases Pricing List for the Doubao Large Model: Starting from 25 Yuan

ByteDance has updated the pricing for the Doubao large model on the Volcano Engine official website. The price of the main model has been cut by 99%, giving Doubao a clear advantage in cost-effectiveness. The Doubao large model family includes several members to meet different user needs. Billing is flexible, with both postpaid and prepaid options for enterprises.

[AiBase Summary:]

🔍 Doubao large model price updated, starting at just 25 yuan, with a clear advantage in cost-effectiveness.

💡 The Doubao large model family is diverse, including Pro, Lite, and other models, with powerful processing capabilities.

💰 Flexible billing model, with postpaid and prepaid methods to meet enterprise needs.

4. Zhipu AI Releases the New Generation Multimodal Model CogVLM2

Zhipu AI recently launched CogVLM2, its new-generation multimodal model, with significant performance improvements and support for 8K text length and images at resolutions up to 1344×1344. CogVLM2 performs strongly on multiple benchmarks and shows particular strength in document image understanding. Its architecture has been optimized: the model has 19B parameters, of which about 12 billion are actually activated during inference, which significantly improves inference efficiency. Its performance approaches or exceeds that of GPT-4V.

[AiBase Summary:]

🚀 CogVLM2 improves by 32% on the OCRBench benchmark and by 21.9% on the TextVQA benchmark.

💡 CogVLM2 adopts a deep fusion strategy, closely integrating visual and language modalities, maintaining language processing advantages.

🔥 CogVLM2 achieves excellent results on multiple multimodal benchmarks, including TextVQA, DocVQA, ChartQA, etc.

Details link: https://github.com/THUDM/CogVLM2

5. ModelBest Introduces the Latest Generation On-device Multimodal Model MiniCPM-Llama3-V2.5

ModelBest's latest-generation on-device multimodal model, MiniCPM-Llama3-V2.5, delivers strong overall performance. It achieves SOTA results in OCR, supports multiple languages, and implements system-level multimodal acceleration on-device, marking a new step forward for on-device AI models.

[AiBase Summary:]

🚀 MiniCPM-Llama3-V2.5 delivers exceptional overall performance, surpassing Gemini Pro and GPT-4V.

🔍 Achieves SOTA results in OCR, accurately identifying difficult images, long images, and long text.

💡 First to achieve system-level multimodal acceleration on-device, with image encoding speed increased by 150 times.

Details link: https://github.com/OpenBMB/MiniCPM-V

6. Tencent Plans to Invest in Moonshot AI at a Valuation of Up to $3 Billion

Tencent plans to invest in Moonshot AI, potentially raising the startup's valuation to $3 billion. The move reflects Tencent's strategic positioning and competitive ambitions in artificial intelligence. As China's AI industry develops rapidly, investment and competition are becoming increasingly fierce, and Tencent's tie-up with Moonshot AI has drawn attention across the industry.

[AiBase Summary:]

🚀 Moonshot AI is a leader in large language models and has attracted attention from giants like Tencent.

💰 Founded just over a year ago, Moonshot AI has completed more than $1 billion in financing at a valuation of $2.5 billion.

📈 Tencent is stepping up its competition in the AI field and plans to invest in other large model startups to strengthen its position.

7. Line Preprocessor Anyline Adds Web UI Support

The latest update to Anyline adds support for ControlNet in the Web UI, improving the user experience. Chenlei Hu plans to further simplify the use of Anyline and is considering deeper integration into ComfyUI. Users can choose the base model that suits their needs to get the best results. The update brings Anyline's capabilities to the Web UI, which benefits both professional design work and everyday image processing.

[AiBase Summary:]

🔍 High-precision line extraction: Anyline can accurately extract object edges, details, and text content from images, outputting clear edges and high-fidelity text line drawings.

🌐 Wide applicability: Users can input any type of image, and Anyline can quickly process it, providing high-quality line drawings.

🔬 Advantages in texture and font recognition: Anyline has clear advantages in contour accuracy, object detail, material texture, and font recognition, while also providing better noise reduction.

Details link: https://top.aibase.com/tool/anyline

8. Meta Releases the GPT-4o-like Multimodal Model Chameleon

Meta has recently released Chameleon, a multimodal model that sets a new bar in the field with innovations such as early fusion and a unified Transformer architecture. Chameleon shows broad capabilities across tasks including visual question answering, image captioning, and text generation. Building such a model poses technical challenges, which the Meta team addresses with a series of architectural innovations and training techniques.

[AiBase Summary:]

🌟 Chameleon is a family of early-fusion, token-based mixed-modal models capable of understanding and generating images and text in any order.

🔑 Chameleon faces significant technical challenges, which Meta's research team addresses with a series of architectural innovations and training techniques.

💡 Chameleon outperforms Llama2 across benchmark evaluations, with strong results in common sense reasoning, reading comprehension, mathematical problems, and world knowledge.

Details link: https://arxiv.org/pdf/2405.09818

9. Microsoft Releases AI Tool Recall to Help You Find Those Elusive Files

Microsoft has released Recall, an AI feature that gives the Copilot+ PC series a "photographic memory," allowing users to find files, websites, or emails by voice and AI-powered search. The feature makes it easier for users to find the information they need, and the data stays on the local device, so nothing is transmitted to cloud servers.

[AiBase Summary:]

🔍 Recall gives the Copilot+ PC series a "photographic memory," allowing users to find files, websites, or emails by voice and AI-powered search.

🔍 Recall records users' on-screen activity, so users can describe what they remember in natural language and quickly find the latest version of a document.

🔍 Recall is only available on devices that meet specific hardware requirements, such as ARM64 processors like the Snapdragon X Elite and X Plus.

10. Code Repository Reproducing Llama3 from Scratch Goes Viral, with Karpathy Praising the Author as a Man of Taste

A code repository that teaches you how to implement Llama3 from scratch has gone viral on the internet. Renowned AI expert Andrej Karpathy praised the project highly, commending author Nishant Aklecha's detailed explanations and demonstrations. The repository walks through the implementation of the Llama3 model in detail, including key parts such as the attention mechanism and positional encoding.

[AiBase Summary:]

🔥 The code repository has gone viral and attracted the attention of many developers, with Karpathy liking, reposting, and commenting.

👨‍💻 Author Nishant Aklecha explains the implementation of the Llama3 model in detail, including the attention mechanism and positional encoding.

🚀 Aklecha implements Llama3 from scratch and demonstrates what each line of code does, with Karpathy praising the detailed and easy-to-understand presentation.

Details link: https://top.aibase.com/tool/llama3-from-scratch

11. AI Framework Ambient Diffusion: Drawing Inspiration from Images, Not Copying

A research team from the University of Texas at Austin has developed the Ambient Diffusion framework, which trains on images corrupted beyond recognition to avoid the problem of AI models copying others' works. The framework is useful not only in art but also has potential applications in science and medicine, such as black hole imaging and MRI scans. The team's work offers new ideas for the development of artificial intelligence.

[AiBase Summary:]

🔍 Ambient Diffusion framework solves the problem of AI models copying works by training on scrambled image data.

💡 The framework has great potential and can be applied in the art, science, and medicine fields, such as black hole imaging and MRI scans.

📝 Preliminary experiments show that the Ambient Diffusion framework can still generate high-quality samples without the need to recognize the content of the original source images.

Details link: https://arxiv.org/abs/2305.19256
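
To make "training on unrecognizable images" concrete, here is a minimal Python sketch of one way to corrupt an image before it is used for training, assuming random pixel masking as the corruption. The function name `mask_pixels` and its parameters are hypothetical and are not taken from the team's code; the framework's actual training procedure is more involved and is not shown here.

```python
import random


def mask_pixels(image, drop_prob, seed=None):
    """Zero out each pixel independently with probability drop_prob.

    image is a list of rows of numeric pixel values (illustrative format).
    Returns the corrupted image and the binary mask (1 = kept, 0 = dropped).
    """
    rng = random.Random(seed)
    mask = [[0 if rng.random() < drop_prob else 1 for _ in row] for row in image]
    corrupted = [
        [pixel * keep for pixel, keep in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]
    return corrupted, mask


# A tiny 2x3 grayscale "image"; a high drop probability leaves little of it visible.
image = [[10, 20, 30], [40, 50, 60]]
corrupted, mask = mask_pixels(image, drop_prob=0.9, seed=0)
```

A model that only ever sees inputs like `corrupted` never observes a complete original, which is the intuition behind the claim that it is less able to reproduce its training images.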

12. Hollywood Actress Scarlett Johansson Criticizes OpenAI for Copying Her Voice for ChatGPT

Scarlett Johansson says that OpenAI imitated her distinctive voice after she declined to provide it for ChatGPT. OpenAI demonstrated a synthetic voice strikingly similar to Johansson's AI assistant character in "Her," then abruptly disabled the new voice. Commentators have criticized OpenAI's actions and praised Johansson's stance.

[AiBase Summary:]

⭐ Scarlett Johansson says OpenAI imitated her voice without permission.

⭐ OpenAI demonstrated a synthetic voice similar to Scarlett Johansson's AI assistant character in "Her."

⭐ Commentators have criticized OpenAI's actions and praised Scarlett Johansson's stance.

13. Intel to Release the New Generation Lunar Lake Chip

Intel plans to release the Lunar Lake notebook processor in the third quarter of this year, aiming to bring a new AI experience to Copilot+ PCs. The chip will provide three times the AI performance of its predecessor, Meteor Lake, and Intel expects more than 400,000 Lunar Lake chips to be in notebooks by the end of the year. The launch is a significant step for Intel in the AI PC market as it responds to challenges from competitors.

[AiBase Summary:]

⭐ The Lunar Lake chip combines a CPU, an integrated Xe2 GPU, and a neural processing unit (NPU), providing three times the AI performance of its predecessor, Meteor Lake.

⭐ Intel expects more than 400,000 Lunar Lake chips to be in over 80 new notebook models worldwide by the end of the year, to counter AMD's Zen5 and Qualcomm's Oryon.

⭐ The NPU of the Lunar Lake processor will be able to perform over 40 trillion operations per second (TOPS), far exceeding the 10 TOPS of the Meteor Lake chip, bringing more powerful performance and richer application experiences to AI PCs.