Welcome to the AI Daily section, your daily guide to the world of artificial intelligence. Each day we bring you the hottest topics in AI with a focus on developers, helping you follow technology trends and discover innovative AI product applications.

Discover fresh AI products: https://top.aibase.com/

1. Baidu's ERNIE Bot Platform Offers ERNIE 4.0 Version for Free

Baidu's ERNIE Bot Platform has recently released the ERNIE Large Model 4.0 version, which is now open to the public for free use, significantly expanding the platform's capabilities and application scope. This version has made significant strides in comprehension, generation, logical reasoning, and memory capabilities, rivaling GPT-4. Robin Li, founder of Baidu, emphasized the potential of bots as a key channel for AI applications at the World Artificial Intelligence Conference. The ERNIE Bot Platform's zero-code development model lowers the barrier for ordinary users to develop bots, providing developers with advantages in ease of development, distribution, and profitability.

AiBase Summary:

🚀 The ERNIE Large Model 4.0 version is open for free use by developers, significantly expanding its functionality and application scope.

💡 ERNIE 4.0 has made significant progress in comprehension, generation, logical reasoning, and memory capabilities, comparable to GPT-4.

💻 The ERNIE Bot Platform offers a zero-code development model, reducing the difficulty for ordinary users to develop bots and providing developers with advantages in ease of development, distribution, and profitability.

Details: https://top.aibase.com/tool/wenxinzhinengtipingtai-agentbuilder

2. Meta Launches AI Heavyweight: Multi-Token Prediction Model Now Open for Research

Meta has released a pre-trained model that uses a multi-token prediction approach, which could change how large language models are developed and deployed. Instead of predicting only the next token, the model predicts several future tokens at once. The technique is expected to improve efficiency, accelerate the trend toward human-machine collaborative coding, and give models a more nuanced grasp of language structure and context.

AiBase Summary:

🚀 The new technology uses a multi-token prediction method, potentially improving performance and shortening training time.

💡 The model predicts multiple future words simultaneously, potentially enhancing language structure and contextual understanding.

🔗 Meta has released the model on Hugging Face for research use, a move that may accelerate innovation, help attract talent, and sharpen competition in the AI field.

Details: https://top.aibase.com/tool/multi-token-prediction
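
To make the idea of predicting several future tokens at once more concrete, here is a minimal Python sketch of how training targets differ between ordinary next-token prediction and multi-token prediction. It is an illustration only and not Meta's implementation: the function name `multi_token_targets`, the parameter `n_future`, and the sample token list are all assumptions made for this example, and the model architecture itself is not shown.

```python
from typing import List, Tuple


def multi_token_targets(
    tokens: List[str], n_future: int
) -> List[Tuple[List[str], List[str]]]:
    """Pair each prefix with the next n_future tokens a model would be asked to predict.

    Hypothetical helper for illustration; n_future=1 is ordinary next-token prediction.
    """
    if n_future < 1:
        raise ValueError("n_future must be at least 1")
    pairs = []
    # Stop early enough that every position has a full window of future tokens.
    for i in range(1, len(tokens) - n_future + 1):
        context = tokens[:i]
        targets = tokens[i:i + n_future]
        pairs.append((context, targets))
    return pairs


# Assumed toy input: a short, already tokenized code snippet.
tokens = ["def", "add", "(", "a", ",", "b", ")"]

next_token = multi_token_targets(tokens, n_future=1)
multi_token = multi_token_targets(tokens, n_future=4)

print(next_token[0])   # one target token per position
print(multi_token[0])  # four target tokens per position
```

With `n_future=1` each position carries a single target; with `n_future=4` the same prefix is paired with the next four tokens, which is the sense in which the model "predicts multiple future words simultaneously."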

3. SenseTime Releases "Daily New 5o": Real-Time Streaming Multimodal Interaction on Par with GPT-4o

SenseTime unveiled the "Daily New 5o" model at the 2024 World Artificial Intelligence Conference, the first "what you see is what you get" model in China, achieving real-time streaming multimodal interaction comparable to GPT-4o. This model integrates cross-modal information such as sound, text, images, and videos, enabling real-time understanding and response.

AiBase Summary:

🚀 The "Daily New 5o" model achieves real-time streaming multimodal interaction, capable of recognizing badges, describing the appearance of a puppy toy, and evaluating drawings.

💡 "Daily New 5.5" is an upgrade to "Daily New 5.0," with a 30% improvement in comprehensive performance, particularly in mathematical reasoning, English capabilities, and instruction following.

🔑 SenseTime has launched the "Zero Cost for Large Models" plan, offering free services to enterprise users, providing token packages, and offering migration consultants to help users transition with zero service costs.

4. Shanghai AI Lab Open-Sources Powerful Multimodal LLM InternLM-XComposer-2.5

Yesterday, the Shanghai AI Laboratory open-sourced a multimodal large language model named InternLM-XComposer-2.5 (IXC-2.5), which demonstrates strong capabilities in ultra-high-resolution image understanding, fine-grained video understanding, and multi-turn multi-image dialogue. The model has been specially optimized for web page generation and mixed text-and-image articles, filling a gap in China's multimodal LLM field and giving creators more room to work.

AiBase Summary:

⚙️ Long-context processing: IXC-2.5 supports handling ultra-long text and image inputs, natively supporting 24K tokens and expandable to 96K, providing users with a larger creative space.

👁️ Diverse visual capabilities: IXC-2.5 supports ultra-high-resolution image understanding as well as fine-grained video understanding and multi-turn multi-image dialogue.

✨ Generation capabilities: IXC-2.5 can generate web pages and high-quality mixed-media articles, elevating the combination of text and images to a new level.

Project link: https://top.aibase.com/tool/internlm-xcomposer-2-5

Full content: https://www.aibase.com/news/10053

5. Stanford University's OccFusion: High-Fidelity Rendering of Occluded Humans

OccFusion is a new method proposed by Stanford University for high-fidelity rendering of occluded humans. The method works in three stages, combining efficient 3D Gaussian splatting with supervision from 2D diffusion models, and has achieved state-of-the-art results in evaluations of occluded human rendering.

AiBase Summary:

🌟 OccFusion is a new method aimed at achieving high-fidelity rendering of occluded humans.

🌟 The method includes three stages: initialization, optimization, and refinement, achieved through efficient 3D Gaussian splatting and 2D diffusion model supervision.

🌟 Evaluated on ZJU-MoCap and OcMotion sequences, OccFusion achieves state-of-the-art performance in occluded human rendering.

Details: https://top.aibase.com/tool/occfusion

6. Apple Opens 4M Model Demonstration: Easily Deconstructs All Information in Images

Apple has dropped a bombshell on Hugging Face, opening a public demonstration of the 4M model described in a paper last year. The model can process and generate multimodal content, including text, images, and 3D scenes. By uploading a photo, users can extract a wide range of information about it, such as its main contours, tones, and dimensions. Apple has demonstrated strong AI capabilities and hopes to build an ecosystem around 4M, but it also faces challenges in data practices and AI ethics.

AiBase Summary:

🔍 The 4M model can process and generate multi-modal content, including text, images, and 3D scenes.

🛠️ 4M uses a "massively multimodal masked modeling" training method to achieve seamless integration between modalities.

💡 4M was trained on CC12M, one of the largest open-source datasets, with a weakly supervised pseudo-labeling method, demonstrating its ability to perform multimodal tasks directly.

Details: https://huggingface.co/spaces/EPFL-VILAB/4M

7. China's Generative AI Patent Count Is Six Times That of the US

China has reached a significant milestone in generative AI: its patent count is about six times that of the US, reflecting strong innovative capacity and a leading position. Tencent, Ping An Insurance Group, and Baidu stand out for the number of GenAI patents they hold. China's top academic institutions and technology ecosystem provide strong support for the development of generative AI, and this has been recognized by the academic community and the media.

AiBase Summary:

🔸 China's generative AI patent count from 2014 to 2023 reached 38,210, about six times that of the US.

🔸 Tencent, Ping An Insurance Group, and Baidu are the top Chinese companies in terms of GenAI patent count.

🔸 China's top academic institutions and technological ecosystem provide strong support for the development of generative AI, with China's leading position in the field recognized by the academic community and media.

Details: https://www.wipo.int/web-publications/patent-landscape-report-generative-artificial-intelligence-genai/index.html

8. Magical LivePortrait: Turn Photos into Vivid Videos with Precise Control Over Eye and Lip Movements!

LivePortrait is a cutting-edge technology that brings static photos to life, overcoming traditional animation production challenges with high efficiency and precision. It generates realistic animations, controls eye and lip movements, and enhances users' creative space. Let your photos come alive and tell their own stories.

AiBase Summary:

🎨 LivePortrait transforms static photos into smooth, natural-looking videos and can handle portraits of multiple people seamlessly.

⚡ LivePortrait addresses traditional animation production challenges with high quality and efficiency, offering precise control over eye and lip movements and producing realistic micro-expressions.

🔗 LivePortrait generates results quickly and supports portraits in multiple styles, giving users more creative space.

Details: https://top.aibase.com/tool/liveportrait

9. Highlights from WAIC Opening Day | What Insights Did AI Industry Leaders Share?

On July 4th, the 2024 World Artificial Intelligence Conference and the High-Level Meeting on Global AI Governance were held in Shanghai, where AI industry experts discussed the development direction and application of AI. The conference reflected the AI industry's shift towards practical applications, focusing on how AI technology can generate real value. Issues such as AI safety and ethics, industrial transformation, and opportunities also became focal points of discussion.

AiBase Summary:

🔍 Putting AI into practice has become a focal point, with attention centered on how AI technology can generate real value.

🚀 SenseTime's CEO highlighted applications as the key to driving AI into a "super moment," which requires high-quality data, smooth interaction, and controllability.

⚖️ AI safety and ethical issues are receiving attention, with AI risks mainly stemming from expanded cyber risks, social structure disruption, and existential risks, requiring a balance between controlling AI and leveraging its potential.

10. Sci-Fi Becomes Reality? Clone Robotics: A Company Creating Westworld-like Bionic Robots

Clone Robotics is a company that manufactures bionic robots, advancing technology through bionic design and biomechanical principles, with products featuring high fidelity, durability, and affordability. Their core products include the Clone Hand and Clone Torso, capable of performing various complex operational tasks, offering a wide range of application scenarios. The company represents a future lifestyle of harmonious coexistence between humans and robots.

AiBase Summary: