This article introduces the approach proposed by Tianqi Chen's TVM team for running large language model inference on AMD GPUs. With their optimizations, the AMD Radeon RX 7900 XTX reaches roughly 80% of the performance of NVIDIA's RTX 4090. The author also introduces MLC-LLM, a tool for high-performance universal deployment that makes AMD GPUs competitive for large language model inference. The article argues that hardware shortages can be mitigated through software improvements and optimization, and it provides a concrete implementation scheme along with performance benchmark results.
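
To make the MLC-LLM claim more concrete, the sketch below shows roughly how inference on an AMD card could be invoked through MLC-LLM's Python interface. This is a minimal sketch, not the article's own code: the `mlc_chat.ChatModule` API, the quantized model name, and the `device="rocm"` setting are assumptions drawn from MLC-LLM's public documentation.

```python
# Minimal sketch of LLM inference on an AMD GPU via MLC-LLM (assumed API).
# Assumes the `mlc_chat` package is installed with ROCm support and that a
# pre-quantized model such as "Llama-2-7b-chat-hf-q4f16_1" is available locally;
# the model name and device string are illustrative, not taken from the article.
from mlc_chat import ChatModule

# Target the ROCm backend so the model runs on an AMD GPU (e.g. RX 7900 XTX).
cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1", device="rocm")

# Generate a completion; MLC-LLM handles model loading and kernel dispatch.
print(cm.generate(prompt="What makes AMD GPUs viable for LLM inference?"))
```

The key point the example illustrates is that switching hardware vendors is reduced to a device selection, which is what the article means by "universal deployment."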