The efficiency revolution of large language models is underway. Meta Superintelligence Labs recently introduced a technique that accelerates inference in retrieval-augmented generation (RAG) tasks by more than 30 times. The work is detailed in a paper titled "REFRAG: Rethinking RAG based Decoding," and it changes how models consume retrieved context at a fundamental level.
Meta Superintelligence Labs was established this June in Menlo Park, California. The lab grew out of Meta CEO Mark Zuckerberg's dissatisfaction with the performance of the company's newly released Llama 4 model: he pushed the team to accelerate development and even asked employees to work overtime to drive technical progress. That sense of urgency led to the lab's founding and attracted many top researchers to join.
Within the lab, researchers are organized into four groups, focusing respectively on large language model development, fundamental research, product and applied technology, and infrastructure support. The release of the REFRAG framework marks an important step in the lab's effort to optimize the performance of large language models.
The core innovation of the REFRAG framework is to use a lightweight encoder to compress chunks of lengthy retrieved context into compact representations, sharply reducing the amount of input the decoder must process. This not only speeds up processing significantly but also lowers computational cost, improving the model's overall efficiency. The research team also adopted a continual pre-training strategy, training the compressor on reconstruction tasks so that key details are preserved even as the context is compressed.
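To make the idea concrete, here is a minimal PyTorch sketch of chunk-level compression. Everything in it, the dimensions, the encoder depth, the mean pooling, and names such as ChunkCompressor and proj, is an illustrative assumption rather than Meta's implementation; it only shows how a lightweight encoder can turn each 16-token chunk of retrieved text into a single decoder-width vector.

```python
import torch
import torch.nn as nn

class ChunkCompressor(nn.Module):
    """Toy sketch of REFRAG-style context compression: a lightweight
    encoder maps each k-token chunk of retrieved text to ONE embedding,
    so the decoder sees L/k vectors instead of L tokens. All sizes
    here are illustrative, not the paper's."""

    def __init__(self, vocab_size=32000, enc_dim=256, dec_dim=1024, chunk_len=16):
        super().__init__()
        self.chunk_len = chunk_len
        self.embed = nn.Embedding(vocab_size, enc_dim)
        layer = nn.TransformerEncoderLayer(d_model=enc_dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Projection aligning each chunk embedding with the decoder's width.
        self.proj = nn.Linear(enc_dim, dec_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, L), with L a multiple of chunk_len.
        b, L = token_ids.shape
        k = self.chunk_len
        chunks = token_ids.view(b * L // k, k)        # one row per chunk
        h = self.encoder(self.embed(chunks))          # (b*L/k, k, enc_dim)
        pooled = h.mean(dim=1)                        # one vector per chunk
        return self.proj(pooled).view(b, L // k, -1)  # (b, L/k, dec_dim)

compressor = ChunkCompressor()
ctx = torch.randint(0, 32000, (1, 256))   # 256 retrieved tokens
print(compressor(ctx).shape)              # torch.Size([1, 16, 1024]): 16x shorter
```

Mean pooling here is just a placeholder for whatever aggregation the real encoder performs; the point is that the decoder's input length shrinks by the chunk size.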
In comprehensive testing, REFRAG performed strongly across multiple tasks, particularly on latency and throughput. Experimental data show that at a compression ratio of 16x, REFRAG not only surpasses the previous state-of-the-art model, CEPE, in speed, but does so with virtually no loss in accuracy. This breakthrough opens up new possibilities for future AI applications.
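A back-of-the-envelope calculation helps explain why speedups of this size are plausible. The numbers below are our own illustration, not figures from the paper: the attention cost of prefilling a prompt grows roughly quadratically with context length, so shortening the context 16x shrinks the dominant term far more than 16x.

```python
# Hypothetical sketch of why a 16x shorter context can cut
# time-to-first-token so sharply. Prefill attention cost grows ~O(n^2)
# with context length, while per-layer FFN cost grows ~O(n).
L = 4096                # raw retrieved-context tokens (illustrative)
ratio = 16
L_c = L // ratio        # 256 positions after chunk compression

attn_saving = (L / L_c) ** 2   # quadratic term: ~256x fewer attention FLOPs
ffn_saving = L / L_c           # linear term: ~16x fewer FFN FLOPs
print(f"attention: ~{attn_saving:.0f}x less work, FFN: ~{ffn_saving:.0f}x less")
# End-to-end gains land between these two bounds, consistent with a
# reported speedup of more than 30x.
```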
Retrieval-augmented generation is a key method for improving the quality and accuracy of large language model answers: it enriches model output with relevant information retrieved from external knowledge bases. The main bottleneck of traditional RAG, however, is the computational burden of processing large volumes of retrieved content, since every retrieved passage is fed to the decoder as raw tokens. REFRAG addresses this pain point through intelligent compression, significantly improving efficiency while maintaining model performance.
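The sketch below shows where such compression sits in a RAG pipeline: between retrieval and decoding. All of the function names here (retrieve, compress_chunks, generate) are hypothetical stand-ins rather than a real library API, and where REFRAG would emit chunk embeddings, this toy version merely shortens text to make the effect visible.

```python
from typing import List

def retrieve(query: str, k: int = 8) -> List[str]:
    """Stand-in retriever; a real system would query a vector store."""
    return [f"passage {i} with background relevant to: {query}" for i in range(k)]

def compress_chunks(passages: List[str], ratio: int = 16) -> List[str]:
    """Stand-in for the lightweight encoder. REFRAG would produce chunk
    embeddings; plain text is shortened here only to show the effect."""
    out = []
    for p in passages:
        words = p.split()
        out.append(" ".join(words[: max(1, len(words) // ratio)]))
    return out

def generate(prompt: str) -> str:
    """Stand-in decoder call; the point is how short the prompt now is."""
    return f"(answer conditioned on a {len(prompt.split())}-word prompt)"

query = "How does REFRAG speed up RAG decoding?"
context = compress_chunks(retrieve(query))   # compressed, not raw, passages
print(generate(" ".join(context) + " " + query))
```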
The significance of the technique goes beyond raw speed: it lowers a practical barrier to deploying large language models. Faster inference means lower operating costs and a better user experience, which is crucial for AI applications that demand real-time responses. As Meta continues to push forward in this field, the REFRAG framework should accelerate the adoption of large language models in real products, and it leaves plenty to look forward to.