MInference
Accelerate the inference process of long context large language models
MInference is an inference-acceleration framework for long-context large language models (LLMs). It exploits the dynamic sparsity of LLM attention, accelerating the pre-filling stage through offline recognition of static sparse patterns combined with online approximation of the sparse attention indices. On a single A100 GPU it delivers up to a tenfold pre-filling speedup for 1M-token contexts while preserving inference accuracy.
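The core idea can be illustrated with a toy sketch: instead of computing full attention over every key, cheaply estimate which key positions matter and attend only to that sparse index set. This is a simplified, hypothetical illustration of the general technique, not MInference's actual kernels or pattern taxonomy; the function names and the mean-pooled-query importance heuristic are assumptions for demonstration only.

```python
import numpy as np

def dense_attention(Q, K, V):
    """Standard full softmax attention (the expensive baseline)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ V

def sparse_attention(Q, K, V, k):
    """Toy dynamic sparse attention: pick an index set online, then
    attend only to those k keys instead of all of them."""
    # Cheap online importance estimate per key position
    # (illustrative heuristic: score keys against the mean query).
    approx = Q.mean(axis=0) @ K.T
    idx = np.argsort(approx)[-k:]          # top-k keys = sparse index set
    scores = Q @ K[idx].T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ V[idx]                      # cost scales with k, not seq len

rng = np.random.default_rng(0)
n, d = 256, 64
Q, K, V = rng.normal(size=(3, n, d))
dense = dense_attention(Q, K, V)
sparse = sparse_attention(Q, K, V, k=64)   # attends to 25% of the keys
```

The payoff is that the score matrix in `sparse_attention` is `n x k` rather than `n x n`, which is what makes long-context pre-filling tractable; the accuracy of the result hinges on how well the cheap index estimate captures the truly important keys.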
MInference Visits Over Time
Monthly Visits: 513,197,610
Bounce Rate: 36.07%
Pages per Visit: 6.1
Visit Duration: 00:06:32