MInference is an inference acceleration framework for long-context large language models (LLMs). It exploits the dynamic sparsity of LLM attention: sparse attention patterns are identified per head offline (statically), and the corresponding sparse indices are approximated online at inference time, dramatically speeding up the pre-filling stage. On a single A100 GPU, it delivers up to a 10x speedup when pre-filling 1M-token contexts while maintaining inference accuracy.
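
For context, below is a minimal usage sketch showing how MInference typically patches a Hugging Face model's attention layers. It assumes the `minference` PyPI package and an example long-context checkpoint (`gradientai/Llama-3-8B-Instruct-262k`); the patching call (`MInference(...)` returning a callable that wraps the model) follows the project's documented pattern but may differ across versions.

```python
# Minimal sketch: patching a Hugging Face model with MInference.
# Assumes `pip install minference transformers accelerate` and a GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer
from minference import MInference

# Example long-context model; any supported checkpoint should work.
model_name = "gradientai/Llama-3-8B-Instruct-262k"
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Patch the model so pre-filling runs through MInference's
# dynamic sparse attention kernels.
minference_patch = MInference("minference", model_name)
model = minference_patch(model)

# Inference then proceeds as with any Hugging Face causal LM;
# the speedup shows up when the prompt is very long.
inputs = tokenizer(
    "Summarize the following document: ...", return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that the patch only changes how attention is computed during pre-filling; the model's weights and generation API are untouched, so existing pipelines can adopt it with a two-line change.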