MInference
Accelerate the inference process of long context large language models
MInference is an inference-acceleration framework for long-context large language models (LLMs). It exploits the dynamic sparsity of LLM attention, accelerating the pre-filling stage through offline recognition of static sparse patterns combined with online approximation of the sparse attention indices. On a single A100 GPU it delivers up to a tenfold pre-filling speedup for 1M-token contexts while preserving inference accuracy.
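The core idea can be illustrated with a toy sketch: instead of computing full attention over every key, cheaply estimate which key positions matter and attend only to that sparse index set. This is a simplified, hypothetical illustration of the general technique, not MInference's actual kernels or pattern taxonomy; the function names and the mean-pooled-query importance heuristic are assumptions for demonstration only.

```python
import numpy as np

def dense_attention(Q, K, V):
    """Standard full softmax attention (the expensive baseline)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ V

def sparse_attention(Q, K, V, k):
    """Toy dynamic sparse attention: pick an index set online, then
    attend only to those k keys instead of all of them."""
    # Cheap online importance estimate per key position
    # (illustrative heuristic: score keys against the mean query).
    approx = Q.mean(axis=0) @ K.T
    idx = np.argsort(approx)[-k:]          # top-k keys = sparse index set
    scores = Q @ K[idx].T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ V[idx]                      # cost scales with k, not seq len

rng = np.random.default_rng(0)
n, d = 256, 64
Q, K, V = rng.normal(size=(3, n, d))
dense = dense_attention(Q, K, V)
sparse = sparse_attention(Q, K, V, k=64)   # attends to 25% of the keys
```

The payoff is that the score matrix in `sparse_attention` is `n x k` rather than `n x n`, which is what makes long-context pre-filling tractable; the accuracy of the result hinges on how well the cheap index estimate captures the truly important keys.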
MInference Visits Over Time
Monthly Visits: 513,197,610
Bounce Rate: 36.07%
Pages per Visit: 6.1
Visit Duration: 00:06:32