SageAttention

Public

Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

attention cuda efficient-attention inference-acceleration llm mlsys quantization triton video-generation

Created: 2024-10-03T17:33:18

Updated: 2025-03-27T08:33:32

2.8K

Stars


Related projects

vLLM

Hot

amd

A high-throughput and memory-efficient inference and serving engine for LLMs

64909

2 years ago

+236 today

Annotated_deep_learning_paper_implementations

Hot

attention

60+ implementations/tutorials of deep learning papers with side-by-side notes, including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), reinforcement learning (ppo, dqn), capsnet, distillation, ...

64723

1 year ago

+52 today

Vit Pytorch

artificial-intelligence

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

24612

1 year ago

+24 today

SGLang

Hot

cuda

SGLang is a fast serving framework for large language models and vision language models.

21068

1 year ago

+144 today

Numpy Ml

attention

Machine learning, in numpy

16210

1 year ago

+1 today

Leedl Tutorial

bert

"Hung-yi Lee Deep Learning Tutorial" (recommended by Prof. Hung-yi Lee; also known as the "Apple Book"). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases

16083

1 year ago

+11 today

Kaldi

c-plus-plus

kaldi-asr/kaldi is the official location of the Kaldi project.

15257

1 year ago

+3 today

Nlp Tutorial

attention

Natural Language Processing Tutorial for Deep Learning Researchers

14800

1 year ago

+3 today

RWKV LM

attention-mechanism

RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RNN and transformer - great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.

14211

1 year ago

+11 today

Open3D

Open3D: A Modern Library for 3D Data Processing

13085

1 year ago

+11 today
