SageAttention

Public

Quantized attention that achieves speedups of 2.1-3.1x and 2.7-5.1x over FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.

attention cuda efficient-attention inference-acceleration llm mlsys quantization triton video-generation

Created: 2024-10-03T17:33:18

Updated: 2025-03-27T08:33:32

2.8K

Stars

Related Projects

Vllm

Hot

amd

A high-throughput and memory-efficient inference and serving engine for LLMs

64909

1 year ago

+236 today

Annotated_deep_learning_paper_implementations

Hot

attention

60+ Implementations/tutorials of deep learning papers with side-by-side notes; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), reinforcement learning (ppo, dqn), capsnet, distillation, ...

64723

8 months ago

+52 today

Vit Pytorch

artificial-intelligence

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

24612

8 months ago

+24 today

Sglang

Hot

cuda

SGLang is a fast serving framework for large language models and vision language models.

21068

8 months ago

+144 today

Numpy Ml

attention

Machine learning, in numpy

16210

8 months ago

+1 today

Leedl Tutorial

bert

Hung-yi Lee Deep Learning Tutorial (recommended by Prof. Hung-yi Lee; known as the "Apple Book"). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases

16083

8 months ago

+11 today

Kaldi

c-plus-plus

kaldi-asr/kaldi is the official location of the Kaldi project.

15257

8 months ago

+3 today

Nlp Tutorial

attention

Natural Language Processing Tutorial for Deep Learning Researchers

14800

1 year ago

+3 today

RWKV LM

attention-mechanism

RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RNN and transformer - great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.

14211

8 months ago

+11 today

Open3D

Open3D: A Modern Library for 3D Data Processing

13085

8 months ago

+11 today

