SageAttention

Public

Quantized attention that achieves speedups of 2.1-3.1x over FlashAttention2 and 2.7-5.1x over xformers, without losing end-to-end metrics across various models.

attention cuda efficient-attention inference-acceleration llm mlsys quantization triton video-generation

Created: 2024-10-03T17:33:18

Updated: 2025-03-27T08:33:32

1.9K

Stars


Related projects

Annotated_deep_learning_paper_implementations

attention

60+ implementations/tutorials of deep learning papers with side-by-side notes, including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), reinforcement learning (ppo, dqn), capsnet, distillation, ...

61676

3 months ago

+43 today

Vllm

Hot

amd

A high-throughput and memory-efficient inference and serving engine for LLMs

51608

1 year ago

+173 today

Vit Pytorch

artificial-intelligence

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

23303

3 months ago

+15 today

Numpy Ml

attention

Machine learning, in numpy

16117

3 months ago

Sglang

Hot

cuda

SGLang is a fast serving framework for large language models and vision language models.

15768

3 months ago

+63 today

Leedl Tutorial

bert

《李宏毅深度学习教程》 (Hung-yi Lee Deep Learning Tutorial; recommended by Professor Hung-yi Lee, also known as the "Apple Book"). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases

15384

3 months ago

+10 today

Kaldi

c-plus-plus

kaldi-asr/kaldi is the official location of the Kaldi project.

14958

3 months ago

+5 today

Nlp Tutorial

attention

Natural Language Processing Tutorial for Deep Learning Researchers

14661

10 months ago

+2 today

RWKV LM

attention-mechanism

RWKV (pronounced RwaKuv) is an RNN with great LLM performance that can also be trained directly like a GPT transformer (parallelizable). The current version is RWKV-7 "Goose". It combines the best of RNNs and transformers: great performance, linear time, constant space (no kv-cache), fast training, infinite ctx_len, and free sentence embedding.

13763

3 months ago

+5 today