
Differential-Transformer-PyTorch

Public

A PyTorch implementation of the Differential Transformer architecture for sequence modeling, built as a decoder-only model in the style of large language models (LLMs). The architecture combines a novel differential attention mechanism with a multi-head structure, RMSNorm, and SwiGLU.
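The core idea of differential attention is to compute two independent softmax attention maps and subtract one from the other (scaled by a learnable factor λ), so that attention noise common to both maps cancels out. A minimal single-head sketch, with hypothetical names (`diff_attention`, `lam`) not taken from this repository:

```python
import math
import torch
import torch.nn.functional as F

def diff_attention(q1, k1, q2, k2, v, lam):
    """Single-head differential attention (illustrative sketch).

    Two separate query/key projections produce two attention maps;
    their difference, scaled by the learnable scalar `lam`, attenuates
    noise that appears in both maps.
    Shapes: q1, k1, q2, k2 are (seq, d); v is (seq, d_v).
    """
    d = q1.size(-1)
    a1 = F.softmax(q1 @ k1.transpose(-2, -1) / math.sqrt(d), dim=-1)
    a2 = F.softmax(q2 @ k2.transpose(-2, -1) / math.sqrt(d), dim=-1)
    # Differential map: common-mode attention mass cancels in a1 - lam * a2.
    return (a1 - lam * a2) @ v

# Example usage with random tensors.
torch.manual_seed(0)
q1, k1, q2, k2 = (torch.randn(5, 8) for _ in range(4))
v = torch.randn(5, 16)
out = diff_attention(q1, k1, q2, k2, v, lam=0.5)
print(out.shape)  # torch.Size([5, 16])
```

With `lam = 0` the second map drops out and the function reduces to standard scaled dot-product attention; the full model additionally makes λ learnable, runs several such heads in parallel, and normalizes with RMSNorm.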

Created: 2024-10-08T21:48:40
Updated: 2025-03-21T17:35:01
Stars: 70
Stars increase: 0