Overview of popular AI open-source projects on GitHub
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without degrading end-to-end metrics across various models.
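This entry appears to describe SageAttention-style quantized attention. As a rough sketch of the underlying idea only (not the project's fused CUDA kernels, and with function names of my own choosing): quantize Q and K to INT8 with per-tensor scales, compute the score matrix cheaply, and rescale before the softmax.

```python
import torch

def quantize_int8(x: torch.Tensor):
    """Symmetric per-tensor INT8 quantization; returns values and scale."""
    scale = x.abs().max() / 127.0
    return torch.clamp((x / scale).round(), -127, 127).to(torch.int8), scale

def quantized_attention(q, k, v):
    """Attention with Q and K quantized to INT8, scores rescaled before
    the softmax. The INT8 values are cast back to float for the matmul
    here; real kernels run INT8 tensor-core matmuls with INT32 accumulation."""
    d = q.shape[-1]
    q_i8, q_s = quantize_int8(q)
    k_i8, k_s = quantize_int8(k)
    scores = (q_i8.float() @ k_i8.float().transpose(-1, -2)) * (q_s * k_s)
    attn = torch.softmax(scores / d ** 0.5, dim=-1)
    return attn @ v  # V stays in floating point in this sketch

q, k, v = (torch.randn(1, 8, 128, 64) for _ in range(3))  # (batch, heads, seq, head_dim)
print(quantized_attention(q, k, v).shape)  # torch.Size([1, 8, 128, 64])
```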
Unified KV Cache Compression Methods for Auto-Regressive Models
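KV-cache compression spans several strategies; one simple, widely used family evicts cached tokens that have received little attention mass (a heavy-hitter heuristic in the style of H2O). The sketch below illustrates that general idea with hypothetical names; it is not this repository's unified API.

```python
import torch

def evict_kv(k_cache, v_cache, attn_weights, keep: int):
    """Keep only the `keep` cached tokens that received the most
    attention mass, a heavy-hitter style eviction heuristic.

    k_cache, v_cache: (seq, head_dim)
    attn_weights:     (queries, seq) softmax weights from recent steps
    """
    token_scores = attn_weights.sum(dim=0)                 # (seq,)
    keep_idx = token_scores.topk(keep).indices.sort().values
    return k_cache[keep_idx], v_cache[keep_idx]

seq, dim = 1024, 64
k_cache, v_cache = torch.randn(seq, dim), torch.randn(seq, dim)
attn = torch.softmax(torch.randn(16, seq), dim=-1)         # stand-in weights
k_small, v_small = evict_kv(k_cache, v_cache, attn, keep=256)
print(k_small.shape)  # torch.Size([256, 64])
```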
SpargeAttention: A training-free sparse attention method that can accelerate inference for any model.
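A training-free sparse attention method typically derives its sparsity pattern from the inputs at inference time rather than from learned masks. Below is a minimal block-sparse illustration of that general approach; all names are mine, and the mask is applied densely for readability, whereas the real speedup requires a kernel that skips the masked blocks entirely.

```python
import torch

def block_sparse_attention(q, k, v, block=64, keep_ratio=0.25):
    """Predict a block-level mask from mean-pooled Q/K similarity
    (no training involved) and mask out low-similarity blocks."""
    n, d = q.shape
    nb = n // block
    # Coarse block similarity from mean-pooled queries/keys.
    q_blk = q.reshape(nb, block, d).mean(dim=1)            # (nb, d)
    k_blk = k.reshape(nb, block, d).mean(dim=1)
    blk_scores = q_blk @ k_blk.T                           # (nb, nb)
    k_keep = max(1, int(keep_ratio * nb))
    keep = blk_scores.topk(k_keep, dim=-1).indices
    mask = torch.zeros(nb, nb, dtype=torch.bool)
    mask.scatter_(1, keep, True)
    # Expand the block mask to token resolution, then run masked attention.
    token_mask = mask.repeat_interleave(block, 0).repeat_interleave(block, 1)
    scores = (q @ k.T) / d ** 0.5
    scores = scores.masked_fill(~token_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

q, k, v = (torch.randn(512, 64) for _ in range(3))
print(block_sparse_attention(q, k, v).shape)  # torch.Size([512, 64])
```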
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, making it ~2x faster on consumer devices.
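For context on the fp8 trick, the snippet below simulates it: weights are stored in float8_e4m3 with a per-tensor scale and dequantized to bfloat16 for the matmul. This assumes PyTorch >= 2.1 for the float8 dtypes; true fused fp8 matmuls need hardware support on recent GPUs, which this sketch deliberately avoids.

```python
import torch

def to_fp8(w: torch.Tensor):
    """Store a weight in float8_e4m3fn plus a per-tensor scale."""
    scale = w.abs().max() / 448.0            # 448 = max normal value of e4m3fn
    return (w / scale).to(torch.float8_e4m3fn), scale

def fp8_linear(x, w_fp8, scale, bias=None):
    """Dequantize to bfloat16 and matmul. This simulates the memory-saving
    storage scheme; the repo's fused fp8 kernels and half-precision
    accumulation require GPU support."""
    w = w_fp8.to(torch.bfloat16) * scale.to(torch.bfloat16)
    y = x.to(torch.bfloat16) @ w.T
    return y + bias if bias is not None else y

w = torch.randn(4096, 4096)
w_fp8, s = to_fp8(w)                          # half the bytes of fp16 storage
x = torch.randn(8, 4096)
print(fp8_linear(x, w_fp8, s).shape)          # torch.Size([8, 4096])
```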
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
A fast and flexible PyTorch inference server that runs locally, on any cloud, or on AI hardware.
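This description matches a LitServe-style serving library, but since the entry names no API, the sketch below uses FastAPI as a deliberate generic stand-in to show what a minimal local PyTorch inference endpoint looks like; the model and route names are placeholders.

```python
# Minimal inference endpoint using FastAPI as a generic stand-in.
# Run with: uvicorn server:app --port 8000   (assuming this file is server.py)
import torch
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = torch.nn.Linear(4, 2)        # placeholder model
model.eval()

class Request(BaseModel):
    inputs: list[float]              # expects 4 features

@app.post("/predict")
def predict(req: Request):
    with torch.no_grad():
        x = torch.tensor(req.inputs).unsqueeze(0)
        y = model(x).squeeze(0)
    return {"outputs": y.tolist()}
```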
[ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection
xKV: Cross-Layer SVD for KV-Cache Compression
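Both entries above compress the KV cache with low-rank factorizations (Palu within a layer, xKV across layers). A minimal single-matrix sketch of the shared idea: factor a cache with a truncated SVD and keep two skinny factors instead of the full matrix. Names and the rank choice are illustrative.

```python
import torch

def compress_kv_lowrank(cache: torch.Tensor, rank: int):
    """Truncated SVD of a (seq, head_dim) cache: store two skinny
    factors instead of the full matrix."""
    u, s, vh = torch.linalg.svd(cache, full_matrices=False)
    a = u[:, :rank] * s[:rank]          # (seq, rank)
    b = vh[:rank]                       # (rank, head_dim)
    return a, b

seq, dim, rank = 2048, 128, 32
k_cache = torch.randn(seq, dim)
a, b = compress_kv_lowrank(k_cache, rank)
approx = a @ b                           # reconstruct on demand when attending
stored = a.numel() + b.numel()
# Random data compresses poorly; real KV caches have much stronger
# low-rank structure, which is what these methods exploit.
print(f"stored {stored} vs {k_cache.numel()} values, "
      f"relative err {torch.dist(approx, k_cache) / k_cache.norm():.3f}")
```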
Demo code for the CVPR 2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"