llm-quantization-playground

A hands-on demo project that compares multiple quantization methods for LLMs, including FP16, INT8, and 4-bit (GPTQ, AWQ, GGML, bitsandbytes). The goal is to understand real-world tradeoffs between model size, latency, throughput, GPU memory usage, and output quality.
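To make the size/quality tradeoff concrete, here is a minimal sketch (not code from this project) of symmetric per-tensor INT8 quantization with NumPy: it quantizes an FP32 weight matrix to INT8, dequantizes it back, and reports the compression ratio and the mean reconstruction error, which is the same tradeoff the full demo measures at model scale.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: map the largest magnitude to 127."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an FP32 approximation of the original weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)  # toy weight matrix

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

size_fp32 = w.nbytes            # 4 bytes per weight
size_int8 = q.nbytes            # 1 byte per weight
mean_abs_err = float(np.abs(w - w_hat).mean())

print(f"compression: {size_fp32 / size_int8:.0f}x")
print(f"mean abs error: {mean_abs_err:.4f}")
```

Methods such as GPTQ and AWQ improve on this naive scheme by choosing scales (and rounding) per group or per channel and by calibrating on real activations, which is why their quality at 4-bit is far better than uniform rounding alone would suggest.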

Created: 2025-11-21T12:04:36
Updated: 2025-12-04T12:07:33