llm-quantization-playground
A hands-on demo project comparing quantization methods for LLMs, including FP16, INT8, and 4-bit (GPTQ, AWQ, GGML, bitsandbytes). The goal is to understand real-world trade-offs between model size, latency, throughput, GPU memory usage, and output quality.
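To make the size/quality trade-off concrete, here is a minimal pure-Python sketch of per-tensor symmetric INT8 quantization, the basic idea underlying the INT8 paths compared in this project (function names and the toy weight values are our own illustration, not code from any of the listed libraries):

```python
def quantize_int8(weights):
    """Per-tensor symmetric quantization: map floats onto the
    integer range [-127, 127] using a single scale factor."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero tensor
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from INT8 values."""
    return [v * scale for v in q]

# Toy example: each value now needs 1 byte instead of 2 (FP16) or 4 (FP32),
# at the cost of a small rounding error bounded by scale / 2.
weights = [0.12, -0.5, 0.33, 1.27, -1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Real quantizers (GPTQ, AWQ, bitsandbytes) refine this basic scheme with per-group scales, outlier handling, and calibration data, which is exactly where the quality differences measured here come from.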