GPTQModel

Public

Production-ready LLM compression/quantization toolkit with hardware-accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.

Created: 2024-06-17T14:45:30
Updated: 2025-03-27T10:23:06
https://x.com/ModelCloudAI
Stars: 922
Stars Increase: 7

Related Projects