vLLM-efficient-serving-stack

Production-grade vLLM serving with an OpenAI-compatible API, per-request LoRA routing, KEDA autoscaling on Prometheus metrics, Grafana/OTel observability, and a benchmark comparing the AWQ, GPTQ, and GGUF quantization formats.
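As a minimal sketch of what the per-request LoRA routing looks like from the client side: vLLM's OpenAI-compatible server resolves the request's `model` field to either the base weights or a LoRA adapter registered at launch. The adapter name `sql-lora`, its path, and the base model below are illustrative placeholders, not names taken from this repo.

```python
# Minimal sketch: per-request LoRA routing via vLLM's OpenAI-compatible API.
# Assumes the server was launched with an adapter registered, e.g.:
#   vllm serve meta-llama/Llama-3.1-8B-Instruct \
#       --enable-lora --lora-modules sql-lora=/adapters/sql-lora
# Adapter name, path, and base model here are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM does not check the key by default
)

# The LoRA adapter is selected per request by passing its registered name
# in the `model` field; using the base model name skips the adapter.
response = client.chat.completions.create(
    model="sql-lora",
    messages=[{"role": "user", "content": "Write a SQL query that lists all users."}],
)
print(response.choices[0].message.content)
```

Routing on the `model` field keeps clients on the stock OpenAI SDK with no custom headers; the server decides which adapter, if any, to apply.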

Created: 2025-08-31T02:19:15
Updated: 2025-09-03T01:10:57
