FlexLLMGen
Public
Running large language models on a single GPU for throughput-oriented scenarios.
Created: 2023-02-16T05:18:53
Updated: 2025-03-26T15:38:10
Stars: 9.4K
Stars Increase: 1