Efficiently-Serving-LLMs
Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and Low Rank Adapters (LoRA), and gain hands-on experience with Predibase’s LoRAX framework inference server.
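For a quick flavor of the first technique: KV caching stores the key and value projections of tokens that have already been generated, so each decoding step only computes attention for the newest token instead of re-encoding the whole sequence. Below is a minimal NumPy sketch of the idea; it is illustrative only, not course code, and all names (`attention_step`, `kv_cache`) are hypothetical.

```python
import numpy as np

def attention_step(q, k_new, v_new, kv_cache):
    """One decoding step with a KV cache: append the new token's
    key/value and attend over all cached positions, so past tokens
    are never re-projected."""
    kv_cache["k"].append(k_new)   # cache key for the newest token
    kv_cache["v"].append(v_new)   # cache value for the newest token
    K = np.stack(kv_cache["k"])   # (seq_len, d)
    V = np.stack(kv_cache["v"])   # (seq_len, d)
    scores = K @ q / np.sqrt(q.shape[-1])   # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over positions
    return weights @ V                       # attention output, shape (d,)

# Usage: generate tokens one at a time, reusing the cache each step.
d = 64
cache = {"k": [], "v": []}
for _ in range(5):
    q, k, v = (np.random.randn(d) for _ in range(3))
    out = attention_step(q, k, v, cache)
```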
batch-processing, deep-learning-techniques, inference-optimization, large-scale-deployment, machine-learning-operations, model-acceleration, model-inference-service, model-serving, optimization-techniques, performance-enhancement
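Likewise, Low Rank Adapters (LoRA) add a small trainable update B·A on top of a frozen weight matrix; keeping adapters tiny is what lets a server like LoRAX multiplex many fine-tuned variants over one base model. A minimal sketch of the math, assuming NumPy and a hypothetical `LoRALinear` class:

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA sketch: frozen base weight W plus a trainable
    low-rank update B @ A, scaled by alpha / r."""
    def __init__(self, W, r=8, alpha=16):
        d_out, d_in = W.shape
        self.W = W                                # frozen base weight
        self.A = np.random.randn(r, d_in) * 0.01  # low-rank factor A
        self.B = np.zeros((d_out, r))             # B starts at zero, so the
        self.scale = alpha / r                    # adapter is initially a no-op

    def __call__(self, x):
        # y = Wx + (alpha/r) * B(Ax); only A and B would be trained.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

W = np.random.randn(32, 64)
layer = LoRALinear(W)
y = layer(np.random.randn(64))   # output shape (32,)
```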
Created: 2024-03-28T00:36:49
Updated: 2024-06-20T01:15:50
https://www.deeplearning.ai/short-courses/efficiently-serving-llms/
Stars: 17
Stars Increase: 0