dynamic-batching

Public

The official repo for the paper "Optimizing LLM Inference Throughput via Memory-aware and SLA-constrained Dynamic Batching"

inference-serving llm vllm

Created: 2025-03-06T15:23:39

Updated: 2025-03-17T16:16:07

Related projects

Dify

Hot

agent

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

107097

3 months ago

+127 today

GPT4All

ai-chat

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

73821

3 months ago

+9 today

Browser Use

Hot

ai-agents

Make websites accessible for AI agents

65563

4 months ago

+65 today

LLMs From Scratch

Hot

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

59023

10 months ago

+70 today

MetaGPT

agent

The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming

57240

3 months ago

+25 today

LLaMA Factory

Hot

agent

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

54325

3 months ago

+67 today

vLLM

Hot

amd

A high-throughput and memory-efficient inference and serving engine for LLMs

52326

1 year ago

+81 today

AutoGen

Hot

agentic

A programming framework for agentic AI. PyPI: autogen-agentchat; Discord: https://aka.ms/autogen-discord; Office Hour: https://aka.ms/autogen-officehour

47315

1 year ago

+60 today