JetStream
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
Created: 2024-03-01T08:24:07
Updated: 2025-03-27T03:13:37
Stars: 364
Stars Increase: 0