
Optimizing-LLM-Inference-using-NVIDIA-Dynamo-and-TorchDynamo

Public

This project benchmarks and optimizes BERT inference across several backends: PyTorch eager mode, TorchDynamo with the Inductor backend, and NVIDIA Triton Inference Server. GLUE SST-2 samples are used for evaluation, and performance is compared through profiling, kernel timing, and latency analysis.
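As a rough illustration of the eager-vs-TorchDynamo part of the comparison, the sketch below times a BERT forward pass on a couple of SST-2-style sentences before and after `torch.compile` with the Inductor backend. The checkpoint name, sample sentences, and iteration counts are illustrative assumptions, not necessarily what the project itself uses; the Triton Inference Server path is not shown here.

```python
# Minimal latency sketch: PyTorch eager vs. TorchDynamo (Inductor backend).
# The checkpoint and sample sentences are assumptions for illustration only.
import time
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "textattack/bert-base-uncased-SST-2"  # assumed SST-2 fine-tuned BERT
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL).to(device).eval()

batch = tokenizer(
    ["a gorgeous, witty, seductive movie.",
     "the plot is paper-thin and predictable."],
    padding=True,
    return_tensors="pt",
).to(device)

def bench(fn, warmup=5, iters=50):
    """Return average forward-pass latency in milliseconds."""
    with torch.no_grad():
        for _ in range(warmup):
            fn(**batch)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            fn(**batch)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3

eager_ms = bench(model)                                  # eager-mode baseline
compiled = torch.compile(model, backend="inductor")      # TorchDynamo + Inductor
compiled_ms = bench(compiled)                            # first calls trigger compilation
print(f"eager: {eager_ms:.2f} ms | inductor: {compiled_ms:.2f} ms")
```

A fuller comparison along the lines the description mentions would also capture per-kernel timings (e.g. with the PyTorch profiler) and send the same batches to a Triton Inference Server deployment of the model.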

Created: 2025-05-11T04:00:04
Updated: 2025-05-11T07:19:55
