Optimizing-LLM-Inference-using-NVIDIA-Dynamo-and-TorchDynamo

The goal of this project is to benchmark and optimize BERT inference across three backends: PyTorch eager mode, TorchDynamo with the Inductor backend, and NVIDIA Triton Inference Server. Evaluation uses samples from the GLUE SST-2 task, and performance is compared through profiling, kernel timing, and latency analysis.
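
As a minimal sketch of the eager-vs-Inductor comparison described above (not the project's actual code), the snippet below times a BERT sequence-classification model before and after `torch.compile`. The checkpoint name, batch contents, and iteration counts are illustrative assumptions; the Triton Inference Server path is not shown here since it requires a running server.

```python
# Sketch: compare BERT inference latency in PyTorch eager mode vs.
# TorchDynamo with the Inductor backend. Assumes the transformers
# library and an SST-2 fine-tuned checkpoint (name is an assumption).
import time

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "textattack/bert-base-uncased-SST-2"  # assumed SST-2 checkpoint
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = (
    AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    .to(device)
    .eval()
)

# A sample SST-2-style input batch (batch size 8 is an arbitrary choice).
batch = tokenizer(
    ["a gripping, well-acted film"] * 8,
    padding=True,
    return_tensors="pt",
).to(device)

# TorchDynamo traces the model and Inductor generates fused kernels.
compiled_model = torch.compile(model, backend="inductor")

@torch.no_grad()
def time_model(m, n_warmup=10, n_iters=50):
    """Return mean latency in milliseconds over n_iters runs."""
    for _ in range(n_warmup):  # warm-up also triggers compilation
        m(**batch)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        m(**batch)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters * 1e3

print(f"eager:    {time_model(model):.2f} ms")
print(f"inductor: {time_model(compiled_model):.2f} ms")
```

Warm-up iterations matter here: the first calls to the compiled model pay the one-time TorchDynamo/Inductor compilation cost, so excluding them keeps the latency comparison fair.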

Created: 2025-05-11T04:00:04
Updated: 2025-05-11T07:19:55
Stars: 1
Stars increase: 0