Optimizing-LLM-Inference-using-NVIDIA-Dynamo-and-TorchDynamo
The goal of this project is to benchmark and optimize BERT inference using different backends: PyTorch eager mode, TorchDynamo with the Inductor backend, and NVIDIA Triton Inference Server. We use GLUE SST-2 samples for evaluation and compare performance through profiling, kernel timing, and latency analysis.
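
Below is a minimal sketch of the eager-vs-Inductor comparison described above, using `torch.compile` and a small slice of the GLUE SST-2 validation set. The checkpoint name, batch size, and iteration counts are illustrative assumptions, not the project's actual configuration.

```python
import time
import torch
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumed SST-2 fine-tuned BERT checkpoint; swap in the model you benchmark.
model_name = "textattack/bert-base-uncased-SST-2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device).eval()

# TorchDynamo captures the model graph; the Inductor backend lowers it to fused kernels.
compiled_model = torch.compile(model, backend="inductor")

# A handful of GLUE SST-2 validation sentences for a quick latency comparison.
sentences = load_dataset("glue", "sst2", split="validation[:32]")["sentence"]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt").to(device)

@torch.inference_mode()
def benchmark(m, n_iters=50):
    # Warm-up runs trigger compilation and stabilize timing.
    for _ in range(5):
        m(**batch)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        m(**batch)
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_iters * 1000  # ms per batch

print(f"Eager:    {benchmark(model):.2f} ms/batch")
print(f"Inductor: {benchmark(compiled_model):.2f} ms/batch")
```

For the Triton Inference Server path, the same batch would instead be sent to a deployed model endpoint (e.g. via `tritonclient`), so the measured latency includes serving overhead rather than only kernel execution time.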