The open-source machine learning framework PyTorch has officially released version 2.8. The release has drawn considerable attention for its focus on inference performance of quantized large language models (LLMs), especially on Intel CPUs: it significantly improves inference efficiency in offline mode and, for the first time, introduces experimental support for a distributed backend on Intel GPUs.

PyTorch 2.8 speeds up quantized LLM inference through algorithmic improvements and new kernel-level techniques. The release supports several quantization modes, including A16W8, DA8W8, and A16W4. According to the published test data, running the Llama-3.1-8B model (M=8) on 32 cores of Intel's 6th-generation Xeon platform reduces end-to-end latency by more than 20%, putting performance on par with some popular LLM serving frameworks.
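
To make the workflow concrete, here is a minimal sketch of this style of quantized CPU inference. It assumes the torchao library is installed and uses its documented `quantize_`/`int8_weight_only` API (roughly the A16W8 mode: 16-bit activations, 8-bit weights); the toy model and shapes are illustrative, not the benchmark setup.

```python
# Minimal sketch: weight-only int8 quantization (A16W8-style) followed by
# TorchInductor compilation for CPU inference. Assumes torchao is installed.
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
).eval()

# Replace Linear weights with int8-quantized versions; activations stay in
# floating point.
quantize_(model, int8_weight_only())

# Compile for CPU; "max-autotune" lets Inductor pick tuned kernel templates.
compiled = torch.compile(model, mode="max-autotune")

with torch.inference_mode():
    x = torch.randn(8, 4096)  # M=8, mirroring the benchmark shape above
    out = compiled(x)
print(out.shape)
```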


Another highlight of this update is experimental support for the XCCL distributed backend on Intel discrete GPUs. This gives developers more flexibility across training modes and lets models run in a wider range of hardware environments.
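
As a rough illustration, the snippet below sketches how the experimental XCCL backend might be initialized. It assumes a PyTorch 2.8 build with XPU support, installed Intel GPU drivers, and a torchrun launch; the all-reduce is only a sanity check.

```python
# Hedged sketch: initializing the experimental XCCL backend on Intel GPUs (XPU).
# Launch with, e.g.: torchrun --nproc-per-node=2 this_script.py
import os
import torch
import torch.distributed as dist

def main():
    rank = int(os.environ["RANK"])  # set by torchrun
    # "xccl" is the new experimental backend for Intel discrete GPUs.
    dist.init_process_group(backend="xccl")
    torch.xpu.set_device(rank % torch.xpu.device_count())

    # Simple all-reduce across ranks to verify the backend is working.
    t = torch.ones(4, device="xpu") * (rank + 1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {t}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```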

Aside from these core enhancements, PyTorch 2.8 includes a series of other notable improvements. SYCL support makes PyTorch's C++ extension API more capable, and XPU devices now also support the A16W4 mode. The development team has additionally provided a stable interface for the libtorch ABI, reducing compatibility issues for third-party C++/CUDA extensions.
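
For the A16W4-on-XPU point, a hedged sketch follows. It again assumes torchao's documented `int4_weight_only` config (int4 weight-only quantization typically expects a bfloat16 model) and an Intel GPU visible as the `xpu` device; whether a given torchao version covers XPU should be checked against its release notes.

```python
# Hedged sketch: A16W4-style (int4 weight-only) quantization on an XPU device.
# Assumes a PyTorch 2.8 XPU build, an Intel GPU, and torchao installed.
import torch
from torchao.quantization import quantize_, int4_weight_only

model = torch.nn.Linear(4096, 4096).eval().to(device="xpu", dtype=torch.bfloat16)
quantize_(model, int4_weight_only())  # weights -> int4; activations stay 16-bit

with torch.inference_mode():
    y = model(torch.randn(8, 4096, device="xpu", dtype=torch.bfloat16))
print(y.shape)
```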

ROCm support has also been enhanced, adding the gfx950 architecture and, in combination with TorchInductor and AOTInductor, providing auto-tuning templates for multiple kernels. In addition, new control flow operations such as conditionals and loops make model compilation and export more efficient.
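
The conditional operator can be sketched as follows. `torch.cond` is PyTorch's documented data-dependent conditional; the sin/cos branches here are just placeholders.

```python
# Sketch of torch.cond, a control-flow operator that works under compilation
# without graph breaks on data-dependent predicates.
import torch

def true_fn(x):
    return torch.sin(x)

def false_fn(x):
    return torch.cos(x)

@torch.compile(fullgraph=True)
def f(x):
    # Branch on a runtime predicate; both branches are traced into the graph.
    return torch.cond(x.sum() > 0, true_fn, false_fn, (x,))

print(f(torch.randn(3)))
```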

The release of PyTorch 2.8 opens up new possibilities for machine learning and gives developers more powerful tools, furthering the adoption and development of large language models.

Download address: https://github.com/pytorch/pytorch/releases/tag/v2.8.0