The open-source machine learning framework PyTorch has officially released version 2.8. The release has drawn considerable attention for its focus on inference performance of quantized large language models (LLMs), especially on Intel CPUs: it significantly improves inference efficiency in offline mode and, for the first time, introduces experimental support for a distributed backend on Intel GPUs.

PyTorch 2.8 speeds up quantized LLM inference through algorithmic improvements and new kernel-level techniques. The release supports several quantization modes, including A16W8, DA8W8, and A16W4. According to the published test data, running the Llama-3.1-8B model (M=8) on 32 cores of Intel's 6th-generation Xeon platform reduces end-to-end latency by more than 20%, putting performance on par with some popular LLM serving frameworks.
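
To make the workflow concrete, here is a minimal sketch of this style of quantized CPU inference. It assumes the torchao library is installed and uses its documented `quantize_`/`int8_weight_only` API (roughly the A16W8 mode: 16-bit activations, 8-bit weights); the toy model and shapes are illustrative, not the benchmark setup.

```python
# Minimal sketch: weight-only int8 quantization (A16W8-style) followed by
# TorchInductor compilation for CPU inference. Assumes torchao is installed.
import torch
from torchao.quantization import quantize_, int8_weight_only

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
).eval()

# Replace Linear weights with int8-quantized versions; activations stay in
# floating point.
quantize_(model, int8_weight_only())

# Compile for CPU; "max-autotune" lets Inductor pick tuned kernel templates.
compiled = torch.compile(model, mode="max-autotune")

with torch.inference_mode():
    x = torch.randn(8, 4096)  # M=8, mirroring the benchmark shape above
    out = compiled(x)
print(out.shape)
```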


Another highlight of this update is experimental support for the XCCL distributed backend on Intel discrete GPUs. This gives developers more flexibility across training modes and lets models run in a wider range of hardware environments.
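
As a rough illustration, the snippet below sketches how the experimental XCCL backend might be initialized. It assumes a PyTorch 2.8 build with XPU support, installed Intel GPU drivers, and a torchrun launch; the all-reduce is only a sanity check.

```python
# Hedged sketch: initializing the experimental XCCL backend on Intel GPUs (XPU).
# Launch with, e.g.: torchrun --nproc-per-node=2 this_script.py
import os
import torch
import torch.distributed as dist

def main():
    rank = int(os.environ["RANK"])  # set by torchrun
    # "xccl" is the new experimental backend for Intel discrete GPUs.
    dist.init_process_group(backend="xccl")
    torch.xpu.set_device(rank % torch.xpu.device_count())

    # Simple all-reduce across ranks to verify the backend is working.
    t = torch.ones(4, device="xpu") * (rank + 1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {t}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```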

Aside from these core enhancements, PyTorch 2.8 includes a series of other notable improvements. SYCL support makes PyTorch's C++ extension API more capable, and XPU devices now also support the A16W4 mode. The development team has additionally provided a stable interface for the libtorch ABI, reducing compatibility issues for third-party C++/CUDA extensions.
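
For the A16W4-on-XPU point, a hedged sketch follows. It again assumes torchao's documented `int4_weight_only` config (int4 weight-only quantization typically expects a bfloat16 model) and an Intel GPU visible as the `xpu` device; whether a given torchao version covers XPU should be checked against its release notes.

```python
# Hedged sketch: A16W4-style (int4 weight-only) quantization on an XPU device.
# Assumes a PyTorch 2.8 XPU build, an Intel GPU, and torchao installed.
import torch
from torchao.quantization import quantize_, int4_weight_only

model = torch.nn.Linear(4096, 4096).eval().to(device="xpu", dtype=torch.bfloat16)
quantize_(model, int4_weight_only())  # weights -> int4; activations stay 16-bit

with torch.inference_mode():
    y = model(torch.randn(8, 4096, device="xpu", dtype=torch.bfloat16))
print(y.shape)
```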

ROCm support has also been enhanced, adding the gfx950 architecture and, in combination with TorchInductor and AOTInductor, providing auto-tuning templates for multiple kernels. In addition, new control flow operations such as conditionals and loops make model compilation and export more efficient.
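
The conditional operator can be sketched as follows. `torch.cond` is PyTorch's documented data-dependent conditional; the sin/cos branches here are just placeholders.

```python
# Sketch of torch.cond, a control-flow operator that works under compilation
# without graph breaks on data-dependent predicates.
import torch

def true_fn(x):
    return torch.sin(x)

def false_fn(x):
    return torch.cos(x)

@torch.compile(fullgraph=True)
def f(x):
    # Branch on a runtime predicate; both branches are traced into the graph.
    return torch.cond(x.sum() > 0, true_fn, false_fn, (x,))

print(f(torch.randn(3)))
```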

The release of PyTorch 2.8 opens up new possibilities for machine learning and gives developers more powerful tools, furthering the adoption and development of large language models.

Download address: https://github.com/pytorch/pytorch/releases/tag/v2.8.0