Google is advancing its "TorchTPU" initiative to improve the compatibility of its TPU chips with the PyTorch framework and lower the cost for developers migrating from NVIDIA GPUs to Google TPUs. The move is intended to challenge NVIDIA's dominance of the AI chip market and loosen the deep coupling between PyTorch and NVIDIA's CUDA.
AI-optimized Metal kernels improve PyTorch inference on Apple devices by an average of 1.87x (an 87% speedup), with some workloads running hundreds of times faster. Results were measured across 215 PyTorch modules using 8 leading AI models.
PyTorch 2.8 improves LLM inference on Intel CPUs with roughly 20% lower latency, adds Intel GPU support, and improves SYCL and ROCm compatibility.
Flux is a fast communication overlap library for tensor/expert parallelism on GPUs.
Analyzes the computation/communication overlap strategies used in DeepSeek V3/R1 and provides performance analysis data for deep learning frameworks.
A music, song, and audio generation toolkit based on PyTorch that supports high-quality audio generation.
A pre-trained time series forecasting model developed by Google Research.
pytorch
This is the Qwen3-8B model quantized by the PyTorch team with torchao, using int4 weight-only quantization and the AWQ algorithm. The quantized model reduces GPU memory usage by 53% and runs 1.34x faster on an H100 GPU, and it is calibrated and optimized specifically for the mmlu_abstract_algebra task.
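A minimal sketch of how a checkpoint like this is produced with torchao's int4 weight-only path. The AWQ calibration step used for the published checkpoint is omitted for brevity, and the API names reflect recent torchao releases:

```python
# Hedged sketch: int4 weight-only quantization with torchao.
# The AWQ calibration step (as used for the published checkpoint) is omitted.
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, int4_weight_only

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B", torch_dtype=torch.bfloat16, device_map="cuda"
)
quantize_(model, int4_weight_only(group_size=128))  # int4 weights, bf16 activations
```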
This is the PyTorch team's FP8-quantized version of the Gemma-3-27B model, derived from google/gemma-3-27b-it. It supports efficient inference through both vLLM and Transformers, significantly reducing memory usage and improving inference speed while preserving model quality.
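Since the card says the model works with vLLM, serving it should look roughly like the sketch below; the repo id is a placeholder, so substitute the actual Hub name from the listing:

```python
# Hedged sketch: serving an FP8-quantized checkpoint with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="pytorch/gemma-3-27b-it-FP8")  # hypothetical repo id
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Summarize FP8 quantization in one sentence."], params)
print(outputs[0].outputs[0].text)
```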
minpeter
A model built with the 🤗 Transformers library and trained specifically to detect errors in the Muon implementation in kozistr/pytorch_optimizer. It can identify and locate potential issues in the optimizer implementation, helping developers improve code quality.
FlameF0X
SnowflakeCore-G1-Tiny2 is a custom GPT-style Transformer language model and an improved version of SnowflakeCore-G1-Tiny. Built from scratch in PyTorch and trained on the common-pile/wikimedia_filtered dataset, it has roughly 400 million parameters, supports a 2048-token context window, and is designed for text generation tasks.
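If the checkpoint ships with a Transformers-compatible config, generation would look like the sketch below; the repo id is inferred from the listing, and trust_remote_code may be required for a from-scratch architecture:

```python
# Illustrative generation snippet; repo id and remote-code flag are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "FlameF0X/SnowflakeCore-G1-Tiny2"  # inferred from the listing
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)

inputs = tok("Once upon a time", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=50)
print(tok.decode(out[0], skip_special_tokens=True))
```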
SmolLM3-3B-INT8-INT4 is a quantized version of the HuggingFaceTB/SmolLM3-3B model. It uses torchao to apply 8-bit embedding quantization plus 8-bit dynamic activation with 4-bit weight quantization for linear layers. The model is converted to the ExecuTorch format and optimized for high performance on the CPU backend, making it particularly suitable for mobile deployment.
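The linear-layer part of that recipe maps onto torchao's 8-bit-dynamic-activation / 4-bit-weight config; a hedged sketch, with the embedding quantization and the ExecuTorch export step omitted:

```python
# Sketch of the 8da4w linear-layer recipe with torchao; the group size is
# illustrative, and the ExecuTorch conversion is not shown.
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, int8_dynamic_activation_int4_weight

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B", torch_dtype=torch.float32
)
quantize_(model, int8_dynamic_activation_int4_weight(group_size=32))
```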
unsloth
KernelLLM is a large language model fine-tuned from Llama 3.1 Instruct that focuses on writing GPU kernels in Triton. It converts PyTorch modules into Triton kernels, making GPU kernel programming more accessible.
Tournesol-Saturday
A PyTorch-based tooth segmentation model for CBCT images that uses region-aware guided learning for semi-supervised segmentation.
sicto
The SICTO Vocal Separator is a high-quality vocal separation model built on PyTorch, designed to extract clean vocal tracks from music audio. Trained on the musdb18hq dataset, it delivers professional-grade vocal separation for music production and audio editing.
ABDALLALSWAITI
This is the FP8-quantized version of the Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0 model, converted from the original BFloat16 weights using PyTorch's native FP8 support to optimize inference performance.
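One common way to use PyTorch's native FP8 support is to store weights in torch.float8_e4m3fn with a per-tensor scale; a minimal illustration of that storage scheme (not the actual conversion script or the full ControlNet pipeline):

```python
# Minimal sketch of PyTorch-native FP8 weight storage: cast BF16 weights
# to float8_e4m3fn with a per-tensor scale, then dequantize for compute.
import torch

w_bf16 = torch.randn(4096, 4096, dtype=torch.bfloat16)
scale = w_bf16.abs().max() / torch.finfo(torch.float8_e4m3fn).max
w_fp8 = (w_bf16 / scale).to(torch.float8_e4m3fn)   # quantized storage
w_back = w_fp8.to(torch.bfloat16) * scale          # dequantized for compute
```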
castiello
PyTorch-based FPN image segmentation model supporting multiple encoder architectures, suitable for semantic segmentation tasks
therarelab
A PyTorch-based action recognition model for robotics applications
facebook
An 8B-parameter large language model based on Llama 3.1 Instruct, trained specifically to write GPU kernels in Triton; it converts PyTorch modules into Triton kernels.
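A hedged usage sketch for this model, loading the facebook/KernelLLM repo named in the listing; the prompt wording is illustrative, so check the model card for the exact template:

```python
# Hedged sketch: asking KernelLLM to translate a small PyTorch module
# into a Triton kernel. The prompt format here is illustrative only.
from transformers import pipeline

generator = pipeline("text-generation", model="facebook/KernelLLM")
prompt = (
    "Rewrite the following PyTorch module as an equivalent Triton kernel:\n"
    "class Add(torch.nn.Module):\n"
    "    def forward(self, x, y):\n"
    "        return x + y\n"
)
print(generator(prompt, max_new_tokens=256)[0]["generated_text"])
```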
jclinton1
Diffusion Policy is a diffusion-based robot control policy, implemented in PyTorch and integrated into Hugging Face's model hub.
A pretrained language model based on the PyTorch framework released by Meta, suitable for non-commercial research purposes.
Pre-trained language model based on PyTorch released by Meta, suitable for non-commercial research purposes
Meta's PyTorch-based pre-trained language model, compliant with FAIR Non-commercial Research License
waleko
A PyTorch-based image-to-image transformation model, integrated and pushed to the Hub via PyTorchModelHubMixin.
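For context, the PyTorchModelHubMixin pattern adds save/load/push methods to a plain nn.Module; a minimal sketch, where TinyImg2Img is a hypothetical stand-in for the actual architecture:

```python
# Minimal sketch of the PyTorchModelHubMixin pattern; TinyImg2Img is a
# hypothetical stand-in for the entry's image-to-image architecture.
import torch.nn as nn
from huggingface_hub import PyTorchModelHubMixin

class TinyImg2Img(nn.Module, PyTorchModelHubMixin):
    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

model = TinyImg2Img()
# model.push_to_hub("your-username/tiny-img2img")      # pushes weights + config
# model = TinyImg2Img.from_pretrained("your-username/tiny-img2img")
```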
Matiullah2401592
PyTorch-based DeepLabV3Plus image segmentation model supporting efficient semantic segmentation tasks
PyTorch-based DeepLabV3Plus image segmentation model supporting multiple encoder architectures
Diamantis99
PyTorch-based Unet image segmentation model supporting various encoder architectures and pre-trained weights
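The Unet, FPN, and DeepLabV3Plus entries above read like models built with the segmentation_models_pytorch library; assuming that, the usage pattern is:

```python
# Hedged sketch of the segmentation_models_pytorch API these entries appear
# to use: choose an architecture, an encoder backbone, and pre-trained weights.
import torch
import segmentation_models_pytorch as smp

model = smp.Unet(                    # or smp.FPN / smp.DeepLabV3Plus
    encoder_name="resnet34",         # any supported encoder architecture
    encoder_weights="imagenet",      # pre-trained encoder weights
    in_channels=3,
    classes=1,                       # number of output classes
)
mask_logits = model(torch.randn(1, 3, 256, 256))  # -> (1, 1, 256, 256)
```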
A library of data-analysis tools for PyTorch CI/CD, together with an MCP service.
A prototype command-line tool for semantic search over the PyTorch documentation; development is currently suspended due to design issues.
An MCP server that exposes the PyTorch Lightning framework to tools, agents, and orchestration systems through structured APIs, supporting training, inspection, validation, testing, prediction, and model checkpoint management.