A Python package for applying LLMs with LangChain and Hugging Face on a local CUDA/MPS host
A high-throughput and memory-efficient inference and serving engine for LLMs
Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.
SGLang is a fast serving framework for large language models and vision language models.
Workflow Engine for Kubernetes
kaldi-asr/kaldi is the official location of the Kaldi project.
Open3D: A Modern Library for 3D Data Processing
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
NumPy-aware dynamic Python compiler using LLVM
NumPy & SciPy for GPU
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.