PiKV: MoE KV Cache Management System [Efficient ML System]
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
SGLang is a fast serving framework for large language models and vision language models.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations for efficient inference on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate inference execution in a performant way.
An unofficial UI-first app client for https://bgm.tv on Android and iOS, built with React Native. An ad-free, hobby-driven, non-commercial ACG tracking client for bgm.tv, similar in spirit to Douban. Redesigned for mobile, it ships many built-in enhancements that are hard to achieve on the web version and offers extensive customization options. Currently supports iOS / Android / WSA, mobile / basic tablet layouts, light / dark themes, and the mobile web.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Mixture-of-Experts for Large Vision-Language Models
MoBA: Mixture of Block Attention for Long-Context LLMs
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
Tutel MoE: An Optimized Mixture-of-Experts Implementation