marlin
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
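The ~4x figure follows from decode-time GEMMs being memory-bound at small batch sizes: INT4 weights need a quarter of the bandwidth of FP16 ones, and that advantage holds until larger batches make the multiply compute-bound. Below is a minimal PyTorch sketch of the semantics such a kernel computes (expand per-group-scaled INT4 weights to FP16, then matmul). It is a reference illustration only; the function names, int8 storage, and 128-wide group size are assumptions for the sketch, not Marlin's actual API, which fuses the dequantization into the GEMM so weights are expanded on the fly in registers.

```python
import torch

def dequant_int4(qweight: torch.Tensor, scales: torch.Tensor, groupsize: int) -> torch.Tensor:
    """Expand symmetric INT4 weights (stored here as int8 values in [-8, 7])
    to FP16, using one FP16 scale per `groupsize` input rows per column.
    (Hypothetical helper for illustration; a real kernel keeps weights packed.)"""
    k, n = qweight.shape
    w = qweight.to(torch.float16)
    # Broadcast each group's scale over its `groupsize` rows.
    w = w.view(k // groupsize, groupsize, n) * scales.view(-1, 1, n)
    return w.view(k, n)

def fp16xint4_matmul(a: torch.Tensor, qweight: torch.Tensor,
                     scales: torch.Tensor, groupsize: int = 128) -> torch.Tensor:
    """Reference semantics: C = A @ dequant(B). A fused kernel never
    materializes the FP16 weight matrix in global memory."""
    return a @ dequant_int4(qweight, scales, groupsize)

# Example: a batch of 16 tokens through a 4096 -> 4096 projection.
a = torch.randn(16, 4096, dtype=torch.float16)
qweight = torch.randint(-8, 8, (4096, 4096), dtype=torch.int8)
scales = torch.rand(4096 // 128, 4096, dtype=torch.float16)
c = fp16xint4_matmul(a, qweight, scales)  # shape (16, 4096)
```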