mamba-slm-hybrid-optimizer
PublicSmall Language Model Implementation based on Mamba (SSM) architecture. Muon optimizer to the 2D weight matrices while using the stable AdamW optimizer for all other parameters.
Comprehensive AI Models Collection for All Your Development & Research Needs
AI LLM Power Rankings - Performance, Buzz & Trends
Discover Trusted AI Model Partners - Guaranteed Reliable Support
Submit Your Model Info & Services - Precision Marketing & User Targeting
Discover Popular AI-MCP Services - Find Your Perfect Match Instantly
Easy MCP Client Integration - Access Powerful AI Capabilities
Master MCP Usage - From Beginner to Expert
Top MCP Service Performance Rankings - Find Your Best Choice
Publish & Promote Your MCP Services
Large-scale datasets and benchmarks for training, evaluating, and testing models to measure
Comprehensive Text Extraction and Document Processing Solutions for Users
Small Language Model Implementation based on Mamba (SSM) architecture. Muon optimizer to the 2D weight matrices while using the stable AdamW optimizer for all other parameters.