IBM Granite-4.0-H-Tiny is a hybrid Mamba-2/Transformer model, provided here as a 3-bit quantized build optimized for Apple silicon. It is designed for long-context workloads, efficient inference, and enterprise use. The architecture combines Mamba-2 layers with a Mixture of Experts (MoE) design, which significantly reduces memory usage while preserving expressiveness.
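To give a rough sense of why 3-bit quantization matters for memory, the following is a minimal back-of-the-envelope sketch, not a measurement of this model. The function name `weight_memory_gib`, the parameter count in the usage example, and the per-group overhead (one 16-bit scale and one 16-bit bias per group of 64 weights) are all illustrative assumptions; the actual quantization layout of this build may differ.

```python
def weight_memory_gib(num_params, bits_per_weight, group_size=0, overhead_bits_per_group=32):
    """Approximate weight storage in GiB.

    num_params: total number of weights (hypothetical value supplied by the caller).
    bits_per_weight: nominal precision, e.g. 16 for fp16 or 3 for 3-bit quantization.
    group_size: weights sharing one set of quantization metadata (0 = no metadata).
    overhead_bits_per_group: assumed metadata per group (e.g. fp16 scale + fp16 bias).
    """
    effective_bits = bits_per_weight
    if group_size > 0:
        # Spread the per-group metadata cost across the weights in the group.
        effective_bits += overhead_bits_per_group / group_size
    return num_params * effective_bits / 8 / 2**30


if __name__ == "__main__":
    params = 7e9  # assumed, illustrative parameter count
    fp16 = weight_memory_gib(params, 16)
    q3 = weight_memory_gib(params, 3, group_size=64)
    print(f"fp16 weights : {fp16:.2f} GiB")
    print(f"3-bit weights: {q3:.2f} GiB")
```

Under these assumptions the quantized weights take a bit under a quarter of the 16-bit footprint (3.5 effective bits versus 16). Activations, caches, and runtime overhead are not included in this estimate.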
Natural Language Processing, MLX, Multiple Languages