MobileLLM-ParetoQ is a quantization framework for extremely low-bit large language models, optimized for mobile devices. It supports 1-bit, 1.58-bit (ternary), 2-bit, 3-bit, and 4-bit quantization settings, reducing memory and compute requirements while maintaining high accuracy.
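As a rough illustration of what the listed bit widths mean for weight storage, the sketch below computes an idealized footprint at each setting. The function names and the one-billion-parameter figure are hypothetical and not taken from the model. The arithmetic ignores quantization scales, layers kept at higher precision, and packing overhead, so it is a back-of-the-envelope estimate rather than a description of how ParetoQ stores weights.

```python
# Bit widths listed in the description above.
BIT_WIDTHS = [1, 1.58, 2, 3, 4]


def weight_storage_mib(num_params: int, bits_per_weight: float) -> float:
    """Idealized storage for the weights alone, in MiB (no scales or padding)."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**20


def compression_vs_fp16(bits_per_weight: float) -> float:
    """How many times smaller the weights are than a 16-bit baseline."""
    return 16 / bits_per_weight


if __name__ == "__main__":
    num_params = 1_000_000_000  # hypothetical model size, for illustration only
    for bits in BIT_WIDTHS:
        size = weight_storage_mib(num_params, bits)
        ratio = compression_vs_fp16(bits)
        print(f"{bits:>4}-bit: {size:8.1f} MiB  ({ratio:.1f}x smaller than FP16)")
```

The 1.58-bit entry corresponds to ternary weights, since three possible values carry log2(3) ≈ 1.58 bits of information each.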
Natural Language Processing
Transformers
English