SmolLM3-3B-INT8-INT4 is a quantized version of the HuggingFaceTB/SmolLM3-3B model. It uses torchao to apply 8-bit weight quantization to the embeddings and 8-bit dynamic activation with 4-bit weight quantization to the linear layers. The model is then converted to the ExecuTorch format and optimized for the CPU backend, which makes it well suited to deployment on mobile devices.
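Below is a minimal sketch of how such a quantization could be reproduced with torchao. The specific config classes (`Int8DynamicActivationInt4WeightConfig`, `Int8WeightOnlyConfig`), the group size of 32, and the embedding filter are illustrative assumptions, not the exact recipe behind this checkpoint, and the ExecuTorch export step is only summarized in the comments; exact APIs may differ across torchao and ExecuTorch versions.

```python
# Minimal sketch, assuming a recent torchao release and transformers.
# Not the published recipe for this checkpoint.
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import (
    quantize_,
    Int8DynamicActivationInt4WeightConfig,
    Int8WeightOnlyConfig,
)

# Load a full-precision copy of the base model to quantize from.
model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceTB/SmolLM3-3B",
    torch_dtype=torch.float32,
)

# 8-bit dynamic activations + 4-bit grouped weights for the nn.Linear layers.
# group_size=32 is an assumed value for illustration.
quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))

# 8-bit weight-only quantization for the embedding table; quantizing
# nn.Embedding via filter_fn may depend on the torchao version.
quantize_(
    model,
    Int8WeightOnlyConfig(),
    filter_fn=lambda m, fqn: isinstance(m, torch.nn.Embedding),
)

# The quantized model can then be exported to an ExecuTorch .pte program
# (e.g. via optimum-executorch or the ExecuTorch export scripts) and run
# on the CPU backend on device.
```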
Tags: Natural Language Processing, PyTorch