This is a quantized version of Microsoft's UserLM-8b model, produced with llama.cpp's importance-matrix (imatrix) quantization. Quantization substantially reduces memory usage and improves inference speed while largely preserving model quality. Multiple quantization levels are provided, ranging from high-quality to heavily compressed variants, so you can pick one that fits your hardware.
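As a rough sketch of how such quants are produced with llama.cpp's own tools (the file names below are placeholders, not the actual files in this repo):

```shell
# Compute an importance matrix from a calibration text file
# (calibration.txt is a hypothetical calibration dataset)
./llama-imatrix -m UserLM-8b-F16.gguf -f calibration.txt -o imatrix.dat

# Quantize using the importance matrix, e.g. to Q4_K_M
./llama-quantize --imatrix imatrix.dat \
    UserLM-8b-F16.gguf UserLM-8b-Q4_K_M.gguf Q4_K_M

# Run the quantized model for a quick test
./llama-cli -m UserLM-8b-Q4_K_M.gguf -p "Hello" -n 64
```

The imatrix guides the quantizer toward preserving the weights that matter most, which is why imatrix quants typically outperform plain quants at the same bit width.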
Natural Language Processing
GGUF