This is an experimental quantized build of the dense Qwen3-4B-Thinking-2507 model. It uses a mixed-precision MXFP4 quantization scheme, combining tensor formats of different precisions (such as MXFP4, Q8_0, and Q6_K) to significantly reduce the model file size and improve inference speed (tokens per second, TPS) while keeping accuracy as close as possible to the original F16 model. The project demonstrates the potential of mixed quantization, but it has since been superseded by the author's updated version.
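To illustrate why mixing low-bit formats shrinks the file so much, here is a rough back-of-the-envelope size estimate. The bits-per-weight figures are approximate published values for each GGUF format, and the 80/10/10 split between MXFP4, Q6_K, and Q8_0 tensors is a hypothetical assumption for illustration, not this model's actual quantization recipe:

```python
# Rough size estimate for a 4B-parameter model under a hypothetical
# mixed-precision recipe (NOT this model's exact tensor layout).
PARAMS = 4.0e9  # total weight count

# Approximate effective bits per weight (block scale overhead included).
BPW = {"F16": 16.0, "Q8_0": 8.5, "Q6_K": 6.5625, "MXFP4": 4.25}

# Assumed split: most tensors in MXFP4, precision-sensitive
# tensors kept in the higher-precision Q6_K / Q8_0 formats.
mix = {"MXFP4": 0.80, "Q6_K": 0.10, "Q8_0": 0.10}

def size_gib(bpw: float, n: float = PARAMS) -> float:
    """Convert bits-per-weight to a total size in GiB."""
    return n * bpw / 8 / 2**30

f16 = size_gib(BPW["F16"])
mixed = sum(frac * size_gib(BPW[fmt]) for fmt, frac in mix.items())
print(f"F16:   {f16:.2f} GiB")
print(f"Mixed: {mixed:.2f} GiB ({mixed / f16:.0%} of F16)")
```

Under these assumptions the mixed file lands at well under half the F16 size, which is the main appeal of keeping only a minority of tensors at higher precision.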