GLM-4.5-Iceblink-v2-106B-A12B-FP8 is an FP8-quantized version of the GLM-4.5-Iceblink-v2-106B-A12B mixture-of-experts model. It is optimized for GPUs with native FP8 support, such as the NVIDIA Ada, Hopper, and Blackwell series, significantly improving inference efficiency while preserving output quality.
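As a rough intuition for how FP8 quantization preserves quality, the sketch below simulates per-tensor scaling into the FP8 E4M3 dynamic range. This is a simplified illustration of the general recipe, not this model's actual quantizer: real kernels also round mantissas to E4M3 precision and may use per-channel or per-block scales.

```python
# Simplified simulation of per-tensor FP8 (E4M3) scaling.
# Assumption: a single scale per tensor, derived from its absolute maximum.
E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def fp8_scale(values):
    """Scale that maps the tensor's absolute maximum onto the FP8 range."""
    amax = max(abs(v) for v in values)
    return amax / E4M3_MAX if amax > 0 else 1.0


def quantize(values, scale):
    """Divide by the scale and clip to the E4M3 range.

    A real quantizer would also round each result to the nearest
    representable E4M3 value; that step is omitted here for clarity.
    """
    return [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in values]


def dequantize(qvalues, scale):
    """Multiply by the scale to recover approximate original values."""
    return [q * scale for q in qvalues]


weights = [0.02, -1.5, 3.25, -0.004]  # hypothetical weight values
s = fp8_scale(weights)
roundtrip = dequantize(quantize(weights, s), s)
```

Because the mantissa-rounding step is omitted, the roundtrip here is lossless; in real FP8 storage, the small rounding error introduced at that step is the accuracy cost that hardware FP8 support trades for faster matrix multiplies and halved weight memory.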