This is an FP8 dynamic-quantization build of the Bielik-1.5B-v3.0-Instruct model, intended for the vLLM and SGLang inference frameworks. It was produced with AutoFP8 quantization, which reduces weight precision from 16-bit to 8-bit, roughly halving disk footprint and GPU VRAM requirements.
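
As a minimal sketch of how such a checkpoint is produced, assuming the AutoFP8 library's standard API (the model paths below are placeholders, not the actual repo ids):

```python
from auto_fp8 import AutoFP8ForCausalLM, BaseQuantizeConfig

# Placeholder paths -- substitute the real source and output locations.
pretrained_model_dir = "speakleash/Bielik-1.5B-v3.0-Instruct"
quantized_model_dir = "Bielik-1.5B-v3.0-Instruct-FP8-Dynamic"

# Dynamic activation scheme: activation scales are computed at runtime,
# so no calibration dataset is required (hence the empty example list).
quantize_config = BaseQuantizeConfig(quant_method="fp8", activation_scheme="dynamic")

model = AutoFP8ForCausalLM.from_pretrained(pretrained_model_dir, quantize_config=quantize_config)
model.quantize([])
model.save_quantized(quantized_model_dir)
```

The saved checkpoint can then be loaded with vLLM, which detects the FP8 quantization from the checkpoint's config; again, the model path is a placeholder:

```python
from vllm import LLM, SamplingParams

# Load the FP8-quantized checkpoint (local path or hub id -- placeholder here).
llm = LLM(model="Bielik-1.5B-v3.0-Instruct-FP8-Dynamic")

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Napisz krótki wiersz o morzu."], params)
print(outputs[0].outputs[0].text)
```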