Slimming large models down for edge deployment has taken a major step forward. Tencent Hunyuan today officially released HY-1.8B-2Bit, an ultra-small model targeting consumer-grade hardware. Built on what the team describes as the industry's first production-grade 2-bit quantization solution, the model's storage footprint is equivalent to a 0.3B-parameter full-precision model, occupying only about 600MB of memory, smaller than some common mobile apps.
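The headline figures can be sanity-checked with simple arithmetic. The overhead terms below (per-group scales, higher-precision embeddings) are illustrative assumptions, not published details of the release:

```python
# Back-of-envelope memory estimate for a 1.8B-parameter model at 2-bit weights.
params = 1.8e9

# Raw 2-bit payload: 2 bits per weight.
raw_bytes = params * 2 / 8            # 0.45e9 bytes, i.e. ~450 MB

# Real quantized files also store per-group scales and usually keep some
# tensors (e.g. embeddings) at higher precision, which plausibly pushes the
# total toward the reported ~600 MB.
reported_mb = 600

# "Equivalent parameter count 0.3B": the same ~600 MB would hold a
# 0.3B-parameter model stored in fp16 (2 bytes per weight).
fp16_equiv_params = reported_mb * 1e6 / 2   # 0.3e9 parameters

print(raw_bytes / 1e6)          # ~450 MB raw payload
print(fp16_equiv_params / 1e9)  # 0.3 (billion parameters)
```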

Technical Breakthrough: The "Impossible Task" of 2-Bit Quantization

In model deployment, lower quantization bit-widths usually mean greater precision loss. To overcome this, the Tencent Hunyuan team abandoned the conventional post-training quantization (PTQ) strategy in favor of quantization-aware training (QAT), combined with data optimization, elastic-stretch quantization, and training-strategy innovations.
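The core difference from PTQ is that QAT simulates quantization inside the training loop, so the network learns weights that survive rounding. A minimal sketch of 2-bit fake quantization with a straight-through estimator follows; the level grid, fixed scale, and toy regression task are illustrative assumptions, not Hunyuan's actual scheme:

```python
import numpy as np

def fake_quant_2bit(w, scale):
    """Round weights to the 4 signed levels a 2-bit code can represent."""
    q = np.clip(np.round(w / scale), -2, 1)   # levels {-2, -1, 0, 1}
    return q * scale

def ste_grad(upstream_grad):
    """Straight-through estimator: treat round() as if its derivative
    were 1, so gradients flow to the full-precision 'shadow' weight."""
    return upstream_grad

# Toy QAT step: fit y = 0.7 * x with the weight forced through 2-bit quant.
rng = np.random.default_rng(0)
x = rng.normal(size=256)
y = 0.7 * x                      # target weight 0.7 is NOT representable

w = 0.0                          # full-precision shadow weight
scale = 0.5                      # fixed illustrative scale
lr = 0.1
for _ in range(200):
    wq = fake_quant_2bit(w, scale)        # forward pass uses quantized weight
    err = wq * x - y
    grad = np.mean(2 * err * x)           # dL/dwq for the MSE loss
    # STE: apply the gradient to the shadow weight; clip it to the
    # representable range so it cannot drift unboundedly.
    w = np.clip(w - lr * ste_grad(grad), -2 * scale, 1 * scale)

# The shadow weight settles where its quantized value is the best 2-bit fit.
print(fake_quant_2bit(w, scale))  # 0.5, the representable level nearest 0.7
```

During PTQ the rounding happens once, after training; under QAT the network spends the whole training run adapting to it, which is why precision holds up better at extreme bit-widths.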

Experimental data shows that HY-1.8B-2Bit performs on par with a 4-bit PTQ version of the model on core benchmarks covering mathematics, code, and science. In other words, despite the drastic compression, the model retains strong all-round capability.

Performance: 2–3x Faster Generation, Compatible with a Range of Hardware

Thanks to the extreme compression, the model generates 2–3 times faster on real edge devices than its original-precision counterpart. The specific results are as follows:

  • MacBook M4: for inputs up to 1024 tokens, time-to-first-token improves by 3–8 times, with a stable generation-speed gain of more than 2 times.

  • Dimensity 9500: compared with the Q4 format, time-to-first-token improves by 1.5–2 times, and generation speed by roughly 1.5 times.

  • Full thinking capability: the model retains the long and short chain-of-thought modes of Hunyuan-1.8B-Instruct, allowing users to switch flexibly based on task complexity.

Looking Ahead

The model currently ships with GGUF-int2 weights and has been adapted for Arm's SME2 technology platform, making it well suited to scenarios with strict offline-deployment and privacy requirements, such as phones, earbuds, and smart-home devices. Tencent Hunyuan stated that it will further narrow the capability gap between low-bit and full-precision models through reinforcement learning and model distillation.