[Preprint] Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification

Published:

This paper proposes Sherry, a hardware-efficient ternary quantization framework. Sherry introduces a 3:4 fine-grained sparsity pattern (three nonzero weights in every group of four) that packs each block of four ternary weights into five bits, achieving a regular 1.25-bit width and restoring power-of-two alignment. Furthermore, we identify a weight-trapping issue in sparse ternary training that leads to representational collapse. To address this, Sherry introduces Arenas, an annealing residual synapse mechanism that maintains representational diversity during training.
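Under the 3:4 pattern, each block of four ternary weights contains exactly one zero, so a block takes one of 4 × 2³ = 32 states, which fits exactly into five bits (5/4 = 1.25 bits per weight). The sketch below illustrates one straightforward such encoding, with two bits indexing the zero's position and three bits storing the signs of the remaining weights; the paper's actual packed kernel format is not specified here, so the function names and bit layout are illustrative assumptions only.

```python
import itertools

def encode_block(block):
    """Pack four ternary weights {-1, 0, +1} containing exactly one zero
    (the 3:4 pattern) into a 5-bit code: 2 bits index the zero's position,
    3 bits store the signs of the remaining nonzero weights.
    NOTE: an assumed layout for illustration, not Sherry's actual format."""
    assert len(block) == 4 and block.count(0) == 1
    zero_pos = block.index(0)                                   # 2 bits
    signs = [w for w in block if w != 0]                        # three nonzeros
    sign_bits = sum((w > 0) << i for i, w in enumerate(signs))  # 3 bits
    return (zero_pos << 3) | sign_bits                          # code in [0, 31]

def decode_block(code):
    """Inverse of encode_block: recover the four ternary weights."""
    zero_pos = (code >> 3) & 0b11
    signs = iter(1 if (code >> i) & 1 else -1 for i in range(3))
    return [0 if pos == zero_pos else next(signs) for pos in range(4)]

# Round-trip check over all valid blocks: 4 zero positions x 2^3 sign
# patterns = 32 = 2^5, so the 5-bit code space is used exactly.
for zero_pos, pattern in itertools.product(range(4),
                                           itertools.product((-1, 1), repeat=3)):
    signs = iter(pattern)
    block = [0 if pos == zero_pos else next(signs) for pos in range(4)]
    assert decode_block(encode_block(block)) == block
```

Because the 32 valid block states exactly exhaust the 5-bit code space, no storage is wasted, unlike dense ternary weights, which need log₂3 ≈ 1.58 bits each and pack awkwardly into power-of-two word sizes.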

Paper | Code