Posts by Collection

publications

[ICDCS 2023] Distributed Pruning Towards Tiny Neural Networks in Federated Learning.

Published: July, 2023

This paper proposes FedTiny, a distributed pruning framework for federated learning that generates specialized tiny models for memory- and computing-constrained devices. FedTiny achieves top-one accuracy of 85.23% with the 0.014× FLOPs and 0.03× memory footprint of ResNet18, which outperforms the best baseline, which gets 82.62% accuracy with 0.34× FLOPs and 0.51× memory footprint.

Paper | Code

[CVPR 2024] FedMef: Towards Memory-efficient Federated Dynamic Pruning

Published: June, 2024

This paper proposes FedMef, a novel and memoryefficient federated dynamic pruning framework. FedMef comprises two key components: 1. Budget-aware Extrusion that maintains pruning efficiency 2. Scaled Activation Pruning effectively reduce activation memory footprints. FedMef significantly reduces the memory footprint of MobileNetV2 by 28.5% while improving the accuracy by more than 2%.

Paper | Code

[ACL 2025] Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis

Published: July, 2025

This paper propose the Outlier Spatial Stability Hypothesis (OSSH): During fine-tuning, certain activation outlier channels retain stable spatial positions across training iterations. Building on OSSH, we propose Quaff, a Quantized parameter-efficient finetuning framework for LLMs, optimizing lowprecision activation representations through targeted momentum scaling. On the GPQA reasoning benchmark, Quaff achieves a 1.73× latency reduction and 30% memory savings over full-precision fine-tuning while improving accuracy by 0.6% on the Phi-3 model, reconciling the triple trade-off between efficiency.

Paper | Code

[NeurIPS 2025] FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling

Published: December, 2025

This paper propose Federated Robust pruning via combinatorial Thompson Sampling (FedRTS), a novel framework designed to develop robust sparse models. FedRTS enhances robustness and performance through its Thompson Sampling-based Adjustment (TSAdj) mechanism, which uses probabilistic decisions informed by stable, farsighted information instead of deterministic decisions reliant on unstable and myopic information in previous methods. On the CIFAR-10 dataset with the ResNet18 model, FedRTS achieves either a 5.1% accuracy improvement or a 33.3% reduction in communication costs compared to SOTA frameworks.

Paper | Code

[ICLR 2026] Tequila: Trapping-free Ternary Quantization for Large Language Models

Published: January, 2026

This paper proposes Tequila, a trapping-free Ternary quantization method for large language models. The key idea in Tequila is to reactivate dead weights by repurposing them as dynamic biases. Tequila achieves a >4% accuracy gain over the SOTA baseline on the ARC benchmark, nearly matching full-precision performance (within < 1% gap). Furthermore, it delivers a significant 3× inference speedup on an Intel 8263C CPU, verifying that Tequila fully preserves the hardware efficiency of ternary quantization.

Paper | Code

[Preprint] Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification

Published: January, 2026

This paper proposes Sherry, a hardware-efficient ternary quantization framework. Sherry introduces a 3:4 fine-grained sparsity that achieves a regularized 1.25-bit width by packing blocks of four weights into five bits, restoring power-of-two alignment. Furthermore, we identify weight trapping issue in sparse ternary training, which leads to representational collapse. To address this, Sherry introduces Arenas, an annealing residual synapse mechanism that maintains representational diversity during training.

Paper | Code

Hong Huang

Posts by Collection

publications

[ICDCS 2023] Distributed Pruning Towards Tiny Neural Networks in Federated Learning.

[CVPR 2024] FedMef: Towards Memory-efficient Federated Dynamic Pruning

[ACL 2025] Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis

[NeurIPS 2025] FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling

[ICLR 2026] Tequila: Trapping-free Ternary Quantization for Large Language Models

[Preprint] Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification

teaching

[2024 Fall] CS5187 Vision and Image

[2025 Spring] CS4486 Artificial Intelligence

[2025 Fall] CS5187 Vision and Image