Posts by Collection

publications

[ICDCS 2023] Distributed Pruning Towards Tiny Neural Networks in Federated Learning

This paper proposes FedTiny, a distributed pruning framework for federated learning that generates specialized tiny models for memory- and computing-constrained devices. FedTiny achieves 85.23% top-1 accuracy with only 0.014× the FLOPs and 0.03× the memory footprint of ResNet18, outperforming the best baseline, which reaches 82.62% accuracy at 0.34× FLOPs and 0.51× memory footprint.

Paper | Code

[CVPR 2024] FedMef: Towards Memory-efficient Federated Dynamic Pruning

This paper proposes FedMef, a novel and memory-efficient federated dynamic pruning framework. FedMef comprises two key components: (1) Budget-aware Extrusion, which maintains pruning efficiency, and (2) Scaled Activation Pruning, which effectively reduces the activation memory footprint. FedMef significantly reduces the memory footprint of MobileNetV2 by 28.5% while improving accuracy by more than 2%.
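
FedMef's own components are not described in enough detail above to reproduce; as a rough illustration of the kind of operation a federated dynamic pruning round performs, here is a minimal magnitude-prune / gradient-regrow step under a fixed parameter budget (the function name and interface are hypothetical, not FedMef's API):

```python
import torch

def prune_and_regrow(weight: torch.Tensor, grad: torch.Tensor, n_swap: int) -> torch.Tensor:
    """One dynamic-pruning step under a fixed parameter budget (illustrative only):
    drop the n_swap active weights with the smallest magnitude and regrow the
    n_swap inactive positions with the largest gradient magnitude."""
    mask = (weight != 0)
    # Candidates to prune: active weights with the smallest magnitude.
    active_vals = weight.abs().masked_fill(~mask, float("inf"))
    drop_idx = torch.topk(active_vals.flatten(), n_swap, largest=False).indices
    # Candidates to regrow: inactive positions with the largest gradient magnitude.
    inactive_grads = grad.abs().masked_fill(mask, float("-inf"))
    grow_idx = torch.topk(inactive_grads.flatten(), n_swap, largest=True).indices
    new_mask = mask.clone()
    new_mask.view(-1)[drop_idx] = False
    new_mask.view(-1)[grow_idx] = True
    return weight * new_mask
```

Budget-aware Extrusion and Scaled Activation Pruning additionally control how much device memory such a step consumes, which is the part this sketch does not capture.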

Paper | Code

[ACL 2025] Quaff: Quantized Parameter-Efficient Fine-Tuning under Outlier Spatial Stability Hypothesis

This paper proposes the Outlier Spatial Stability Hypothesis (OSSH): during fine-tuning, certain activation outlier channels retain stable spatial positions across training iterations. Building on OSSH, we propose Quaff, a quantized parameter-efficient fine-tuning framework for LLMs that optimizes low-precision activation representations through targeted momentum scaling. On the GPQA reasoning benchmark, Quaff achieves a 1.73× latency reduction and 30% memory savings over full-precision fine-tuning while improving accuracy by 0.6% on the Phi-3 model, reconciling the trade-off among latency, memory, and accuracy.
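
The targeted momentum scaling itself is not detailed above; as a hypothetical sketch of the general idea, assuming the outlier channel indices are known in advance (OSSH suggests they are stable) and that activations are quantized to int8 with per-channel scales, it might look roughly like this (class and method names are assumptions, not Quaff's actual code):

```python
import torch

class MomentumChannelScaler:
    """Illustrative only: keep an exponential-moving-average scale for the
    activation channels flagged as outliers, and quantize activations to int8
    with per-channel scales."""

    def __init__(self, outlier_channels: torch.Tensor, momentum: float = 0.9):
        self.outlier_channels = outlier_channels  # indices of stable outlier channels
        self.momentum = momentum
        self.scale = None  # running per-channel max magnitude

    def quantize(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Per-channel max magnitude of the current batch (channels on the last dim).
        cur = x.abs().amax(dim=tuple(range(x.dim() - 1)))
        if self.scale is None:
            self.scale = cur.clone()
        else:
            # Momentum-smooth the scales of the stable outlier channels only;
            # the remaining channels simply track the current batch.
            new_scale = cur.clone()
            new_scale[self.outlier_channels] = (
                self.momentum * self.scale[self.outlier_channels]
                + (1 - self.momentum) * cur[self.outlier_channels]
            )
            self.scale = new_scale
        s = (self.scale / 127.0).clamp(min=1e-8)
        x_q = torch.clamp(torch.round(x / s), -127, 127).to(torch.int8)
        return x_q, s  # dequantize later with x_q.float() * s
```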

Paper | Code

[Preprint] Tequila: Trapping-free Ternary Quantization for Large Language Models

This paper proposes Tequila, a trapping-free ternary quantization method for large language models. The key idea of Tequila is to reactivate dead weights by repurposing them as dynamic biases. Tequila achieves a >4% accuracy gain over the SOTA baseline on the ARC benchmark, nearly matching full-precision performance (within a 1% gap). Furthermore, it delivers a significant 3× inference speedup on an Intel 8263C CPU, verifying that Tequila fully preserves the hardware efficiency of ternary quantization.
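
As background for this entry, here is a generic threshold-based ternary quantizer with a straight-through estimator (the Ternary Weight Networks heuristic); weights that the threshold maps to zero are the "dead" weights the abstract refers to, and Tequila's contribution, not reproduced here, is to repurpose those slots as dynamic biases instead of leaving them trapped at zero:

```python
import torch

class TernaryQuant(torch.autograd.Function):
    """Generic TWN-style ternary quantizer (background only, not Tequila itself)."""

    @staticmethod
    def forward(ctx, w: torch.Tensor) -> torch.Tensor:
        delta = 0.7 * w.abs().mean()              # magnitude threshold (TWN heuristic)
        mask = w.abs() > delta                    # weights kept as +/-1; the rest become 0
        alpha = w.abs()[mask].mean() if mask.any() else w.new_tensor(0.0)
        return alpha * torch.sign(w) * mask       # quantized weights in {-alpha, 0, +alpha}

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor) -> torch.Tensor:
        return grad_out                           # straight-through estimator

# Usage: w_q = TernaryQuant.apply(w) inside the layer's forward pass.
```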

Paper | Code

[NeurIPS 2025] FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling

This paper proposes Federated Robust Pruning via Combinatorial Thompson Sampling (FedRTS), a novel framework designed to develop robust sparse models. FedRTS enhances robustness and performance through its Thompson Sampling-based Adjustment (TSAdj) mechanism, which makes probabilistic decisions informed by stable, far-sighted information rather than the deterministic decisions of previous methods, which rely on unstable and myopic information. On the CIFAR-10 dataset with the ResNet18 model, FedRTS achieves either a 5.1% accuracy improvement or a 33.3% reduction in communication costs compared to SOTA frameworks.
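
TSAdj's exact update rule is not given above; as an illustrative sketch of combinatorial Thompson Sampling applied to a pruning mask, assuming a Beta-Bernoulli model of how useful each candidate connection is (all names and the reward model are assumptions, not FedRTS's implementation):

```python
import numpy as np

class ThompsonMaskSampler:
    """Illustrative only: maintain a Beta(alpha, beta) posterior per candidate
    connection, sample every posterior, and keep the top-k samples as the
    active (unpruned) set for the next round."""

    def __init__(self, n_candidates: int, k_active: int, seed: int = 0):
        self.alpha = np.ones(n_candidates)   # pseudo-counts of "useful" feedback
        self.beta = np.ones(n_candidates)    # pseudo-counts of "not useful" feedback
        self.k = k_active
        self.rng = np.random.default_rng(seed)

    def sample_mask(self) -> np.ndarray:
        theta = self.rng.beta(self.alpha, self.beta)   # one posterior draw per candidate
        keep = np.argsort(theta)[-self.k:]             # top-k sampled utilities stay active
        mask = np.zeros_like(theta, dtype=bool)
        mask[keep] = True
        return mask

    def update(self, mask: np.ndarray, reward: np.ndarray) -> None:
        # reward: aggregated per-connection feedback in [0, 1] (e.g., importance
        # scores collected from clients); only active connections are updated.
        self.alpha[mask] += reward[mask]
        self.beta[mask] += 1.0 - reward[mask]
```

In this sketch the posteriors accumulate feedback across rounds, so a single noisy round does not immediately flip the mask, which is one way to read the "stable, far-sighted information" the abstract refers to.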

Paper | Code

teaching