Gradient-Based LoRA Rank Allocation Under GRPO: An Empirical Study

arXiv:2605.07366v1

Abstract: Adaptive rank allocation for LoRA, allocating more parameters to important layers and fewer to unimportant ones, consistently improves efficiency under supervised fine-tuning (SFT). We investigate whether this success transfers to reinforcement learning, specifically Group Relative Policy Optimization (GRPO). Using gradient-magnitude profiling on Qwen 2.5 1.5B with GSM8K, we find that it does not: proportional rank allocation degrades accuracy by…
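The excerpt does not say how the paper turns gradient magnitudes into ranks. As a minimal sketch of what proportional allocation could look like, the following splits a fixed rank budget across layers in proportion to per-layer gradient norms. The function name `allocate_ranks`, the `min_rank` floor, and the largest-remainder rounding are all illustrative assumptions, not the authors' procedure. Per the abstract, this style of allocation hurt accuracy under GRPO, so the sketch illustrates the scheme under study rather than a recommendation.

```python
from typing import Dict


def allocate_ranks(
    grad_norms: Dict[str, float],
    total_rank: int,
    min_rank: int = 1,
) -> Dict[str, int]:
    """Split a fixed rank budget across layers in proportion to gradient norms.

    Hypothetical helper: the rounding rule and the min_rank floor are
    assumptions made for this illustration only.
    """
    n_layers = len(grad_norms)
    if total_rank < n_layers * min_rank:
        raise ValueError("budget too small for the requested minimum rank")
    norm_sum = sum(grad_norms.values())
    if norm_sum <= 0:
        raise ValueError("gradient norms must sum to a positive value")

    # Budget left over once every layer has received its floor.
    spare = total_rank - n_layers * min_rank
    # Ideal (fractional) share of the spare budget for each layer.
    ideal = {name: spare * g / norm_sum for name, g in grad_norms.items()}
    ranks = {name: min_rank + int(share) for name, share in ideal.items()}

    # Rounding down loses a few units; return them to the layers with the
    # largest fractional parts so the ranks sum exactly to total_rank.
    leftover = total_rank - sum(ranks.values())
    by_fraction = sorted(
        ideal, key=lambda name: ideal[name] - int(ideal[name]), reverse=True
    )
    for name in by_fraction[:leftover]:
        ranks[name] += 1
    return ranks
```

For instance, with made-up norms `{"a": 1.0, "b": 1.0, "c": 2.0}` and `total_rank=16`, the layer with twice the gradient norm receives rank 8 and the other two receive rank 4 each. A uniform baseline with the same budget would give each layer roughly the same rank.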

cs.CL updates on arXiv.org · May 12
