Beyond Steering Vector: Flow-based Activation Steering for Inference-Time Intervention

arXiv:2605.05892v1. Abstract: Activation steering has emerged as a promising alternative for controlling language-model behavior at inference time by modifying intermediate representations while keeping model parameters frozen. However, large-scale evaluations such as AxBench show that existing steering methods are often outperformed by simple in-context prompting and generalize poorly to unseen concepts. We hypothesize that these limitations arise from unvalidated simplifying…

cs.CL updates on arXiv.org · May 8