Negative Before Positive: Asymmetric Valence Processing in Large Language Models

arXiv:2605.05653v1. Abstract: Mechanistic interpretability has revealed how concepts are encoded in large language models (LLMs), but emotional content remains poorly understood at the mechanistic level. We study whether LLMs process emotional valence through dedicated internal structure or through surface token matching. Using activation patching and steering on open-source LLMs, we find that negative and positive valence are processed at different network depths. Negative…

cs.CL updates on arXiv.org · May 8
