Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria

arXiv:2605.08354v1. Abstract: Aligning multimodal generative models with human preferences demands reward signals that respect the compositional, multi-dimensional structure of human judgment. Prevailing RLHF approaches reduce this structure to scalar or pairwise labels, collapsing nuanced preferences into opaque parametric proxies and exposing vulnerabilities to reward hacking. While recent Rubrics-as-Reward (RaR) methods attempt to recover this structure through explicit…

cs.AI updates on arXiv.org · May 12
