This paper compares common test-time scaling strategies for language models, including self-consistency, self-refinement, multi-agent debate, and mixture-of-agents, through a compute-efficiency lens. It is useful for builders weighing accuracy gains against inference cost in agentic systems.
arXiv:2605.01566v1 Announce Type: new Abstract: Advances in inference methods have enabled language models to improve their predictions without additional training. These methods often prioritize raw performance over cost-effective compute usage. However, computational efficiency is key for real-world applications with resource constraints. We provide a systematic analysis of the inference scaling strategies self-consistency, self-refinement, multi-agent debate, and mixture-of-agents to study…
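As a rough illustration of the kind of accuracy-versus-compute trade-off the paper analyzes, here is a minimal sketch of self-consistency (sample several answers, then majority-vote). The `generate` callable, its parameters, and the sample count are placeholders, not the paper's actual setup; cost grows roughly linearly with the number of samples.

```python
from collections import Counter

def self_consistency(generate, prompt, n_samples=8, temperature=0.7):
    """Sample n_samples completions and return the majority-vote answer.

    `generate` stands in for any sampling-based LM call that returns a
    final answer string. Inference cost scales linearly with n_samples,
    which is the compute-efficiency knob being traded against accuracy.
    """
    answers = [generate(prompt, temperature=temperature) for _ in range(n_samples)]
    answer, _count = Counter(answers).most_common(1)[0]
    return answer
```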