DeepMind’s ProEval is a new evaluation framework for generative AI that uses surrogate models and transfer learning across benchmarks to cut evaluation costs while surfacing failure modes. It should be useful for teams running large benchmark suites or iterating on agent and model behavior.