EO-Gym introduces an executable benchmark for Earth-observation agents that must query multiple sensors, expand regions of interest, retrieve historical observations, and resolve uncertainty across modalities. It shifts EO evaluation from static image QA toward interactive tool use, which is closer to how analysts actually work.
arXiv:2605.01250v1. Abstract: Earth Observation (EO) analysis is inherently interactive: resolving uncertainty often requires expanding the region of interest, retrieving historical observations, and switching across sensors such as optical and Synthetic Aperture Radar. However, most EO benchmarks collapse this process into fixed-input, single-turn tasks. To address this gap, we present EO-Gym, a controlled executable framework for multimodal, tool-using EO agents that…
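The abstract contrasts fixed-input, single-turn tasks with an interactive loop of expanding the ROI, retrieving history, and switching between optical and SAR. The toy Python sketch here illustrates that contrast under loudly stated assumptions: `ToyEOEnv`, its tools `observe`, `expand_roi`, and `history`, the flood-detection task, `FLOOD_THRESHOLD`, and every number are hypothetical. None of it is EO-Gym's actual interface, task set, or data.

```python
from dataclasses import dataclass, field

FLOOD_THRESHOLD = 20  # hypothetical: flooded pixels needed to call a flood


@dataclass
class ToyEOEnv:
    """Hypothetical toy EO environment; NOT the EO-Gym API."""
    cloud_cover: float = 0.8  # fraction of the optical scene hidden by cloud
    flooded_base: int = 12    # flooded pixels visible in the initial ROI (toy value)
    roi_km: int = 5           # current region-of-interest width in km
    log: list = field(default_factory=list)  # tool calls, in order

    def _flooded_now(self) -> int:
        # Toy assumption: visible flooded area scales linearly with ROI width.
        return self.flooded_base * self.roi_km // 5

    def observe(self, sensor: str) -> dict:
        """Return the current observation from 'optical' or 'sar'."""
        self.log.append(("observe", sensor, self.roi_km))
        if sensor == "optical" and self.cloud_cover >= 0.5:
            return {"usable": False, "flooded": None}  # clouds block optical
        return {"usable": True, "flooded": self._flooded_now()}  # SAR sees through cloud

    def expand_roi(self, factor: int = 2) -> None:
        """Widen the ROI to gather more spatial context."""
        self.log.append(("expand_roi", factor))
        self.roi_km *= factor

    def history(self, sensor: str) -> dict:
        """Pre-event baseline for the same sensor (toy: no prior flooding)."""
        self.log.append(("history", sensor))
        return {"flooded": 0}


def single_turn(env: ToyEOEnv) -> str:
    """Fixed-input baseline: one optical image, no tools."""
    obs = env.observe("optical")
    if not obs["usable"]:
        return "uncertain"
    return "flood" if obs["flooded"] >= FLOOD_THRESHOLD else "no flood"


def run_agent(env: ToyEOEnv, max_expansions: int = 3) -> tuple[str, int]:
    """Interactive policy: switch sensor, compare to history, widen ROI if unsure."""
    sensor = "optical"
    obs = env.observe(sensor)
    if not obs["usable"]:
        sensor = "sar"  # switch modality when optical is clouded
        obs = env.observe(sensor)
    baseline = env.history(sensor)["flooded"]  # measure change against the past
    expansions = 0
    while obs["flooded"] - baseline < FLOOD_THRESHOLD:
        if expansions == max_expansions:
            return "uncertain", env.roi_km  # tool budget exhausted
        env.expand_roi()  # still ambiguous: gather more context
        expansions += 1
        obs = env.observe(sensor)
    return "flood", env.roi_km


if __name__ == "__main__":
    print(single_turn(ToyEOEnv()))  # prints: uncertain
    print(run_agent(ToyEOEnv()))    # prints: ('flood', 10)
```

In this toy run the single-turn baseline stops at "uncertain" because its only input is a clouded optical image, while the interactive policy switches to SAR, compares against a historical baseline, and widens the ROI once before committing. The `log` field records the tool-call trajectory, which is the kind of trace an executable benchmark can score in addition to the final answer.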