A benchmark check on spatial biology shows newer frontier models running faster without becoming more reliable. The takeaway for builders is that domain-specific training and analysis patterns still matter more than raw general reasoning gains.
GPT-5.5 nearly halves runtime on SpatialBench relative to GPT-5.4, but its accuracy is essentially unchanged; Opus 4.7 is similarly tied with Opus 4.6. Improvements in spatial biology are unlikely to come from general reasoning gains alone. They will likely require explicit training on statistical design, platform-specific analysis steps, replicate-aware differential testing, and other spatial biology knowledge.
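To make "replicate-aware differential testing" concrete, here is a minimal sketch of the pseudobulk pattern commonly used in spatial and single-cell analysis: instead of treating every spot as an independent observation (which inflates significance), counts are first collapsed to one value per biological replicate, and the test runs on replicate-level values. All data below are synthetic, and the function and variable names are illustrative, not from any specific pipeline; it assumes NumPy and SciPy are available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def simulate_replicate(mean, n_spots=200):
    # Synthetic spot-level counts for one gene in one tissue section.
    return rng.poisson(mean, size=n_spots)

# Three biological replicates per condition; treated has a higher true mean.
control = [simulate_replicate(5.0) for _ in range(3)]
treated = [simulate_replicate(6.0) for _ in range(3)]

# Naive test: pool all spots as if they were independent samples.
# This treats 600 correlated spots as 600 observations and is
# anti-conservative in real data.
naive_p = stats.ttest_ind(np.concatenate(control),
                          np.concatenate(treated)).pvalue

# Replicate-aware test: collapse each replicate to a single pseudobulk
# mean, then compare the 3-vs-3 replicate-level values.
pb_control = [r.mean() for r in control]
pb_treated = [r.mean() for r in treated]
replicate_p = stats.ttest_ind(pb_control, pb_treated).pvalue

print(f"naive spot-level p:  {naive_p:.3g}")
print(f"replicate-aware p:   {replicate_p:.3g}")
```

The naive p-value is far smaller because it claims hundreds of degrees of freedom from correlated spots; the pseudobulk test only claims as many observations as there are biological replicates, which is the kind of design knowledge a general-purpose model has to be taught explicitly.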