And then I see this in the ExecPlan for my latest refactor:
---
# Observations
- Observation: The refactor made the screenshots pixel-identical after the baseline was recaptured correctly.
Evidence: sha256sum screenshots/before-implementation-x.png screenshots/after-implementation-x.png reported matching hashes for before/after pairs 1, 2, and 3.
---
Which is crazy! I've never told Codex to do a SHA compare on before/after screenshots of the app. I do have instructions in my PLANS.md to take before and after screenshots of the game's webapp so we avoid frontend regressions (it uses GPT-Image-2 for analysis). So for changes that don't touch the frontend, of course nothing should differ between screenshots taken at identical timestamps after the game starts.
But doing an explicit SHA compare - that's just...not something I would've ever thought of. Wild.
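
For anyone who wants the same check outside of Codex, here's a rough Python sketch of what that `sha256sum` comparison amounts to. The helper names (`sha256_of`, `screenshots_match`) are mine, and I'm assuming the `x` in the filenames above stands for the pair number; adjust to whatever your screenshots are actually called.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def screenshots_match(before: Path, after: Path) -> bool:
    """True only if the two files are byte-for-byte identical."""
    return sha256_of(before) == sha256_of(after)


if __name__ == "__main__":
    # Assumed layout: screenshots/before-implementation-<n>.png and
    # screenshots/after-implementation-<n>.png for pairs 1, 2, and 3.
    shots = Path("screenshots")
    for n in (1, 2, 3):
        before = shots / f"before-implementation-{n}.png"
        after = shots / f"after-implementation-{n}.png"
        if not (before.exists() and after.exists()):
            print(f"pair {n}: missing file, skipped")
            continue
        status = "identical" if screenshots_match(before, after) else "DIFFERENT"
        print(f"pair {n}: {status}")
```

Worth keeping in mind that a hash match is a strict byte-for-byte check: a single changed pixel, or even different PNG metadata, produces a different hash. So it works as a cheap "nothing changed at all" signal, but it can't tell you how much two screenshots differ.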