Show HN: LOAB – AI agents get decisions right but skip the process [pdf]

  • Posted 4 hours ago by shubh-chat
  • 1 point
https://github.com/shubchat/loab/blob/main/assets/loab_paper_mar2026.pdf
LOAB, an open-source benchmark for evaluating whether AI agents can follow regulated lending processes — not just produce the right final answer. The motivation is simple: in mortgage lending, regulators don't care if you got the right answer. They care whether you followed the right process. Skip a KYC check, pull a credit bureau report before getting privacy consent, or approve a loan without the required policy lookup — that's a compliance failure even if the outcome was correct. Current AI benchmarks don't measure this. They evaluate what the agent decided, not how it got there. LOAB simulates a fictional Australian lender with mock regulatory APIs, multi-agent roles mirroring real bank operations, and a five-dimension scoring rubric derived from actual lending law. A run only passes if the outcome is correct AND the process was correct. The main finding: frontier models achieve 67-75% outcome accuracy but only 25-42% when you also require process compliance. It's surprisingly hard to get AI to follow a prescribed sequence of steps even when it clearly "knows" the right answer.
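To make the pass criterion concrete, here is a minimal sketch of that scoring rule — a run passes only if the decision matches AND the required steps appear in the trace in the prescribed order. All names (`REQUIRED_SEQUENCE`, the step labels, `run_passes`) are illustrative assumptions, not LOAB's actual API or rubric.

```python
# Illustrative sketch of a process-plus-outcome pass criterion.
# Step names and ordering are hypothetical, not taken from LOAB.
REQUIRED_SEQUENCE = [
    "kyc_check",          # know-your-customer verification
    "privacy_consent",    # must precede any bureau pull
    "credit_bureau_pull",
    "policy_lookup",      # required before a decision
    "decision",
]

def process_compliant(trace):
    """True iff REQUIRED_SEQUENCE appears as an ordered subsequence of trace."""
    it = iter(trace)
    # `step in it` advances the iterator, so order is enforced.
    return all(step in it for step in REQUIRED_SEQUENCE)

def run_passes(trace, decision, expected_decision):
    """Pass only when the outcome is right AND the process was followed."""
    return decision == expected_decision and process_compliant(trace)
```

Under this rule, an agent that pulls the credit bureau report before recording privacy consent fails the run even if its approve/decline decision is correct — which is exactly the gap the 67-75% vs. 25-42% numbers above are measuring.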

0 comments