The separation of harness from compute is the right architectural move. The part that's still missing from most agent frameworks is the verification layer between steps. Sandbox execution solves the safety problem. It doesn't solve the accuracy problem. Those are different failure modes that need different infrastructure.