Hacker News

okwasniewski, yesterday at 11:10 PM

We've been doing quite a lot of context engineering and optimization to keep it from getting too expensive. Subsequent runs are faster because we cache the agent's trajectory. We don't cache the whole test run yet, because we want to keep the agent in the loop so it behaves more like a manual QA engineer than a test script.
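A minimal sketch of what trajectory caching with the agent still in the loop could look like. It assumes each test step is replayed from the cache only when the current screen matches the one seen on the previous run; any divergence falls back to asking the agent. This is not the poster's implementation. `TrajectoryCache`, `fingerprint`, `run_test`, `FakeApp`, and the idea of keying on (step, screen hash) are all hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representations: an observation is a serialized screen (e.g. a UI tree),
# and an action is a string such as "tap:login_button".
Observation = str
Action = str


def fingerprint(obs: Observation) -> str:
    """Hash the observation so identical screens map to the same cache key."""
    return hashlib.sha256(obs.encode("utf-8")).hexdigest()


class TrajectoryCache:
    """Maps (step description, screen fingerprint) to the action the agent chose last time."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Action] = {}

    def get(self, step: str, obs: Observation) -> Optional[Action]:
        return self._store.get((step, fingerprint(obs)))

    def put(self, step: str, obs: Observation, action: Action) -> None:
        self._store[(step, fingerprint(obs))] = action


def run_test(
    steps: List[str],
    observe: Callable[[], Observation],
    act: Callable[[Action], None],
    ask_agent: Callable[[str, Observation], Action],
    cache: TrajectoryCache,
) -> int:
    """Execute the test steps and return how many (expensive) agent calls were made."""
    agent_calls = 0
    for step in steps:
        obs = observe()
        action = cache.get(step, obs)
        if action is None:
            # Cache miss: first run, or the screen changed, so the agent decides again.
            action = ask_agent(step, obs)
            agent_calls += 1
            cache.put(step, obs, action)
        act(action)
    return agent_calls


class FakeApp:
    """Toy stand-in for the app under test: every action advances to the next screen."""

    def __init__(self, screens: List[Observation]) -> None:
        self.screens = screens
        self.i = 0

    def observe(self) -> Observation:
        return self.screens[self.i]

    def act(self, action: Action) -> None:
        self.i = min(self.i + 1, len(self.screens) - 1)
```

On a first run every step is a cache miss; a rerun against an unchanged app makes no agent calls, while a changed screen triggers a call only for the affected step. Keying on the observed screen rather than replaying blindly is what keeps this closer to a QA engineer than a fixed script.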

We currently don't have any benchmarks; much of the experience depends on the test plan. We've been focusing mostly on the customer experience, not on benchmarking.