Hacker News

okwasniewski · yesterday at 3:30 PM

Unfortunately, in our experience, tests don't scale as well as code. First of all, static tests are very brittle: you rely on selectors, you need wait times, and you can't really test a lot of dynamic content (think AI chats and interactions). Then there's all the infrastructure around them: solving CAPTCHAs, handling auth, handling email OTP (each of our agents has access to its own inbox), and handling video recording and screenshots. So with the traditional testing approach, you end up mocking a lot of services. I highly recommend giving it a try!


Replies

Obertr · yesterday at 10:06 PM

I would respectfully disagree on this. Here's how I write tests right now: I ask Claude/Codex to create an eval, and it spins up a background LLM agent worker that verifies the tests in the sandbox or internally.

So I would say that, at the moment, in-house testing is easier for us than external testing.