logoalt Hacker News

cornholioyesterday at 10:28 PM2 repliesview on HN

If I understood correctly, the model will get it right because it knows when it isn't right.


Replies

zambelliyesterday at 10:30 PM

Essentially, yes that's right! There's some subtlety in how to let it know it was wrong (returning things as tool errors because it trained on that), but that's the gist of it - sort of a self-correcting architecture.