
stephantul • yesterday at 5:51 PM • 1 reply

1) Yes! It’s not accuracy, but NDCG. 2) We assume that if the agent gets the correct answer in the returned snippets, it does not need to read further.

esafranchik • yesterday at 6:01 PM

Wouldn't NDCG/token results vary wildly depending on the agent's query and the number of returned items?

e.g. agents often run `grep -m 5 "QUERY"` with different queries, instead of one big grep for all items.
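
To make the concern concrete, here is a rough sketch of how NDCG@k, and an NDCG-per-token ratio, can shift with the cutoff k. The thread does not say how NDCG/token is actually computed, so `ndcg_per_token`, the relevance labels, and the token counts below are all made up for illustration; the ideal ranking is also taken over the judged list only, which is a simplification.

```python
import math

def dcg(relevances):
    # Discounted cumulative gain: rank 0 is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg_at_k(relevances, k):
    # Ideal ordering is taken over the judged list itself (an assumption).
    ideal_dcg = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

def ndcg_per_token(relevances, token_counts, k):
    # Hypothetical metric: NDCG@k divided by the tokens in the top-k snippets.
    tokens = sum(token_counts[:k])
    return ndcg_at_k(relevances, k) / tokens if tokens else 0.0

# Made-up example: 8 returned snippets, 2 of them relevant.
relevances = [0, 0, 1, 0, 0, 1, 0, 0]
token_counts = [120, 80, 200, 50, 60, 150, 90, 40]

for k in (2, 5, 8):
    print(k, ndcg_at_k(relevances, k), ndcg_per_token(relevances, token_counts, k))
```

With these made-up numbers, a cutoff of 2 scores zero because both relevant snippets sit below it, while larger cutoffs score higher but also cost more tokens, so the ratio depends heavily on how many items each query returns.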
