Hacker News

esafranchik yesterday at 5:49 PM

Two follow-ups:

1) How do you compare accuracy? By checking whether the answer appears in any of the returned grep/BM25/semble snippets?

2) How do you measure token use without the agent, prompt, and tools?


Replies

stephantul yesterday at 5:51 PM

1) Yes! Strictly speaking it's not accuracy but NDCG (normalized discounted cumulative gain).

2) We assume that if the correct answer is in the returned snippets, the agent does not need to read any further.
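For anyone who wants to see what an NDCG check of this kind could look like, here is a minimal sketch. It is not the benchmark's actual code. It assumes binary relevance (a snippet scores 1 if it contains the answer string, 0 otherwise) and computes the ideal ordering from the returned list only, not from the whole corpus. The names `relevance_labels`, `dcg`, and `ndcg` are hypothetical helpers made up for illustration.

```python
import math


def relevance_labels(snippets, answer):
    """Assumed labeling: 1 if the answer string occurs in the snippet, else 0."""
    return [1 if answer in snippet else 0 for snippet in snippets]


def dcg(relevances):
    """Discounted cumulative gain: each hit is discounted by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances, k=None):
    """DCG of the returned order divided by DCG of the best possible order.

    Simplification: the ideal order is built from the returned list itself,
    so relevant items that were never retrieved are not penalized.
    """
    ranked = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)
    ideal = ideal if k is None else ideal[:k]
    ideal_dcg = dcg(ideal)
    if ideal_dcg == 0:
        return 0.0  # no relevant snippet was returned at all
    return dcg(ranked) / ideal_dcg


# Usage: the answer shows up in the second of three returned snippets.
snippets = ["import os", "def parse_config(path):", "return result"]
labels = relevance_labels(snippets, "parse_config")
score = ndcg(labels)
```

The score is 1.0 when the snippet containing the answer is ranked first and drops as it moves further down the list, which is why it says more about ranking quality than a plain "answer found anywhere" check.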
