Hacker News

esafak yesterday at 2:58 PM

What do you mean? It tests whether the model knows the tools and uses them.


Replies

YetAnotherNick yesterday at 3:47 PM

Yeah, it's a knowledge benchmark, not an agentic benchmark.
