Hacker News

Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs

28 points by darkrishabh today at 6:12 AM | 6 comments

Comments

ssgodderidge today at 10:03 AM

The example model in the documentation is 4o-mini; you might want to update that to a more recent model.

As an aside, 4o-mini came out months before Agent Skills were released… I'm curious how well it does at choosing to load skills in the first place.

egeozcan today at 9:20 AM

Are there any published results gathered using this?

ianhxu today at 10:08 AM

How do you iterate on the judge prompt? Is there an auto-rater?