Show HN: Agent-skills-eval – Test whether Agent Skills improve outputs

28 points • by darkrishabh • today at 6:12 AM • 6 comments

The example model in the documentation is 4o-mini; you might want to update it to a more recent model.

As an aside, 4o-mini came out months before Agent Skills were released. I'm curious how well it handles deciding whether to load skills in the first place.


egeozcan • today at 9:20 AM

Are there any published results gathered using this?

ianhxu • today at 10:08 AM

How do you iterate on the judge prompt? Is there an auto rater?

