Hacker News

DrProtic yesterday at 8:26 PM

Seems like a benchmark for how good a model is at vibe coding.

Your prompt is extremely slim, yet you score it on a bunch of features.


Replies

guilamu yesterday at 8:28 PM

Yes, the prompt is slim by design. I might be wrong, but the point was to see what the model can do "on its own".

The eval prompt is quite extensive: https://github.com/guilamu/llms-wordpress-plugin-benchmark/b...
