Seems like a benchmark for how good a model is at vibe coding.
Your prompt is extremely slim yet you score it on a bunch of features.
Yes, the prompt is slim by design. I might be wrong, but the point was to see what the model can do "on its own".
The eval prompt is quite extensive: https://github.com/guilamu/llms-wordpress-plugin-benchmark/b...