eli • today at 12:52 PM • 0 replies • view on HN

Obviously there are advantages to not having to do work yourself.

But for a benchmark with the goal of picking a model to replace a human on some task? I really think the human should judge which is best.

I haven’t gotten very far yet, but I had an idea for a personalized benchmark tool that walks through your git history and helps you craft prompts for tasks (bugs or features) that were already implemented by hand, so you can compare how different LLMs would handle them.
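
A rough sketch of what the git-history step could look like, not a real tool. The names `Commit`, `parse_log`, `read_history`, and `draft_prompt` are all hypothetical, and the prompt wording is a placeholder a human would edit. The only real interface assumed is `git log` with a custom `--pretty=format:` string.

```python
import subprocess
from dataclasses import dataclass

# Separators embedded in the git log format so commit bodies can span lines.
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"
LOG_FORMAT = "%H%x1f%s%x1f%b%x1e"  # hash, subject, body, end-of-record


@dataclass
class Commit:
    sha: str
    subject: str
    body: str


def parse_log(raw: str) -> list[Commit]:
    """Split raw `git log` output (written with LOG_FORMAT) into commits."""
    commits = []
    for record in raw.split(RECORD_SEP):
        # Strip only newlines: Python treats "\x1f" as whitespace, so a bare
        # strip() would eat the separator in front of an empty body.
        record = record.strip("\n")
        if not record:
            continue
        sha, subject, body = record.split(FIELD_SEP, 2)
        commits.append(Commit(sha, subject.strip(), body.strip()))
    return commits


def read_history(repo: str, limit: int = 20) -> list[Commit]:
    """Read the most recent commits from a local repository."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"--max-count={limit}",
         f"--pretty=format:{LOG_FORMAT}"],
        capture_output=True, text=True, check=True,
    )
    return parse_log(result.stdout)


def draft_prompt(commit: Commit) -> str:
    """Turn a hand-written commit into a first-draft task prompt to edit."""
    details = commit.body or "(no further description in the commit message)"
    return (
        f"The repository is checked out at the parent of commit {commit.sha[:8]}.\n"
        f"Task: {commit.subject}\n\n"
        f"Details:\n{details}\n\n"
        "Implement this change."
    )
```

Each model's output could then be put next to the original hand-written diff (`git diff <sha>^ <sha>`), with the human doing the judging.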
