Hacker News

l_eo yesterday at 8:55 PM

They will start to max this benchmark as well at some point.


Replies

ljm yesterday at 10:25 PM

It's not a benchmark though, right? Because there's no control group or reference.

It's just an experiment on how different models interpret a vague prompt. "Generate an SVG of a pelican riding a bicycle" is loaded with ambiguity. It's practically designed to generate 'interesting' results because the prompt is not specific.

It also happens to be an example of the least practical way to engage with an LLM. The model is no more capable of reading your mind than anyone or anything else.

I argue that, in the service of AI, there is a lot of flexibility being created around the scientific method.
