magnio • today at 5:05 AM • 3 replies

I saw on Twitter that in an ML course at Tsinghua University, one of the tests asks students to write quizzes that make as many LLMs as possible fail.

What if we create a benchmark that works like this and assigns Elo ratings? Models fight head-to-head: each writes a question, a bug, or an incomplete implementation, which the opponent has to answer, fix, or finish.
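For anyone who wants to see how the rating side could work, here is a minimal sketch of the standard Elo update applied to one head-to-head match. The function names, the K-factor of 32, and the scoring convention (1 for a win, 0.5 for a draw, 0 for a loss) are assumptions for illustration; how a match between two models gets scored as a win or loss is left open here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability-like expectation that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost (assumed convention).
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two models start level; A stumps B and also solves B's challenge.
new_a, new_b = update_elo(1500.0, 1500.0, score_a=1.0)
print(new_a, new_b)  # 1516.0 1484.0
```

Because the update depends on the rating gap, beating a much higher-rated model moves the ratings more than beating a weaker one.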

Replies

vincnetas • today at 5:39 AM

We could call this "generative adversarial network" (GAN) :)

https://en.wikipedia.org/wiki/Generative_adversarial_network

olmo23 • today at 6:33 AM

How do you prevent degenerate strategies? I could trivially give a model a SHA256 hash and ask it to provide the source input.

In class you'd probably want a rule saying at least one LLM should be able to figure out the answer, but in a head-to-head I'm not sure how to solve it.
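To make the SHA-256 problem and the "at least one LLM can answer" rule concrete, here is a rough sketch. Everything in it is hypothetical: `is_admissible`, the `panel` of stand-in solvers, and the `check` function are made-up names, and real solvers would be model calls rather than lambdas. It only covers the classroom-style rule, not the open head-to-head case.

```python
import hashlib
from typing import Callable, Iterable

Solver = Callable[[str], str]  # hypothetical: takes a question, returns an answer


def is_admissible(question: str,
                  check: Callable[[str], bool],
                  panel: Iterable[Solver]) -> bool:
    """A question counts only if at least one solver on the panel gets it right."""
    return any(check(solver(question)) for solver in panel)


# The degenerate question: easy to verify, infeasible to answer.
secret = "some long random preimage"
digest = hashlib.sha256(secret.encode()).hexdigest()
question = f"Give an input whose SHA-256 hex digest is {digest}."


def check(answer: str) -> bool:
    return hashlib.sha256(answer.encode()).hexdigest() == digest


# Stand-in solvers that, like any real model, cannot invert the hash.
panel = [lambda q: "I don't know", lambda q: "password"]
print(is_admissible(question, check, panel))  # False
```

The catch in a two-player match is that the panel would have to be independent of both contestants, which is the part that stays unsolved.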

eunos • today at 10:42 AM

That was Fudan I think
