Hacker News

eek2121 · yesterday at 10:58 PM · 1 reply

Honestly? My advice would be to cook something custom up! You don't need to do all the text yourself. Maybe have AI spew out a bunch of text, or take obscure existing text and insert hidden phrases here or there.

Shoot, I'd even go so far as to write a script that takes in a bunch of text, reorganizes the sentences, and outputs them in a random order with the secrets mixed in. Kind of like a "Where's Waldo?", but for text.
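
A minimal sketch of what such a script could look like, assuming Python and a naive regex-based sentence splitter. The function name `hide_secrets` and its parameters are made up for illustration, and real text (abbreviations, quotes, and so on) would need a better splitter:

```python
import random
import re


def hide_secrets(text, secrets, seed=None):
    """Shuffle the sentences of `text` and drop each secret in at a random spot."""
    rng = random.Random(seed)  # pass a seed to make the output reproducible
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    rng.shuffle(sentences)
    for secret in secrets:
        # randint is inclusive on both ends, so the secret may also land last.
        sentences.insert(rng.randint(0, len(sentences)), secret)
    return " ".join(sentences)


if __name__ == "__main__":
    filler = "The sky was grey. Nobody came to the station. The train left early."
    print(hide_secrets(filler, ["The password is heron."], seed=1))
```

Keeping the seed around means you can regenerate the exact same haystack later and know where each secret ended up.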

Just a few casual thoughts.

I'm actually thinking about coming up with some interesting coding exercises that I can run across all models. I know we already have benchmarks; however, some of the recent work I've done has really shown huge weak points in every model I've run it on.


Replies

clhodapp · yesterday at 11:43 PM

Having AI spew it might suffer from the fact that the spew itself is influenced by the AI's weights. I think your best bet would be to use a new human-authored work that was released after the model's training cutoff.