Hacker News

meroes yesterday at 10:06 PM

What is this supposed to show, exactly? Those books have been fed into LLMs for years, and there has likely even been specific RLHF on extracting spells from HP.


Replies

muzani yesterday at 10:33 PM

There was a time when I put the EA-Nasir text into base64 and asked an AI to convert it. Remarkably, it identified the correct text, but it returned the most popular translation of the text rather than the one I gave it.
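For anyone who wants to try the same kind of check, here is a minimal sketch, not the exact procedure used above. It encodes a text as base64 for the prompt, then scores how closely the model's answer matches the text that was actually supplied. The helper names `encode_for_prompt` and `fidelity` and both sample strings are hypothetical stand-ins, and the model call itself is left out.

```python
import base64
import difflib


def encode_for_prompt(text: str) -> str:
    """Return the base64 form of the text, ready to paste into a prompt."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def fidelity(supplied: str, model_output: str) -> float:
    """Similarity in [0, 1] between the supplied text and the model's answer.

    A score of 1.0 means the answer matches the supplied text exactly.
    A lower score suggests the model substituted something else, such as
    a more widely circulated version of the same text.
    """
    return difflib.SequenceMatcher(None, supplied, model_output).ratio()


# Hypothetical stand-ins: neither string is a real translation.
supplied = "This is my own wording of the tablet."
recalled = "This is the popular wording of the tablet."

encoded = encode_for_prompt(supplied)

# Sanity check: decoding locally must give back exactly what was encoded.
assert base64.b64decode(encoded).decode("utf-8") == supplied

print(fidelity(supplied, supplied))  # exact match
print(fidelity(supplied, recalled))  # substituted version scores lower
```

A faithful decode should score 1.0, so anything lower is a hint that the answer came from recall rather than from the encoded input.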

rvz yesterday at 10:24 PM

> What is this supposed to show exactly?

Nothing.

You can be sure that this was already present in the training data of PDFs, books and websites that Anthropic scraped to train Claude on; hence 'documented'. This is why tests like the one the OP just did are meaningless.

Such "benchmarks" are performative, aimed at VCs, who do not ask why the research and testing is not done independently but is almost always done by the company's own in-house researchers.