threethirtytwo • yesterday at 2:44 PM • 5 replies

People love to interpret the results in the most negative way possible because it's a threat to their occupation and identity. I refer to HN specifically.

The fact of the matter is, if you want to edit a document by reading the document and then regurgitating the entire document with said edits... a human will DO worse then a 25% degradation. It's possible for a human to achieve 0% degradation but the human will have to ingest the document hundreds of times to achieve a state called "memorization". The equivalent in an LLM is called training. If you train a document into an LLM you can get parity with the memorized human edit in this case.

But the above is irrelevant. The point is LLMs have certain similarities with humans. You need to design a harness such that an LLM edits a document the same way a human would: Search and surgical edits. All coding agents edit this way, so this paper isn't relevant.
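A minimal sketch of what "search and surgical edits" could look like in a harness, assuming the model emits a (search, replace) pair instead of the whole document. The function name `apply_edit` and the exactly-one-match rule are illustrative assumptions, not how any particular coding agent is implemented.

```python
def apply_edit(document: str, search: str, replace: str) -> str:
    """Apply one search/replace edit, leaving the rest of the document untouched.

    Hypothetical helper: the edit is refused unless `search` matches exactly once,
    so an ambiguous or stale edit fails loudly instead of corrupting the text.
    """
    if not search:
        raise ValueError("search text must be non-empty")
    count = document.count(search)
    if count != 1:
        raise ValueError(f"expected exactly 1 match for search text, found {count}")
    # Only the matched span changes; every other byte is copied through as-is.
    return document.replace(search, replace, 1)


if __name__ == "__main__":
    doc = "alpha\nbeta\ngamma\n"
    print(apply_edit(doc, "beta", "BETA"))
```

Because the untouched text never passes through the model, it cannot drift, which is the point of editing this way rather than regenerating the full document.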

Replies

shahbaby • yesterday at 6:49 PM

> People love to interpret the results in the most negative way possible because it's a threat to their occupation and identity.

OR it could be because their concerns are genuine but are ignored in favour of a good-sounding story.

ActionHank • yesterday at 11:44 PM

> a human will DO worse then a 25% degradation.

* than

ieieue • yesterday at 3:53 PM

[flagged]

tieTYT • yesterday at 8:07 PM

> a human will DO worse then a 25% degradation

As I was reading this article, a similar thought occurred to me: "I wonder if that's better or worse than a human?" Unfortunately, there was no human baseline in this study. That said, there are studies that compare LLM to human performance. Usually, humans perform much better (like 5-7x better) at long-running tasks.

In other words, a human would probably do better than an LLM on this task.

On the other hand, humans lose to LLMs in narrow, well-specified text/symbolic reasoning tasks where the model can exploit breadth, speed, and search. In those studies, the LLM usually performed ~15% better than humans, but I saw some reporting gaps as high as 80%. To my surprise, these studies were usually about "soft skills" like creativity and persuasion.
