Hacker News

scratchyone yesterday at 10:03 PM (2 replies)

For what it's worth, there are modern LLM detectors with extremely low false-positive rates. The tech has advanced quite a bit since the ZeroGPT days. Personally I've gotten very good results from Pangram Labs. Still can't directly ban people though because false positives are always possible.


Replies

diacritical today at 2:05 AM

Are they great at detecting normal prompts that don't try to make the LLM speak non-LLM-ishly? If you make the LLM not use em dashes, "it's not X; it's Y" phrases and similar things, and if you make it make a few mistakes here and there, would it still be detected? My point is that if people aren't trying to hide their LLM use, it might work; otherwise it probably wouldn't. How would a detector tool work against output where the prompt tells the LLM to alter the way it writes? Or if the LLM output is being modified by another LLM specifically designed to mimic certain styles?

Like, why would my comment (or yours, or any other comment) pass or fail the LLM check if I/you/someone else used specific prompts or another LLM to edit the output? It seems like these tools would work on 99.9% of the outputs, but those outputs likely weren't created in an adversarial way.

zahlman yesterday at 10:09 PM

Is that false-positive rate from your own testing, or the author's claims? What is the source of ground truth?