Hacker News

mvkel · last Saturday at 10:43 PM · 4 replies

Weirdly, LLMs seem to break with these instructions. They simply ignore them, almost as if the pretraining/RL weights are so heavy that no amount of system prompting can override them.


Replies

RandomWorker · last Saturday at 10:56 PM

It's a beauty. We can easily spot YouTubers who generate scripts with this tool. Once I notice these tropes, within about 30 seconds I remove, block, and mark "do not recommend," hoping to train the algorithm to detect AI scripts and stop recommending me those videos. Honestly, it's turned me off YouTube so much that I find myself going to my "Subscriptions" tab and sticking to content creators who still believe in the craft.

duskwuff · yesterday at 1:03 AM

IIRC, it's well documented that negative instructions tend to be ineffective, possibly through some LLM analogue of the "pink elephant paradox," or simply because the model can't recognize a cliché until it has already been generated.

esperent · yesterday at 2:16 AM

I assume it'll work better as a review pass than as something you expect good results from outright. For anything like this where I feel like I'm fighting the LLM, doing the initial work and then auditing it seems to be the best approach. (The other one is writing all kinds of tests; LLMs, including Opus 4.6, love to fudge tests just as much as they love telling you how insightful you are.)
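The draft-then-audit idea above can be sketched roughly like this. Everything here is hypothetical: `generate` stands in for whatever LLM call you actually use, and the prompts are just placeholders for a real cliché-audit instruction.

```python
# Rough sketch of a two-pass "draft, then audit" workflow.
# `generate` is a stand-in for any text-generation call (hypothetical);
# swap in your real LLM client.

def draft_then_review(prompt: str, generate) -> str:
    """Produce a first draft, then run a second pass that audits it."""
    draft = generate(prompt)
    review_prompt = (
        "Audit the following text. Remove cliches and unsupported claims, "
        "then return a corrected version:\n\n" + draft
    )
    return generate(review_prompt)

# Stub model so the sketch runs standalone.
def stub_generate(prompt: str) -> str:
    if prompt.startswith("Audit"):
        return "cleaned: " + prompt.rsplit("\n\n", 1)[-1]
    return "draft for: " + prompt

print(draft_then_review("write an intro", stub_generate))
# -> cleaned: draft for: write an intro
```

The point of the second call is that the model is now critiquing concrete text rather than trying to obey a negative instruction up front, which is the failure mode the parent comments describe.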

cainxinth · yesterday at 11:35 AM

It amounts to telling it: “Stop doing that thing you can’t stop doing.”