Hacker News

sweetjuly, today at 7:35 AM

Is what you suggest about training even possible? Most exploitation techniques are really just about having in-depth knowledge of how components work. For example, I imagine a sufficiently powerful model could fairly easily re-invent the ROP chain from first principles if it just knew how the stack works. The same principle applies to much more complex attacks too; exploitation is often just an exercise in knowing vastly too much trivia, which LLMs tend to have in spades.


Replies

_0ffh, today at 8:58 AM

It would still degrade its effectiveness, which is what they claim to want. To exaggerate: if that weren't so, you'd only need fundamental math in the training data, since everything else can be derived.