Has anyone demonstrated that this type of attack is even possible? Also the moment anyone detects th...

Jtarii • today at 3:49 PM • 1 reply • view on HN

Has anyone demonstrated that this type of attack is even possible? Also the moment anyone detects this attack it will nuke deepseek/other chinese AI labs reputation completely, it is the most high risk low reward attack ever.

Replies

fragmede • today at 4:23 PM

Yes.

https://arxiv.org/abs/2401.05566

In that paper, if it LLM was told it was 2023, then the code it generated was fine. If the prompt included the fact that it was 2024, then it intentionally wrote exploitable code.

alt Hacker News

Replies