Asked an AI to find sources. At first it claimed the reports were fabricated. When prompted to verify, it found the links below and conceded that each point had some truth to it.
> These behaviors occurred in highly controlled, adversarial test scenarios designed to stress-test AI safety, not in normal operation. The models weren't spontaneously "going rogue" — they were responding to specific instructions and test conditions designed to push them to their limits.
Fudan University Study (arXiv): https://arxiv.org/html/2412.12140v1
eWeek Coverage: https://www.eweek.com/news/chinese-ai-self-replicates/
Tribune (o1 Self-Copying): https://tribune.com.pk/story/2554708/openais-o1-model-tried-...
Apollo Research (Medium): https://medium.com/@Walikhaled/when-chatgpt-model-o1-replica...
Nieman Lab (Claude Opus 4): https://www.niemanlab.org/2025/05/anthropics-new-ai-model-di...
Fortune (Claude Opus 4 Blackmail): https://fortune.com/2025/05/23/anthropic-ai-claude-opus-4-bl...
Axios (Claude Deception): https://www.axios.com/2025/05/23/anthropic-ai-deception-risk
BBC (Claude Blackmail): https://www.bbc.com/news/articles/cpqeng9d20go