Hacker News

_verandaguy, today at 5:21 PM

It would be good to understand how exactly a frontier lab is approaching "removing the model's ability" to do a thing.

There's an ocean of difference between, e.g., blocking the model's requests at the firewall level and just updating the prompt (especially given that models have historically been relatively poor at following negative instructions).
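
To make the distinction concrete, here's a rough sketch of the two approaches. Everything in it is hypothetical (the host names, `ALLOWED_HOSTS`, `prompt_level_instruction`, `fetch_tool`), and it isn't a claim about how any lab actually does this. The first function only asks the model to behave; the second is enforced outside the model, so nothing the model outputs can get around it.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; these hosts are placeholders for illustration only.
ALLOWED_HOSTS = {"docs.example.com"}


def prompt_level_instruction(system_prompt: str) -> str:
    """Soft control: appends a negative instruction the model may or may not follow."""
    return system_prompt + "\nDo not access internal.example.com."


def network_level_allows(url: str) -> bool:
    """Hard control: deny by default, decided outside the model."""
    host = urlparse(url).hostname or ""
    return host in ALLOWED_HOSTS


def fetch_tool(url: str) -> str:
    """Hypothetical tool wrapper; the check runs no matter what the prompt says."""
    if not network_level_allows(url):
        return "blocked by policy"
    # Placeholder instead of a real network call.
    return f"would fetch {url}"


if __name__ == "__main__":
    print(prompt_level_instruction("You are a helpful assistant."))
    print(fetch_tool("https://internal.example.com/secrets"))
    print(fetch_tool("https://docs.example.com/page"))
```

The allowlist is deny-by-default on purpose: a URL with no parseable host is rejected rather than waved through, which is the kind of guarantee a prompt edit can't give you.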