A smart AI would realise that I can MITM its web access such that sees the .well-known token that isn't actually there. I assume that the model doesn't have CA certificates embedded into it, and relies on its harness for that.
In this context we are talking explicitly about cloud-hosted AIs. If you control it locally you have a lot of options to force it to do things.
MITM the cloud AI on the modern internet is non-trivial, and probably harder and less reliable than just talking your way around the guardrails anyhow.
In this context we are talking explicitly about cloud-hosted AIs. If you control it locally you have a lot of options to force it to do things.
MITM the cloud AI on the modern internet is non-trivial, and probably harder and less reliable than just talking your way around the guardrails anyhow.