logoalt Hacker News

bostiktoday at 4:52 AM5 repliesview on HN

They need to walk back a lot more.

Unilaterally revoking zero-data retention, even for enterprise contracts that explicitly require that? Nope.

Fable is utterly unusable for any kind of security work. I tripped the safeguards yesterday - using Fable to dig into a complex (& annoying) security bug that has so far resisted both human and Opus 4.8 level investigation. "Sorry Dave, I can't let you do that."

For the time being we are requesting Anthropic disable Fable for our enterprise and turn ZDR back on. The two may be interlinked so that one will always get neither or both. ZDR is a contractual obligation. Fable in its current form is useless. Might as well flip the old behaviour on and avoid burning money for no reason while this mess is being sorted out.


Replies

rmasttoday at 6:33 AM

I was using it to craft a CTF challenge for summer students involving a simulated mechanical dial safe, but with the fence replaced by a IR beam break sensor and a microcontroller handling the check + flag message display.

For generating the initial 3D simulated safe using three.js it worked well, but then modifications to print a flag tripped the safeguards; eventually got it narrowed down the part in the prompt about it being for a CTF for students, and the "thinking" for the model seems to drift to ideas of encryption/obfuscation of the safe combo so students can't just read out the answer... which makes sense logically to help force students into turning the simulated dial instead. But whatever detection Anthropic I guess just naively sees the model thinking about "encryption" and "obfuscation" without taking into account any of the context.

For writing the dummy firmware, it tripped the safeguards while thinking about how to track dial position in the firmware and output the message; however, when I left out talk about safes and just told it to write firmware for a microcontroller hooked up to an i2c display for showing a message with a beam break sensor to determine the message, and an unspecified i2c chip for getting an unspecified number (e.g. internal wheel positions) it worked fine.

An unrelated software task I asked it to write some code to translate CustomActions in a Windows MSI installer into human readable stuff, which has (exclusively?) defensive security applications for recognizing malicious behavior in an MSI installer. Maybe I'm going crazy, but I'm guessing as part of its research into MSI installer custom actions Fable found articles about analyzing malicious MSI installers, and that probably tripped the safeguards.

Overall my impression is that the safeguards are perhaps using an overzealous and naive implementation that just looks for a list of banned words in the prompt or the thinking -- which drives me crazy when the model says my prompt looks fine, and then 10 minutes in some part of the thinking trips the safeguard.

dmurraytoday at 6:21 AM

The announcement I saw was that your enterprise would have to turn off ZDR to get Fable, not that users could accidentally opt out of ZDR by selecting the wrong model.

Unilaterally disabling ZDR seems like a step too far in the enterprise market, even for a company trying to figure out what its users will let it get away with.

show 1 reply
rurbantoday at 5:47 AM

Not just security work. Normal bug finding was impossible, because the model suddenly called triaging and verifying a possible fix a cyber security threat.

show 1 reply
lII1lIlI11lltoday at 10:24 AM

I think the main reason reason why they mandated data retention for Fable is to fight distillation, not to prevent black hats from using the model.

gmerctoday at 6:22 AM

They want to keep the logs so they can see what other companies do with AI in their area of frontier.