Hacker News

jakozaur, yesterday at 4:17 PM

Many models refuse to do that because of alignment and safety concerns, so a cross-model comparison doesn't make sense. We do, however, require proof that is hard to game, such as a location in the binary. So the model not only has to say there is a backdoor, but also has to point out where it is.
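As a rough illustration of that location requirement, here is a minimal sketch of how such a check might be scored. It is not the actual benchmark code: `Region`, `score_answer`, and the byte-offset representation are all hypothetical names and assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """Hypothetical ground truth: half-open byte range [start, end) in the binary."""
    start: int
    end: int

    def contains(self, offset: int) -> bool:
        return self.start <= offset < self.end


def score_answer(claims_backdoor: bool,
                 reported_offset: Optional[int],
                 truth: Optional[Region]) -> bool:
    """Return True only if the verdict is right and, for a positive, the location is too."""
    if truth is None:
        # Clean binary: the only correct answer is "no backdoor".
        return not claims_backdoor
    if not claims_backdoor or reported_offset is None:
        # Backdoored binary: a bare "yes" with no location earns nothing.
        return False
    # The reported offset must fall inside the known backdoor region.
    return truth.contains(reported_offset)
```

With this kind of check, a model that always answers "yes" scores nothing on backdoored binaries unless it also names an offset inside the right region, which is what makes the proof hard to game.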

Your approach, however, makes a lot of sense if you are ready to have your own custom or fine-tuned model.


Replies

simianwords, yesterday at 4:25 PM

Surprising that they still allow catching backdoors but not using them.

A bad actor already has most of the work done.
