Opus 4.6 is a very good model, but the harness around it is good too. It can talk about sensitive subjects without getting guardrail-whacked.
This is much more reliable than ChatGPT's guardrails, which have a random element: the same prompt sometimes triggers them and sometimes doesn't. Perhaps it's leakage from improperly cleared context from another request in the queue, or maybe an A/B test on the guardrail, but I have sometimes had it trigger on an innocuous request like GDP retrieval and summary with bucketing.
What kind of value do you get from talking to it about “sensitive” subjects? Speaking as someone who doesn’t use AI, I don’t really understand what kind of conversation you’re talking about.
I would think it’s due to non-determinism. Leaking context would be an unacceptable flaw, since many users rely on the same instance.
An A/B test is plausible but unlikely, since those are typically for testing user behavior. For testing model output you can use offline evaluations.
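The non-determinism point above can be sketched with a toy example. This is purely illustrative (it is not any vendor's actual guardrail, and `mock_guardrail` is a made-up function): when a sampled score sits near a fixed threshold, the same prompt can be blocked on some calls and allowed on others, with no context leakage or A/B test involved.

```python
import random

def mock_guardrail(prompt: str, seed: int) -> bool:
    """Toy classifier whose score includes sampling noise, so identical
    prompts near the threshold flip between blocked and allowed."""
    rng = random.Random(seed)
    # Pretend the base score for this prompt hovers just under the
    # threshold; the noise term stands in for sampling randomness.
    score = 0.45 + rng.uniform(-0.1, 0.1)
    return score > 0.5  # True = request blocked

# Same prompt, ten independent calls: a mix of True and False.
results = [mock_guardrail("summarize GDP by bucket", seed=s) for s in range(10)]
print(results)
```

The point is only that borderline inputs plus stochastic decoding are enough to explain occasional spurious triggers, without positing cross-request leakage.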