I really like this pattern and use it often, this 'not showing my cards'. The second I hint towards the LLM what I prefer it will become sycophantic and invent nonsense why my preferred solution is better.
I'm sure there's an interesting study on how users 'leak' their preference unintentionally to the LLM; perhaps when users list their options, they often put their prefered option first; but not showing the cards on my hand has been very useful when thinking through a problem with LLMs.
Same. Alternatively (or in addition), I sometimes present my preferred idea as being a "bad/naive/stupid option" (or a suggestion from someone who can't be trusted) to see how it stands up to sycophancy to it being bad. As expected the LLM will usually say "yeah it's bad!" and give plausible-sounding reasons for it, but if these reasons are nonsensical it's a good sign that I'm not missing anything
There's an easy workaround that helps instead of listing options, just describe the problem constraints and ask it to propose approaches independently.
LLMs are very prone to priming in my experience. That is the human psychology name for what you are describing; whether it should be applied to LLMs I don't know, but it describes the phenomenon perfectly.
It's not limited to arguing with LLMs but if you want a honest opinion you should remember to push back even when it agrees with your hidden preference at first. Sometimes it is only being contrarian or supporting the underdog. Steelman the opposition.
LLMs flip positions when users push back ~70% of the time even when they were right. RLHF optimizes for approval, not correctness