My experience with chatbots outside of a coding context also ends up like this.
A while ago I asked:
Is "Read more" an appropriate project for the Getting things done framework? - The answer, yes, it was.
Then I asked "Is Read More too big of a project to be appropriate for the GTD Framework" - The answer? Yes, it was far too big.
Answering questions in the affirmative is a simple kind of bias that basically all LLMs have. Frankly, if you are going to train on human data you will see this bias, because it's everywhere.
LLMs have another related bias, though, one that is a bit more subtle and easy to trip up on: if you present options as A or B, and then reorder them as B or A, the result may change. And I don't mean change randomly; the distribution of outcomes will likely shift significantly.
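If you want to see the effect for yourself, here is a rough sketch of the kind of probe you could run, assuming an OpenAI-compatible client; the model name, the coffee/tea question, and the crude keyword classification are all placeholders I'm making up for illustration, not something from a real experiment.

```python
# Minimal sketch of an order-bias probe: ask the same either/or question
# with the options in both orders and compare how often each gets picked.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    """Send a single prompt and return the model's reply text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    return response.choices[0].message.content.strip()

def tally(prompt: str, trials: int = 50) -> Counter:
    """Count which option the model picks over repeated trials."""
    counts = Counter()
    for _ in range(trials):
        reply = ask(prompt).lower()
        # Crude classification: look for exactly one option name in the reply.
        if "tea" in reply and "coffee" not in reply:
            counts["tea"] += 1
        elif "coffee" in reply and "tea" not in reply:
            counts["coffee"] += 1
        else:
            counts["unclear"] += 1
    return counts

# Same question, options swapped. With no order bias the two tallies
# should look roughly the same; in practice they often don't.
print(tally("Answer with one word: should I drink coffee or tea?"))
print(tally("Answer with one word: should I drink tea or coffee?"))
```

The point isn't the specific question; it's that swapping the order of the options is the only thing that changes between the two prompts, so any gap between the two tallies is pure ordering effect.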