>We evaluated 11 state-of-the-art AI-based LLMs, including proprietary models such as OpenAI’s GPT-4o
The study explores outdated models. GPT-4o was notoriously sycophantic, and GPT-5 was specifically trained to minimize sycophancy. From GPT-5's announcement:
>We’ve made significant advances in reducing hallucinations, improving instruction following, and minimizing sycophancy
There was also the whole drama in August 2025, when people complained that GPT-5 was "colder" and "lacked personality" (= less sycophantic) compared to GPT-4o.
It would be interesting to study how sycophantic tendencies evolve (decrease or increase) in models from version to version, i.e. whether companies are actually doing anything about it.
The study includes GPT-5. On personal advice queries, GPT-4o and GPT-5 affirmed users' actions at the same rate.