kgeist • today at 4:10 PM

>We evaluated 11 state-of-the-art AI-based LLMs, including proprietary models such as OpenAI’s GPT-4o

The study explores outdated models. GPT-4o was notoriously sycophantic, and GPT-5 was specifically trained to minimize sycophancy. From GPT-5's announcement:

>We’ve made significant advances in reducing hallucinations, improving instruction following, and minimizing sycophancy

There was also the whole drama in August 2025, when people complained that GPT-5 was "colder" and "lacked personality" (= less sycophantic) compared to GPT-4o.

It would be interesting to study the evolution of sycophantic tendencies (decrease/increase) in models from version to version, i.e. whether companies are actually doing anything about it.

Replies

Twiin • today at 5:02 PM

The study includes GPT-5. On personal advice queries, GPT-4o and GPT-5 affirmed users' actions at the same rate.
