
jmalicki, yesterday at 6:54 PM

Given the way the test was structured it does line up.

https://arxiv.org/abs/2503.23674


Replies

Melatonic, yesterday at 7:37 PM

Surprisingly good. I wonder how they would have done without the 5-minute limit on conversations (an average of 8 messages per convo, per the study).