wongarsu • today at 2:20 PM • 0 replies • view on HN

A major limitation is that they only test GPT-4o. Previous research like [1], investigating the same question, has shown significant differences between models, and even between the languages the prompt is written in.

1: https://aclanthology.org/2024.sicon-1.2.pdf
