culi • yesterday at 6:39 PM

See also

* https://lmarena.ai/leaderboard — crowd-sourced head-to-head battles between models, ranked using Elo ratings

* https://dashboard.safe.ai/ — CAIS' incredible dashboard (cited in OP)

* https://clocks.brianmoore.com/ — a visual comparison of how well models can draw a clock. A new clock is drawn every minute

* https://eqbench.com/ — emotional intelligence benchmarks for LLMs