embedding-shape, today at 4:27 PM

Indeed, this is all very true, and I'd say it's true for the larger teams too. The entire ecosystem is so gamed by now that unless you have your own private benchmarks, with test cases you haven't shared publicly, it's almost impossible to get a fair picture of how well a model works, short of actually sitting down and using it.