Hacker News

darkwater · today at 3:33 PM

What kind of rigour can there be in a field (LLM usage as a user) where models, context sizes, tooling and, broadly, the "rules" (scare quotes) change every few weeks? There is literally no chance to take a scientific approach to anything; the churn is too high. There are papers about model XYZ v12345 from a few months ago that are already outdated, because model ABC is now on version 54321, which addresses half of the issues shown in the paper but adds three new problems.


Replies

skybrian · today at 4:58 PM

With benchmarks, you can re-run them after a change. A measurement in a paper will go out of date quickly unless turned into a benchmark.
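To make the "turn a measurement into a benchmark" idea concrete, here is a minimal sketch, assuming the measurement can be reduced to a fixed set of prompts with expected answers and an exact-match score. The names `Case`, `CASES`, `run_benchmark`, `old_model` and `new_model` are all hypothetical; the two "models" are plain stub functions standing in for whatever client you would actually call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Case:
    """One fixed test case: a prompt and the answer we expect."""
    prompt: str
    expected: str


# Hypothetical fixed test set; in practice this is the measurement from a paper.
CASES: tuple[Case, ...] = (
    Case("2 + 2 =", "4"),
    Case("Capital of France?", "Paris"),
    Case("Opposite of hot?", "cold"),
)


def run_benchmark(model: Callable[[str], str], cases: Sequence[Case] = CASES) -> float:
    """Return the fraction of cases whose answer matches exactly (trimmed, case-insensitive)."""
    if not cases:
        raise ValueError("benchmark needs at least one case")
    passed = sum(
        model(c.prompt).strip().lower() == c.expected.strip().lower()
        for c in cases
    )
    return passed / len(cases)


# Stand-in "models" (stubs, not real LLM clients), so the same benchmark
# can be re-run unchanged when the model changes.
def old_model(prompt: str) -> str:
    return {"2 + 2 =": "4"}.get(prompt, "unknown")


def new_model(prompt: str) -> str:
    answers = {"2 + 2 =": "4", "Capital of France?": "Paris", "Opposite of hot?": "warm"}
    return answers.get(prompt, "unknown")


if __name__ == "__main__":
    for name, model in [("old", old_model), ("new", new_model)]:
        print(f"{name}: {run_benchmark(model):.2f}")
```

The point of the design is that the test set and scoring rule stay fixed while the model argument changes, so a new release gets a comparable number in one call instead of a new paper.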