
renewiltord • yesterday at 6:24 PM

I’m sure you’ve tried all this, but have you tried measuring inter-rater agreement across multiple attempts on the same LLM vs. different LLMs? Perhaps your system would work better if you ran it through 5 models, 3 times each, and then highlighted the diffs for a human to choose between.
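A minimal Python sketch of the bookkeeping this suggests, assuming the outputs have already been collected into a dict keyed by `(model, run)`. The model names, the answers, and the helpers `pairwise_agreement` and `summarize` are hypothetical. Simple percent agreement on exact string matches stands in here for a formal inter-rater statistic such as Cohen's or Fleiss' kappa. It reports how consistent each model is with itself, how consistent all runs are overall, and groups the runs by distinct answer so a human only looks at the differences.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_agreement(answers):
    """Simple percent agreement: the fraction of answer pairs that are identical."""
    pairs = list(combinations(answers, 2))
    if not pairs:  # zero or one answer: nothing can disagree
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


def summarize(outputs):
    """Summarize agreement for hypothetical outputs mapping (model, run_index) -> answer."""
    by_model = defaultdict(list)
    for (model, _run), answer in outputs.items():
        by_model[model].append(answer)

    # Self-consistency: how often each model agrees with itself across its runs.
    within = {model: pairwise_agreement(answers) for model, answers in by_model.items()}
    # Agreement over every pair of runs, whether from the same model or not.
    overall = pairwise_agreement(list(outputs.values()))

    # Group the (model, run) "raters" by distinct answer, so a human only
    # has to compare the answers that differ and can see who produced each.
    variants = defaultdict(list)
    for key, answer in outputs.items():
        variants[answer].append(key)
    return within, overall, dict(variants)


if __name__ == "__main__":
    # Hypothetical outputs: 2 models x 3 runs (the comment suggests 5 x 3).
    outputs = {
        ("model_a", 0): "42", ("model_a", 1): "42", ("model_a", 2): "42",
        ("model_b", 0): "42", ("model_b", 1): "41", ("model_b", 2): "42",
    }
    within, overall, variants = summarize(outputs)
    print("within-model agreement:", within)
    print("overall agreement:", round(overall, 3))
    if len(variants) > 1:
        print("needs human review; distinct answers:")
        for answer, raters in variants.items():
            print(f"  {answer!r}: {raters}")
```

Exact string equality only makes sense for short, normalized answers. For long free-text outputs, the standard library's `difflib` could render line-level diffs between the variants instead of just grouping them.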
