
red2awn • yesterday at 10:40 PM

LLM-as-a-judge is quite an effective method to RL a model, similar to RLHF but more objective and scalable. But yes, Anthropic is making it out to be more serious than it is. Plus, DeepSeek only did it for 125k requests, significantly fewer than the other labs, but Anthropic still listed them first to create FUD.
