Hacker News

pera · yesterday at 4:52 PM

Sorry, but your argument doesn't seem coherent: how is the cost of RL relevant here?

It would also help if you could substantiate your initial claim (i.e., "internet training data is not where frontier capabilities come from").


Replies

ivanovm · yesterday at 5:24 PM

An RL environment (instruction, stateful container, reward function) is the training-data product being bought.
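To make that triple concrete, here is a minimal Python sketch of what such an environment might bundle. It assumes a lot: the "stateful container" is simulated by an in-memory dict standing in for a sandbox filesystem, and `RLEnvironment`, `step`, `reward`, and `hello_reward` are hypothetical names for illustration, not any vendor's actual format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical stand-in for a "stateful container": an in-memory filesystem
# mapping paths to file contents. A real environment would run a sandbox
# (e.g. a container image); this dict only illustrates the idea.
FileSystem = Dict[str, str]


@dataclass
class RLEnvironment:
    instruction: str                                   # task prompt given to the model
    reward_fn: Callable[[FileSystem], float]           # grades the final container state
    state: FileSystem = field(default_factory=dict)    # mutable container state

    def step(self, path: str, content: str) -> None:
        """Apply one agent action: write `content` to `path` in the container."""
        self.state[path] = content

    def reward(self) -> float:
        """Score the episode by inspecting the container's current state."""
        return self.reward_fn(self.state)


def hello_reward(state: FileSystem) -> float:
    # Hypothetical verifier: 1.0 only if the file exists with exactly this content.
    return 1.0 if state.get("/work/hello.txt") == "hi\n" else 0.0


env = RLEnvironment(
    instruction="Create /work/hello.txt containing the line 'hi'.",
    reward_fn=hello_reward,
)
print(env.reward())  # 0.0 before the agent acts
env.step("/work/hello.txt", "hi\n")
print(env.reward())  # 1.0 after the correct action
```

The point of the sketch is that the purchased artifact is the whole package: the task text, the starting state, and an automatic grader, rather than a corpus of scraped text.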