
fpgaminer · yesterday at 11:42 PM

Not only that: they also ran an experiment with the sampling temperature turned way up (2.0) and truncation turned off, such that the majority of SFT examples were incoherent (63% IIRC). Yet the model finetuned on these broken examples still improved over baseline.
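
To make concrete what "temperature 2.0 with truncation off" means at the token level, here's a minimal sketch of categorical sampling from temperature-scaled logits. The function names, logit values, and the `top_k` knob are illustrative assumptions, not details from the experiment being discussed:

```python
import math
import random

def temperature_softmax(logits, temperature=1.0):
    """Softmax over temperature-scaled logits (numerically stable)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, top_k=None):
    """Draw one token index. top_k=None disables truncation entirely,
    so even very low-probability tokens remain sampleable."""
    if top_k is not None:
        # Truncation: mask out everything below the top_k-th logit.
        cutoff = sorted(logits, reverse=True)[top_k - 1]
        logits = [l if l >= cutoff else float("-inf") for l in logits]
    probs = temperature_softmax(logits, temperature)
    return random.choices(range(len(probs)), weights=probs)[0]
```

Dividing logits by 2.0 flattens the distribution, and with no top-k/top-p cutoff the long tail of unlikely tokens gets sampled far more often, which is why most of the generated SFT examples come out incoherent.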