
fpgaminer · yesterday at 11:42 PM

Not only that: they also ran an experiment with the sampling temperature turned way up (2.0) and truncation turned off, such that the majority of SFT examples were incoherent (63% IIRC). Yet the model finetuned on these broken examples still improved over baseline.
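
To make concrete what "temperature 2.0 with truncation off" means at the token level, here's a minimal sketch of categorical sampling from temperature-scaled logits. The function names, logit values, and the `top_k` knob are illustrative assumptions, not details from the experiment being discussed:

```python
import math
import random

def temperature_softmax(logits, temperature=1.0):
    """Softmax over temperature-scaled logits (numerically stable)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature=1.0, top_k=None):
    """Draw one token index. top_k=None disables truncation entirely,
    so even very low-probability tokens remain sampleable."""
    if top_k is not None:
        # Truncation: mask out everything below the top_k-th logit.
        cutoff = sorted(logits, reverse=True)[top_k - 1]
        logits = [l if l >= cutoff else float("-inf") for l in logits]
    probs = temperature_softmax(logits, temperature)
    return random.choices(range(len(probs)), weights=probs)[0]
```

Dividing logits by 2.0 flattens the distribution, and with no top-k/top-p cutoff the long tail of unlikely tokens gets sampled far more often, which is why most of the generated SFT examples come out incoherent.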