LarsDu88 today at 8:48 AM

A lot of this post-training recipe feels reminiscent of DINO training (teacher/student setup, use of stop-gradients). I wonder if the more recent LeJEPA research on SIGReg regularization might be relevant here as a route to simpler post-training.