LarsDu88 • today at 8:48 AM • 0 replies • view on HN

A lot of this post-training recipe feels reminiscent of DINO training (teacher/student setup, use of stop-gradients). I wonder if the more recent LeJEPA research on SIGReg regularization might be relevant here for simpler post-training.