
yorwba • today at 6:50 AM

To get pure grokking, you need a model large enough to easily memorize the entire training data, and you need to keep training it for a long time after memorization. In practice, you'll probably use a more realistically sized model that might grok on some subset of the data, but not so strongly that it's obvious.
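One rough way to look for the pattern described above is to compare when training accuracy saturates (memorization) with when held-out accuracy saturates (generalization). The sketch below assumes you have logged per-step accuracies. The 0.99 threshold, the function names, and the idea of splitting validation accuracy by data subset (to catch grokking that only happens on part of the data) are all hypothetical choices for illustration, not an established method.

```python
from typing import Dict, Optional, Sequence


def first_step_at_or_above(values: Sequence[float], threshold: float) -> Optional[int]:
    """Return the first step index whose value reaches the threshold, or None."""
    for step, value in enumerate(values):
        if value >= threshold:
            return step
    return None


def grokking_delay(
    train_acc: Sequence[float],
    val_acc: Sequence[float],
    threshold: float = 0.99,  # hypothetical cutoff for "memorized" / "generalized"
) -> Optional[int]:
    """Steps between memorization and generalization.

    A large positive value is the grokking signature: the model fits the
    training set long before it fits held-out data. Returns None if either
    curve never reaches the threshold, and a value <= 0 if generalization
    happened no later than memorization.
    """
    memorized = first_step_at_or_above(train_acc, threshold)
    generalized = first_step_at_or_above(val_acc, threshold)
    if memorized is None or generalized is None:
        return None
    return generalized - memorized


def grokking_delays_by_subset(
    train_acc: Sequence[float],
    val_acc_by_subset: Dict[str, Sequence[float]],
    threshold: float = 0.99,
) -> Dict[str, Optional[int]]:
    """Compute the delay separately for each (hypothetical) validation subset."""
    return {
        name: grokking_delay(train_acc, curve, threshold)
        for name, curve in val_acc_by_subset.items()
    }


if __name__ == "__main__":
    # Synthetic curves, made up purely to show the call pattern.
    train = [0.5, 0.9, 1.0, 1.0, 1.0, 1.0]
    subsets = {
        "easy": [0.1, 0.1, 0.1, 0.2, 0.6, 0.995],  # generalizes late
        "hard": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],    # never generalizes
    }
    print(grokking_delays_by_subset(train, subsets))
```

With a realistically sized model, you would expect a mix like this: a delay on some subsets and `None` on others, rather than one clean jump across the whole validation set.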
