
ACCount37 | yesterday at 11:57 AM

We don't value LLMs for rote memorization, though. Perfect memorization is a long-solved task. We value LLMs for their generalization capabilities.

A scuffed but fully original ASCII SpongeBob is usually more valuable than perfect recall of an existing one.

One major issue with highly sparse MoE is that it appears to advance memorization more than it advances generalization, which might be what we're seeing here.
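
For reference, a minimal sketch of the top-k routing that makes an MoE "highly sparse" (PyTorch; the class and parameter names here are illustrative, not any particular model's). The point it shows: total parameters grow with the expert count, adding raw capacity that can be spent on memorization, while per-token compute stays fixed at k active experts.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Illustrative top-k routed MoE feed-forward layer (a sketch,
    not any specific model's implementation)."""

    def __init__(self, d_model: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model). The router picks k of n_experts per
        # token, so parameter count scales with n_experts while the
        # per-token compute scales only with k.
        gate_logits = self.router(x)                        # (n_tokens, n_experts)
        weights, expert_idx = gate_logits.topk(self.k, -1)  # keep only top k
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# 64 experts with 2 active per token: ~32x the FFN parameters of a
# dense layer that costs the same per-token FLOPs.
moe = SparseMoE(d_model=128, n_experts=64, k=2)
y = moe(torch.randn(16, 128))  # -> (16, 128)
```

That asymmetry is the concern: scaling n_experts inflates storage capacity much faster than it inflates the compute each token actually passes through.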