> This was OpenAI’s entire breakthrough. Making this particular model architecture larger leads to emergent capabilities.
Basically, the bitter lesson: https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...
Isn't the bitter lesson basically the same as "The Unreasonable Effectiveness of Data" from 2009?
This interview https://youtu.be/oWOz2htozfI?si=qdQ0uZRoZOYeThOn from 2 days ago with a top researcher from OpenAI directly addresses the bitter lesson argument and the importance of scaling for the history of their models.