Hacker News

7e today at 1:55 AM

More evidence that the original Transformer authors didn't really know what they were doing, but they did have access to more cheap compute than anyone else.


Replies

spindump8930 today at 5:41 PM

Can you share the specific part of this work that demonstrates better scaling than the original Transformer? Also note that many of the changes to that architecture that have been proven at actual scale were made by members of the original team, most notably Noam Shazeer.