Hacker News

faurroar today at 5:14 AM

Architectures have evolved significantly since then. DeepSeek v4 =/= GPT-3. Even then, a great deal of complexity lies in everything surrounding the architectures: how do you implement them performantly on modern accelerators, how do you distribute the model across a set of accelerators, how do you post-train, etc. And pre-training itself is a dark art. If you legitimately think that frontier labs are doing something equivalent to whatever you wrote on your whiteboard, you’re clueless.


Replies

jumploops today at 5:26 AM

Those are all just optimizations.

We still don’t really know why they work; we just know how to build them.
