faurroar • today at 5:14 AM

Architectures have evolved significantly since then. DeepSeek v4 =/= GPT-3. Even then, a great deal of complexity lies in everything surrounding the architectures, e.g. how do you implement them performantly on modern accelerators, how do you distribute the model across a set of accelerators, how do you post-train, etc. And pre-training itself is a dark art. If you legitimately think that frontier labs are doing something equivalent to whatever you wrote on your whiteboard, you’re clueless.

Replies

jumploops • today at 5:26 AM

Those are all just optimizations.

We still don’t really know why they work, we just know how to build them.
