This sounds like a gamechanger for speed and efficiency if it can scale up. "However, our mod...

jhack • today at 12:13 AM • 0 replies • view on HN

This sounds like a gamechanger for speed and efficiency if it can scale up.

"However, our models are nevertheless relatively small and trained on tiny amounts of instruction examples, compared to the scale of modern instruction data and multiple post-training stages used to reinforce the default message-based format. We do think that parallel streams are a conceptually enticing format, and that future work on a larger scale will go further to show these benefits."

alt Hacker News