Yep, in general, the interesting question is how you merge multiple streams that have been worked on in parallel without imposing a particular order or prioritizing one over another. (We do know how to do this in principle, that's exactly what a vector sum of encoded representations does. But it's not clear how to train the model so that it can recognize the outcome of that vector sum operation as meaningful.) Working on multiple streams at the same time is just what subagents do naturally, so the merge is the interesting part.