Am I understanding correctly that an implication of this is reduced context? since they are streaming by splitting the input into streams the total context is now split amongst those streams and a particular streams context will be shorted to to context/ streams?
I think the main benefit is improved speed and parallelism. Very similar to https://thinkingmachines.ai/blog/interaction-models/