If your data is naturally sharded (users) with writes happening within a single shard, parallelism becomes easy. The request is routed to the shard hosting the user's data and reads/writes locally.
This makes scalability _much_ easier to reason about. It's cut-paste, cut-paste. Every N users needs another shard.
It does buy you a _different_ set of problems, like cross-shard querying (analytics) and how to do load leveling as users age out.
But it avoids the whole shared index scaling problems from inserts/updates with large user counts.
It becomes a hierarchical instead of a relational database.