Most of the gRPC implementations force buffering of the whole response for large unary responses. They are not really written by people who care about performance. It’s dumb because the protobuf binary marshaled format is perfectly designed for server-side incremental marshaling.
Performance is relative. gRPC is plenty fast enough for my use case, and for that matter, almost all client/server use cases that work across the Internet. If a Javascript web client against a REST backend is fast enough latency-wise, then a local gRPC connection on a single PC is gonna feel like greased lightning. Of course, there will be a few scenarios where tight coupling of client/server are required for good enough performance, but they are few and far between.