Implemented this to transcribe voice chat in a project and the streaming accuracy in English on this was unusable, even with the medium streaming model.