In English it is pretty good. But talk to it in Polish, and suddenly it thinks you speak Russian? Ukranian? Belarus? I would understand if an American company launched this, but for a company being so proud about their European roots, I think it should have better support for major European languages.
I tried English + Polish:
> All right, I'm not really sure if transcribing this makes a lot of sense. Maybe not. A цьому nie mówisz po polsku. A цьому nie mówisz po polsku, nie po ukrańsku.
> The model is natively multilingual, achieving strong transcription performance in 13 languages, including English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch.
Try sticking to the supported languages
Yeah, it's too bad. Apparently it only performs well in certain languages: "The model is natively multilingual, achieving strong transcription performance in 13 languages, including English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch"
That's a mix of Polish and Ukrainian in the transcript. Now, if I try speaking Ukrainian, I'm getting transcript in Russian every time. That's upsetting.
Cracking non-English or accented / mispronounced English is the white whale of text-to-speech I think; I don't know about you, but in our day to day chats there's a lot of jargon, randomly inserted English words, etc. And when they speak in English it's often what I call expat-English which is what you get when non-native speakers only speak the language with other non-native speakers.
Add poor microphone quality (using a laptop to broadcast a presentation to a room audience isn't very good) and you get a perfect storm of untranscribeable presentations or meetings.
All I want from e.g. Teams is a good transcript and, more importantly, a clever summary. Because when you think about it, imagine all the words spoken in a meeting and write them down - that's pages and pages of content that nobody would want to read in full.
TBH ChatGPT does the same, when I mix Polish and English. Generally getting some cyrillic characters and it gets super confused.
I'm not sure why but their multilingual performance in general has usually been below average. For a French company, their models are not even close to being best in French, even outdone by the likes of Qwen. I don't think they're focusing on anything but English, the rest is just marketing.
polish logically should be rendered in cyrillic as the cyrillic orthography more closely matches the sounds and consonant structure of slavic languages like polish and russian, although this has never been done for church reasons . maybe this is confusing ai
They don't claim to support Polish, but they do support Russian.
> The model is natively multilingual, achieving strong transcription performance in 13 languages, including English, Chinese, Hindi, Spanish, Arabic, French, Portuguese, Russian, German, Japanese, Korean, Italian, and Dutch. With a 4B parameter footprint, it runs efficiently on edge devices, ensuring privacy and security for sensitive deployments.
I wonder how much having languages with the same roots (e.g. the romance languages in the list above or multiple Slavic languages) affects the parameter count and the training set. Do you need more training data to differentiate between multiple similar languages? How would swapping, for example, Hindi (fairly distinct from the other 12 supported languages) for Ukrainian and Polish (both share some roots with Russian) affect the parameter count?