Hacker News

terabytest yesterday at 2:01 PM

Wispr Flow is a masterclass in STT. Apple's solution feels like it's from the last century in comparison. The same applies to Apple's TTS when you have ElevenLabs and OpenAI running laps around it. All I need is for my iPhone to do those things natively at the same quality level (because in Apple's walled garden that's the only way to get them usable everywhere).


Replies

jjice yesterday at 2:38 PM

But Apple's uses so few system resources and runs fully on device on newer iPhone models (16+, I believe). It's so efficient. I really enjoy using Handy with Parakeet as the model, and the results are very good, but its system resource usage is a monster compared to Apple's.

Looks like Wispr Flow uses a cloud model [0]:

> Cloud based speech processing infrastructure for 1B users

It gets to be a messy comparison: my iPhone can do STT pretty well with no latency, fully on device, while Wispr Flow requires a cloud model. To be fair, older Apple devices do as well. It's not quite an apples and oranges comparison, but I think those technical details make this an indirect comparison in a few ways.

For on-device with low system resource usage, Apple's is pretty damn good.

[0] https://wisprflow.ai/post/technical-challenges

adamcharnock yesterday at 2:11 PM

FWIW - I also really like Wispr Flow, but I moved to running the 'Whisper Large' model locally using Handy (https://github.com/cjpais/Handy), which has been essentially as good, while also having lower latency.

primaprashant yesterday at 6:36 PM

I’d say STT is pretty much a solved problem. Every day there is a new product, and one can be one-shotted by any current top-of-the-line LLM. Take a look at this [1]. Apple is just stuck in the past.

[1] https://github.com/primaprashant/awesome-voice-typing