Not a public follow-up, but the iOS 17 speech-to-text model has a clever approach to KV caching that works within the ANE's constraints (fixed-size inputs).
I wrote about it here[0], but the gist is that you can keep a fixed-size cache and slide it in chunks with each inference. Not as efficient as a cache that grows by one token each time, of course.
[0]: https://stephenpanaro.com/blog/inside-apples-2023-transforme...
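For anyone curious what that looks like in practice, here's a rough sketch of the idea (the shapes, chunk size, and names below are my own illustrative assumptions, not the actual model's layout): the cache buffers stay a fixed length so the ANE-compiled input shapes never change, and each inference shifts them left by one chunk and appends the chunk just computed.

```python
import numpy as np

# Illustrative constants, not the real model's values.
CACHE_LEN = 448   # fixed cache length the model is compiled for (assumed)
CHUNK = 64        # tokens processed per inference (assumed)
HEADS, HEAD_DIM = 8, 64

class SlidingKVCache:
    def __init__(self):
        # Zero-padded fixed-size buffers; in practice the model also takes a
        # mask so attention ignores padding / not-yet-filled positions.
        self.k = np.zeros((HEADS, CACHE_LEN, HEAD_DIM), dtype=np.float16)
        self.v = np.zeros((HEADS, CACHE_LEN, HEAD_DIM), dtype=np.float16)
        self.valid = 0  # how many trailing positions hold real tokens

    def update(self, new_k, new_v):
        """Slide the cache left by one chunk and append the latest chunk.

        new_k / new_v: (HEADS, CHUNK, HEAD_DIM) keys/values just computed.
        The buffers stay CACHE_LEN long, so input shapes never change.
        """
        self.k = np.concatenate([self.k[:, CHUNK:], new_k], axis=1)
        self.v = np.concatenate([self.v[:, CHUNK:], new_v], axis=1)
        self.valid = min(self.valid + CHUNK, CACHE_LEN)
        return self.k, self.v

# Each inference consumes the fixed-size cache plus one new chunk; once the
# buffer is full, the oldest chunk falls off the left edge.
cache = SlidingKVCache()
for _ in range(4):
    k_chunk = np.random.rand(HEADS, CHUNK, HEAD_DIM).astype(np.float16)
    v_chunk = np.random.rand(HEADS, CHUNK, HEAD_DIM).astype(np.float16)
    k, v = cache.update(k_chunk, v_chunk)
    print(k.shape, cache.valid)  # shape stays (8, 448, 64) every step
```

The trade-off is exactly the one mentioned above: you recompute or discard a bit more than a grow-by-one cache would, but every inference sees identical tensor shapes, which is what the ANE needs.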
Hey, I just wanted to say that this is an amazing write-up and I'm bookmarking your blog, because there isn't a ton of information out there about this stuff as it relates to Apple hardware, and you do a really great job of explaining many of the concepts I wasn't already familiar with. Thank you!