
cduzz, today at 2:12 PM

I think this is mixing domains quite a bit.

If I'm talking to a friend or peer and I'm on a crappy link, we can probably work it out. If I'm calling my lawyer from prison with my "one call", I really want my lawyer to get my instructions clearly and correctly, ideally the first time and without a lot of coaching.

Where on this scale does "person talking to LLM" fit?

I believe there's a ton of research into the Shannon limit and human speech. You can trivially observe how much redundancy there is by listening to a podcast at 1x, 1.2x, 1.5x, 2x, etc.; the speed at which you can no longer follow what's going on tells you roughly how much "redundancy" is built into that language. That number falls way off when you're listening to a person with an accent, or when the recording is noisy, or whatever.
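As a rough illustration of the podcast experiment, here's a back-of-the-envelope sketch. It assumes (and this is my simplification, not a real Shannon-limit measurement) that if you can still follow speech at some maximum speed, then the listening time you saved is a crude proxy for the slack in the 1x signal. The function name `time_redundancy` is made up for this example.

```python
def time_redundancy(max_speed: float) -> float:
    """Fraction of the 1x duration that can be dropped at the fastest
    playback speed the listener can still follow.

    Crude proxy only: this is an assumption for illustration, not an
    information-theoretic estimate of redundancy.
    """
    if max_speed < 1.0:
        raise ValueError("max_speed must be >= 1.0")
    # At speed s the audio takes 1/s of the original time,
    # so the fraction saved is 1 - 1/s.
    return 1.0 - 1.0 / max_speed


if __name__ == "__main__":
    for speed in (1.0, 1.2, 1.5, 2.0):
        saved = time_redundancy(speed)
        print(f"{speed}x: {saved:.0%} of the 1x duration is slack")
```

If your ceiling drops from 2x on a clean recording to 1.2x on a noisy one, the same arithmetic shows how much of that margin the noise ate.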

You'll also find that your tolerance for lossy media is radically different depending on latency, echoes, and jitter in the audio (which I believe is the point of the original "don't use WebRTC" article...)

Finally, people may tolerate this, but the "phoneme to token" thinger may be less tolerant, and it certainly won't be able to magically recover meaning from lost packets. If the resulting exchange is extremely expensive or important (as in the lawyer and the "I'm in jail in Poughkeepsie; I need bail!" exchange), you really want to take the time to get it right, not make things guess.