Hacker News

parliament32 yesterday at 10:16 PM · 6 replies

When I look at LLMs as an interface, I'm reminded of back when speech-to-text first became mainstream. So many promises about how this is the interface for how we'll talk to computers forevermore.

Here we are a few decades later, and we don't see business units using Word's built-in dictation feature to write documents, right? Funny how that tech seems to have barely improved in all that time. And despite dictation being far faster than typing, it's not used all that often, because the error rate is still too high for it to be useful: errors in speech-to-text are a fundamentally unsolvable problem (you can only get so far with background-noise filtering, accounting for accents, etc.).

I see the parallel in how LLM hallucinations are a fundamentally unsolvable component of transformer-based models, and I suspect LLM usage in 20 years will be around the level of speech-to-text today: ubiquitous in the background, used here and there to set a timer or talk to a device, but ultimately not useful for any serious work.


Replies

dweinus yesterday at 10:27 PM

I think there is a second reason people still type, and it's relevant to LLMs. Typing forces you to slow down and choose your words. When you want to edit, you are already typing, so it doesn't break the flow. In short, it has a fit to the work that speech-to-text doesn't.

LLMs create a new workflow wherever they are employed. Even when the model is capable, that new workflow is not always a more desirable or efficient experience.

bigstrat2003 today at 12:06 AM

Yeah, this is exactly my view. We've had several years of work on the tech, and LLMs are just as prone to randomly spitting out garbage as they were on day one. They are not a tool fit for any serious work, because you need to be able to rely on your tools. A tool that is sometimes good and sometimes bad is worse than no tool at all.

sadeshmukh yesterday at 11:27 PM

I type faster than I think, and being able to edit gives typing the edge over speech-to-text. I don't think the two are fundamentally comparable.

SchemaLoad today at 1:08 AM

I'd say speech-to-text is unsolvable for a more fundamental reason: it's hard to actually speak out an entire document flawlessly in one take.

Spoken language is very different from written language, which is why, for example, you can easily tell when an article is transcribing a spoken interview.

buzzerbetrayed today at 1:32 AM

The completely different way people are experiencing AI is fascinating.

In my world, AI is already far more influential than speech-to-text.

People on here act like we don’t know if AI will be useful. And I’m sitting over here puzzled because of how fucking useful it is.

Very strange.

johnfn yesterday at 10:26 PM

I'm curious about the statement that hallucinations are "fundamentally unsolvable". I don't think an AI agent has left a hallucination in my code -- by which I mean a reference to something that doesn't exist at all -- in many months. I have had great luck driving hallucinations to effectively 0% by using a language with static typechecking, telling LLMs to iterate on type errors until there are none left, and of course having a robust unit and e2e test suite. I mean, sure, I run into other problems -- it does make logic errors at some rate -- but I would hardly put those in the same category as hallucinations.
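The iterate-on-type-errors workflow described above can be sketched in a few lines. This is only an illustration, not any particular tool's API: `run_checker` and `generate_fix` are hypothetical stand-ins for, say, capturing the output of `tsc --noEmit` and asking the model to repair the reported errors.

```python
def typecheck_loop(run_checker, generate_fix, max_rounds=5):
    # run_checker() returns the checker's error output ("" when clean);
    # generate_fix(errors) stands in for an LLM call that rewrites the
    # code given those errors. Loop until clean or the budget runs out.
    for _ in range(max_rounds):
        errors = run_checker()
        if not errors:
            return True   # no dangling references left
        generate_fix(errors)
    return False          # rounds exhausted, errors remain
```

Hallucinated identifiers surface as type errors on the very first round, which is why this converges; logic errors, as noted above, are a different problem and fall to the test suite instead.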
