Hacker News

why_at · yesterday at 7:08 PM

My first impression coming away from this is skepticism.

Anything with voice controls for routine use is a pretty tough sell. Doing this when you're not completely alone would be annoying to everyone around you.

Most of their examples seem like they could have been done with a right-click drop-down menu, so they don't really need to "re-invent the mouse pointer".

So is this thing talking to Google's servers all the time for the AI integration? Does that mean it won't work if you're not connected to the internet? The privacy concerns are obvious: now Google wants an AI watching literally everything you do on your computer.

Does it cost the user anything for the LLM use? If it's free, will it stay free forever? That's quite a lot to give away if they're expecting people to use it to change a single word, like in one of their examples. I guess they're expecting to make the money back by gathering data about literally everything you do on your computer.

There might be a killer app for AI integration with personal computers that has yet to be invented, but this doesn't look like it.


Replies

jasonjayr · today at 4:52 AM

The killer app was conceived as early as the 1980s: an agent running on your computer, organizing your files, your schedule, your messages, your bills, your bank accounts, and so on. All the routine drudgery in your life should be offloadable to a smart agent that, based on your preferences, brings you the information you need through natural language queries, contextualized to what you are doing at the time, when you need it.

What's being delivered now is an agent running on someone else's computer, copying your data to someone else's database, with zero responsibility or mandate to protect that data and not share it with anyone else (in fact, they almost always promise to share it with their thousand partners), offering suggestions and preferences based on someone else's so-called recommendations, influenced by whoever pays the agent's operators, with increasing pressure to make someone else's computers + agents the only way to interact with other people and systems.

There is no doubt that LLMs can do amazing things, but the current environment seems to make it nearly impossible to do anything with them that doesn't let someone else inspect, influence, and even restrict everything you are doing with these systems.

concinds · yesterday at 9:29 PM

The second half of your comment is a go-to-market concern but doesn't feel so relevant for a research prototype. It could be done with a private local model too, maybe not by Google.

But I don't think the voice problem is surmountable. I closed their image editing demo when I saw it required a mic.

It would be appealing as a Spotlight-like text pop-up interface where you type instructions, which would work in social/office environments, but that might only appeal to power users.

YeGoblynQueenne · today at 10:57 AM

Yes, it does seem kinda ... pointless.

websap · today at 2:48 AM

You should look into how often people are using tools like WisprFlow and SuperWhisper. Voice is a very natural input mechanism. Most people working in open floor plans are wearing headphones anyway. As long as you're not screaming, it's probably fine. Maybe we'll move away from open-plan offices in a bid for efficiency, which I would welcome.

schnitzelstoat · today at 7:10 AM

Yeah, I'd hate to use this in an open-plan office (which is like 99% of offices these days), and even using it alone at home would feel awkward. I don't really want to talk to my computer, despite what 1950s sci-fi books led us to believe.

It's a cool idea for the future when we have reliable EEG headsets or Neuralink or whatever though.

fny · today at 12:26 AM

It's possible to rely on mouth movements instead of sound. I've been tweaking visual speech recognition (VSR) models for the past few weeks so that I can "talk" to my agents at the office without pissing everyone off. It works okay. Limiting the language to "move this" and "clear that" alongside context cues vastly simplifies the problem and makes it far more feasible on-device.
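
A rough sketch of the idea, assuming a fuzzy match over a fixed command list plus whatever the UI reports under the cursor as context (the names and grammar here are illustrative, not from any real VSR toolkit):

    # Hypothetical sketch: resolving a noisy VSR transcript against a
    # tiny command grammar plus on-screen context.
    from difflib import SequenceMatcher

    COMMANDS = ["move this", "clear that", "select all", "undo"]

    def best_command(transcript, context_targets):
        """Pick the closest known command, then bind "this"/"that" to
        whatever the UI reports as being under the cursor."""
        cmd = max(COMMANDS, key=lambda c: SequenceMatcher(
            None, transcript.lower(), c).ratio())
        referent = None
        if ("this" in cmd or "that" in cmd) and context_targets:
            referent = context_targets[0]
        return cmd, referent

    print(best_command("mov dis", ["selected image layer"]))
    # -> ('move this', 'selected image layer')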

I think it's brilliant UX.

ei23 · today at 6:42 AM

>Anything with voice controls for routine use is a pretty tough sell. Doing this when you're not completely alone would be annoying to everyone around you.

Reads like the argument against cell phones when you don't have a phone booth around you...

DonHopkins · today at 12:50 AM

A General-Purpose Bubble Cursor

https://www.youtube.com/watch?v=46EopD_2K_4

>We present a general-purpose implementation of Grossman and Balakrishnan's Bubble Cursor, the fastest general pointing facilitation technique in the literature. Our implementation functions with any application on the Windows 7 desktop. Our implementation functions across this infinite range of applications by analyzing pixels and by leveraging human corrections when it fails.

Transcript:

>We present the general-purpose implementation of the bubble cursor. The bubble cursor is an area cursor that expands to ensure that the nearest target is always selected. Our implementation functions on the Windows 7 desktop and with any application for that platform. The bubble cursor was invented in 2005 by Grossman and Balakrishnan. However, a general-purpose implementation of this cursor, one that works with any application on a desktop, has not been deployed or evaluated. In fact, the bubble cursor is representative of a large body of target-aware techniques that remain difficult to deploy in practice. This is because techniques like the bubble cursor require knowledge of the locations and sizes of targets in an interface. [...]

https://www.dgp.toronto.edu/~ravin/papers/chi2005_bubblecurs...

>The Bubble Cursor: Enhancing Target Acquisition by Dynamic Resizing of the Cursor’s Activation Area

>Tovi Grossman, Ravin Balakrishnan; Department of Computer Science; University of Toronto
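
The selection rule itself is simple to sketch. Here is a minimal version, assuming circular targets; the paper handles arbitrary shapes and also draws a visible bubble around partially contained targets:

    # Illustrative sketch of the bubble cursor's selection rule.
    import math
    from dataclasses import dataclass

    @dataclass
    class Target:
        x: float
        y: float
        r: float  # target radius

    def bubble_select(cx, cy, targets):
        """Return (selected_target, bubble_radius) for a cursor at (cx, cy).
        The bubble grows to fully contain the closest target when that
        doesn't make it intersect the second-closest target; otherwise it
        only just touches the closest one. The nearest target is thus
        always selected, however far away it is."""
        def int_dist(t):   # distance to the nearest point of a target
            return math.hypot(cx - t.x, cy - t.y) - t.r
        def cont_dist(t):  # radius needed to contain the target entirely
            return math.hypot(cx - t.x, cy - t.y) + t.r

        ordered = sorted(targets, key=int_dist)
        closest = ordered[0]
        if len(ordered) == 1:
            return closest, cont_dist(closest)
        radius = min(cont_dist(closest), int_dist(ordered[1]))
        return closest, max(radius, int_dist(closest))

    sel, r = bubble_select(10, 10, [Target(14, 13, 2), Target(40, 40, 5)])
    print(sel, r)  # nearer target selected; bubble fully contains it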

I've written more about Morgan Dixon's work on Prefab (pre-LLM pattern recognition, which is much more relevant with LLMs now).

https://news.ycombinator.com/item?id=11520967

https://news.ycombinator.com/item?id=14182061

https://news.ycombinator.com/item?id=18797818

https://news.ycombinator.com/item?id=29105919

nolist_policy · yesterday at 7:14 PM

The "Edit an Image" Demo at the bottom is pretty fun. Maybe this is just Google flexing their LLM inference capacity.

AirMax98 · yesterday at 7:28 PM

Right — it does seem cool but the voice is patching over a major gap. If I'm talking already, why wouldn't I just describe what I'm looking at and have the AI grab it for me?

Our_Benefactors · today at 2:46 AM

Yup - what Google is suggesting here will never materialize beyond being a slop feature. People who want these bespoke workflows will build them or seek out specific tools that enable them, not trust some overarching daemon that contextually watches their cursor. I don't trust Google one bit to execute correctly on something like this.

hansmayer · today at 8:18 AM

Well, you see, to really, really sell it to the common folk, they need to convince you that chatbots are the "Intelligence". So they are coming up with all sorts of crap, like this one. The TV advertisements for Gemini and co. are indicative of how they see the average user: as an idiot of sorts who needs the shit-device for pretty much anything. Oh, you spilled some water on the countertop? Quick, ask Gemini what to do! You are a 20-something home alone? Quick, lie on the couch and ask Gemini if you can really talk to it, omg, it's so exciting! You were on holiday all alone, but in the middle of a really large crowd? Gemini to the rescue: cut those people out and make it look like it was an exclusive spot, just for you! Nobody else was there. So this proposal is going in the same direction, probably targeting the average office "idiot".