Computer use in Gemini 3.5 Flash

237 points • by swolpers • yesterday at 5:21 PM • 160 comments

Comments

smallstepforman • yesterday at 9:06 PM

Today I asked Gemini to extract a table from a PDF appendix and create a C++ data table with its contents. After 15 or so iterations with corrections and new mistakes, it eventually gave up. I was floored when it said “I’m sorry, I cannot do this simple task, I’ve exceeded my error threshold and cannot do this task for you. My LLM prediction engine invents data instead of doing a simple data copy/reformat”.

Stunned to see that Gemini threw its digital arms in the air and gave up.

jorjon • today at 1:33 AM

Gemini Flash 3.5 (through agy) ran `git reset --hard` when I asked it to commit my changes; apparently it thought it was better to have a clean repo before `git add`. Of course I'm not trusting my computer to it. When will we have 3.5 Pro?
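
One way to guard against this kind of surprise is to screen an agent's shell commands before they run. The following is a minimal sketch, assuming a hypothetical wrapper that sees each command as a string; `RISKY_FLAGS` and `is_destructive` are illustrative names, not part of agy, Gemini, or git, and the list of flags is deliberately incomplete.

```python
import shlex

# Hypothetical denylist: git subcommands paired with flags that can
# discard uncommitted work or rewrite remote history.
RISKY_FLAGS = {
    "reset": {"--hard"},
    "clean": {"-f", "-fd", "-fdx", "--force"},
    "push": {"-f", "--force"},
}


def is_destructive(command: str) -> bool:
    """Return True if the command looks like a destructive git call.

    Only handles the simple form `git <subcommand> <args...>`; global
    options placed before the subcommand (e.g. `git -C dir ...`) are
    not recognised by this sketch.
    """
    tokens = shlex.split(command)
    if len(tokens) < 2 or tokens[0] != "git":
        return False
    subcommand, args = tokens[1], set(tokens[2:])
    risky = RISKY_FLAGS.get(subcommand)
    return risky is not None and bool(args & risky)
```

A wrapper could call `is_destructive` on every proposed command and ask for explicit confirmation when it returns True, letting `git add` and `git commit` through untouched.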

satvikpendem • yesterday at 5:40 PM

There's still no MCP support in the Gemini app, even though MCP is very useful for getting various pieces of info as a user just via chatting. For example, I recently wanted to book an Airbnb and filter by specific criteria, including house image analysis; Gemini couldn't do it, so I had to do it in Codex.

mlmonkey • yesterday at 5:59 PM

It's funny how in their own graph, https://storage.googleapis.com/gweb-uniblog-publish-prod/ima..., Gemini 3.5 Flash is beaten hands down by both Opus 4.8 and GPT 5.5, and yet the graph is drawn as if Gemini wins ... :-D

YuechenLi • yesterday at 11:06 PM

So... has Google provided a Codex/Claude Code equivalent for Gemini yet? I would like to use Gemini for coding tasks, but that's kind of difficult to do as I don't even know how to get Gemini to "clone this repo and read the code in it for static analysis", much less open PRs in repos.

ChatGPT/Codex can do it, Claude can do it, why can't Gemini?

And no, I don't mean going through Antigravity, and personally I'm wary about LLMs having unsupervised access on my computer without explicit policy, so I really think Google is putting the cart before the horse here.

airstrike • yesterday at 5:57 PM

Computer use is such a terrible idea. It's slow, insecure, error prone, expensive.

I guess if you're trying to get people to tokenmaxx it may look like a valid strategy, but ain't no way this will be delightful to users.

I think it's a symptom of just not understanding how LLMs should interface with the OS because we're still in their early days.

Eventually there'll be an iPhone moment for the ergonomics of LLM usage outside of coding

AbuAssar • today at 12:17 PM

Google should drop the flash moniker, as it implies a small model

s_kazmi • today at 10:21 AM

Stopped using Gemini a couple of months ago when they ruined their rate limits. Not sure why people still use them. They were good with the generous rate limits in Antigravity, but I was done after that.

revolvingthrow • yesterday at 7:22 PM

People using google’s models: am I holding it wrong or are the guardrails really overtuned?

I had the dubious pleasure of testing gemini of late and I kept running into refusals. How do I transfer a sim number from one provider to another? No. What should I consider when making backups on ntfs less prone to data loss and more bitrot resistant? No. Evaluate this piece of code? No.

I’m not sure if it’s cold feet from the mythos situation or what, but it reminds me of the dark days where you couldn’t use ai for much of anything. But then I go to chatgpt 5.5 and it does mostly everything I want outside of the usual cybersecurity boogeyman that you run into now and then.

arjunchint • yesterday at 10:40 PM

Pretty doubtful about computer use/screenshotting based approaches.

With Retriever AI, we construct custom accessibility trees to represent web pages, and we just switched over to DeepSeek v4 Flash; it's nearing a 100x cost decrease.

We also had great success just reverse engineering the underlying APIs of websites and then writing code to hit them. This approach of using screenshots to take actions on a webpage to trigger the underlying network calls the website is making seems too naive.
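
To illustrate the accessibility-tree idea in general terms, here is a minimal sketch that reduces a page to a short list of interactive elements a text-only model could act on. It is not Retriever AI's implementation; the role mapping `ROLE_BY_TAG`, the class `InteractiveOutline`, and the output format are all assumptions made for the example, and real accessibility trees carry far more structure.

```python
from html.parser import HTMLParser

# Hypothetical mapping from HTML tags to coarse accessibility roles.
ROLE_BY_TAG = {
    "a": "link",
    "button": "button",
    "input": "textbox",
    "textarea": "textbox",
    "select": "combobox",
}


class InteractiveOutline(HTMLParser):
    """Collect interactive elements with a numeric id, a role, and a name."""

    def __init__(self):
        super().__init__()
        self.nodes = []
        self._open = None  # node still waiting for its text content

    def handle_starttag(self, tag, attrs):
        role = ROLE_BY_TAG.get(tag)
        if role is None:
            return
        attributes = dict(attrs)
        # Prefer an explicit label; fall back to placeholder, then text.
        name = attributes.get("aria-label") or attributes.get("placeholder") or ""
        node = {"id": len(self.nodes) + 1, "role": role, "name": name}
        self.nodes.append(node)
        # <input> is a void element, so it never has text content.
        self._open = None if tag == "input" else node

    def handle_endtag(self, tag):
        if tag in ROLE_BY_TAG:
            self._open = None

    def handle_data(self, data):
        # Use the first non-empty text chunk as the name if none was set.
        if self._open is not None and not self._open["name"]:
            self._open["name"] = data.strip()


def outline(html: str) -> str:
    """Render the interactive elements of a page as one line each."""
    parser = InteractiveOutline()
    parser.feed(html)
    parser.close()
    return "\n".join(
        f'[{n["id"]}] {n["role"]} "{n["name"]}"' for n in parser.nodes
    )
```

A model can then answer with something like "click 3" instead of pixel coordinates, which is where the token savings over screenshots would come from in a setup like this.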

fridder • yesterday at 7:13 PM

I wonder if it will be better at building TUIs. It has been absolutely abysmal at interacting with them and building them.

beastman82 • yesterday at 6:10 PM

No UI like their competitors Claude CoWork or Codex. This is vaporware

knollimar • yesterday at 6:58 PM

Where is 3.5 pro?

villgax • yesterday at 6:19 PM

Will it skip Ads lol

vulcan1964 • today at 4:04 AM

Hot take: "computer use" is a dumb term for this concept; almost as if it was named by AI models...

Case in point: "We are already seeing customers drive value with computer use."

Yes... since the early 1980s, most companies and businesses have driven their value with computer use... smdh.

I'm no AI dev, but dare I suggest a better possible name for this: "agentic computer software interaction" which can be shortened to agent_actor

I swear, the direction we are headed with Big Tech leading the way will surely spell long term disaster

ai_fry_ur_brain • yesterday at 11:39 PM

I have basically unlimited access to every SOTA model and I opt for gemini flash 3.5 9/10 times I use an LLM.

LLMs are mostly useless, but when I do use them it's with Gemini. If they're going to waste my time 95% of the time, I might as well get it over with fast.

zuzululu • yesterday at 7:05 PM

Performance is quite impressive given that it's 3x cheaper than 5.5.

shafiemoji • today at 5:38 AM

My work requires me to use `agy cli` (Google AI Ultra) for development, and it's been incredibly frustrating. I strongly dislike the Gemini models because they consistently fail to grasp basic instructions. I also can't use the Claude models included in the AI Ultra plan because the agy cli wrapper makes the experience completely unusable. I'd rather use the free plan on OpenCode than deal with this Gemini setup.

paganartifact • yesterday at 9:58 PM

Who are these people talking about "agentic" stuff, and furthermore who are the people who can't stfu about "MCP"??

Literally 90%+ comments on HN personify their alleged use of AI in a way that is in NO WAY related to how the tool is really used.

Using LLMs for building software has NOTHING to do with those concepts. Nobody has "agents". That literally only exists in marketing. It's not even how it works.

AT ALL

Useless forum
