Gemini 3.5 Flash

669 points • by spectraldrift • yesterday at 5:43 PM • 483 comments

https://ai.google.dev/gemini-api/docs/models/gemini-3.5-flas...

Comments

For those who would like to know the total and active parameter count of this model: even though Google doesn't disclose the model technicals, we can infer them within relatively tight margins based on what we do know.

We know they serve the model on TPU 8i, which we have plenty of hard specs for (so we know the key constraints: total memory, bandwidth, and compute FLOPS). We can also set a ceiling on the compute complexity and memory demand of the model, since we know they will be at least as efficient as what is disclosed in the DeepSeek V4 Technical Report.

We can also assume that the model was explicitly built to run efficiently in a RadixAttention style batched serving scenario on a single TPU 8i (so no tensor parallelism, etc. to avoid unnecessary overheads... Google explicitly designed the 8th-generation inference architecture to eliminate the need for tensor sharding on mid-sized models).

We know Google intends to serve this model at a floor speed of around 280 tok/s too.

Putting all these pieces together, we can confidently say this model is ~250-300B total, and 10-16B active parameters. Likely mostly FP4 with FP8 where it matters most.

Visual:

  ┌────────────────────────────────────────────────────────┐
  │                   TPU 8i VRAM (288 GB)                 │
  ├───────────────────────────┬────────────────────────────┤
  │   Static Model Weights    │  Dynamic Allocations &     │
  │   (250B - 300B @ Mixed    │  Compressed KV Caches      │
  │   FP4/FP8)                │  (RadixAttention / SRAM)   │
  │   ~110 GB - 150 GB        │  ~138 GB - 178 GB          │
  └───────────────────────────┴────────────────────────────┘

I do model serving optimization work. This is napkin math.
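
The memory split in the diagram can be reproduced with a few lines of arithmetic. This is a rough sketch under the assumptions stated above (288 GB of device memory, FP4 at 0.5 bytes per parameter, FP8 at 1 byte per parameter, 280 tok/s). It ignores quantization scales, activations, and other overheads, and the function names are made up for illustration.

```python
GB = 1e9  # decimal gigabytes, matching the figures in the diagram

def weight_gb(total_params, fp8_fraction=0.0):
    """Weight footprint in GB for a mix of FP4 (0.5 B/param) and FP8 (1 B/param)."""
    bytes_per_param = 0.5 * (1.0 - fp8_fraction) + 1.0 * fp8_fraction
    return total_params * bytes_per_param / GB

def kv_budget_gb(total_params, fp8_fraction=0.0, device_gb=288.0):
    """Memory left for KV caches and dynamic allocations (assumed 288 GB device)."""
    return device_gb - weight_gb(total_params, fp8_fraction)

def stream_weight_read_gb_per_s(active_params, tok_per_s=280.0, bytes_per_param=0.5):
    """Weight bytes read per second for ONE unbatched stream (assumes all-FP4 active weights)."""
    return active_params * bytes_per_param * tok_per_s / GB

for params in (250e9, 300e9):
    print(f"{params / 1e9:.0f}B total: weights {weight_gb(params):.0f} GB, "
          f"left over {kv_budget_gb(params):.0f} GB")
for active in (10e9, 16e9):
    print(f"{active / 1e9:.0f}B active: {stream_weight_read_gb_per_s(active):.0f} GB/s per stream")
```

The last function is a weights-only figure for a single unbatched stream; batched serving shares part of those reads across requests, so it is not a statement about the chip's actual bandwidth.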

simonw • yesterday at 7:29 PM

The pelican is a lot: https://github.com/simonw/llm-gemini/issues/133#issuecomment...

Not a great bicycle though, it forgot the bar between the pedals and the back wheel and weirdly tangled the other bars.

Expensive too - that pelican cost 13 cents: https://www.llm-prices.com/#it=11&ot=14403&sel=gemini-3.5-fl...
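
As a rough check on the 13-cent figure, here is a minimal sketch. It assumes the `it=11` and `ot=14403` values in the link above are input and output token counts, and it uses the $1.50/$9.00 per-million-token prices quoted elsewhere in this thread. The function name is made up for illustration.

```python
def request_cost_usd(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost of one request in USD, given per-million-token prices."""
    return (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000

# Assumed: 11 input tokens, 14,403 output tokens, $1.50 in / $9.00 out per million.
pelican_cost = request_cost_usd(11, 14_403, 1.50, 9.00)
print(f"${pelican_cost:.4f}")  # $0.1296
```

Nearly all of the cost comes from the output side, which is why the output price matters most for a prompt this short.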

GodelNumbering • yesterday at 6:58 PM

Per million input/output tokens:

Gemini 2.5 flash: $0.30/$2.50

Gemini 3.0 flash preview: $0.50/$3.00

Gemini 3.5 flash: $1.50/$9.00

Interesting pricing direction. I don't think we have ever seen a 3x price increase for the immediate next same-sized model (and lol @ 3 only ever getting a preview).

3.5 Flash costs about the same as Gemini 2.5 Pro, which was $1.25/$10.
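
For anyone who wants the ratios spelled out, a small sketch using only the per-million-token prices listed in this comment. The dictionary and function names are made up for illustration.

```python
# Prices in USD per million tokens as (input, output), taken from the list above.
PRICES = {
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Gemini 3.0 Flash preview": (0.50, 3.00),
    "Gemini 3.5 Flash": (1.50, 9.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
}

def price_ratio(new, old):
    """Return (input ratio, output ratio) of model `new` relative to model `old`."""
    (new_in, new_out), (old_in, old_out) = PRICES[new], PRICES[old]
    return new_in / old_in, new_out / old_out

for old in ("Gemini 3.0 Flash preview", "Gemini 2.5 Flash", "Gemini 2.5 Pro"):
    r_in, r_out = price_ratio("Gemini 3.5 Flash", old)
    print(f"3.5 Flash vs {old}: {r_in:.2f}x input, {r_out:.2f}x output")
```

With these numbers, 3.5 Flash is 3x the 3.0 Flash preview on both sides, and 1.2x input / 0.9x output relative to 2.5 Pro.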

SXX • yesterday at 6:18 PM

  > Create animated SVG of a frog on a boat rowing through jungle river. Single page self contained HTML page with SVG

3.5 Flash: Thinking Medium - 7516 tokens

https://gistpreview.github.io/?5c9858fd2057e678b55d563d9bff0...

3.5 Flash: Thinking High - 7280 tokens

https://gistpreview.github.io/?1cab3d70064349d08cf5952cdc165...

3.1 Pro - 28,258 tokens

https://gistpreview.github.io/?6bf3da2f80487608b9525bce53018...

3.1 took 3 minutes of thinking to generate, but it is the only one that got animated movement.

OhMeadhbh • yesterday at 7:56 PM

Am I really so old that when someone says "Flash" my immediate response is... "consider HTML5 instead" ??

lanewinfield • yesterday at 7:48 PM

Gemini 3.5 Flash's 2000 token clocks aren't bad. https://clocks.brianmoore.com/

hmate9 • yesterday at 8:49 PM

I have the Google AI Pro plan and tried Antigravity with 3.5 Flash, but it used up all my quota in two prompts. If that is not a bug, then it is seriously unusable.

gertlabs • today at 2:32 AM

Taking into account that this is a flash model, it's a strong release. It's very fast and frontier-ish for the price.

Raw intelligence is high for a flash model. But Google's problem has always been productization and tool use, whereas raw intelligence is always competitive. It does not look like they solved that with this release -- in fact, their tool use delta (the improvement in scores when given arbitrary tools and a harness) has actually regressed from some previous models.

Data at https://gertlabs.com/rankings

reconnecting • yesterday at 7:17 PM

Knowledge cutoff: January 2025

Latest update: May 2026

I have a very bad feeling about this lag.

nl • today at 2:06 AM

On my Agentic SQL benchmark it scores 19/25. That's... mediocre.

That means it performs worse than 3.1 Flash Lite Preview (22/25), is slower (367s vs 142s), and is more expensive (75c vs 2c).

It is outperformed by Gemma4 26B-A4B in every way(!)

https://sql-benchmark.nicklothian.com/?highlight=google_gemi...

(Switch to the cost vs performance chart to see how far this is off the Pareto frontier)

margorczynski • yesterday at 10:13 PM

Wow at the price hike. Still I think in the long run the Chinese will win if they're able to produce hardware comparable to Nvidia.

npn • yesterday at 6:57 PM

The price is crazy.

And I guess Gemini 3.5 Pro will get a price increase, too. 12 x 5 = 60?

It seems like google does want us to use Chinese models.

wg0 • yesterday at 7:59 PM

3x price increase for a similar model almost. And they said AI would be cheaper and ubiquitous.

puapuapuq • today at 4:33 AM

I played the audio readout of the page, what is the last 30 secs in the readout?

s3p • yesterday at 6:43 PM

Yikes. I think the concept of a 'flash' model is changing, no? Google used to market this as its lower-intelligence, faster, cheaper option. I appreciate that they are delivering on both of those, but personally I would appreciate if they could create an incremental knowledge improvement while holding price steady. Fortune 500 companies have to make their money I guess.

OsrsNeedsf2P • yesterday at 6:38 PM

Beats 3.1 Pro on price per token, but Artificial Analysis is showing it's dumber per token and costs more overall.

asar • yesterday at 6:04 PM

$1.5/m input tokens $9/m output tokens

6x the price of 3.1 flash lite

brikym • yesterday at 9:25 PM

How is this progress? The token cost just keeps going up and up. Flash is the new Pro? Do the models actually cost more to run or is it fattening margins?

nikhilpareek13 • yesterday at 9:14 PM

Worth noting that Google marked this stable rather than preview, which is unusual compared to their recent releases. Pair that with the 3x price hike and Flash pricing now reads like the long-term floor they want, not a temporary thing they will walk back later. But it's hard to tell yet whether that's Google specifically reading the room or the whole industry quietly resetting the cheap-inference baseline.

AgentMasterRace • today at 3:33 AM

Gemini 3.1 Pro is literally the worst AI when I cycle from Opus to GPT 5.5 then finally Gemini. It's actually insane that it's a frontier model. I rage at it more than my wife.

himata4113 • yesterday at 6:05 PM

Engineers at Google have publicly stated that the models are too big and are far from their potential. Glad they're being proven right with every release.

They continue to focus on smaller models while OpenAI and Anthropic are increasing compute requirements for their SOTA models.

stared • yesterday at 9:37 PM

China: we don’t need to use US models, we can distill them ourselves

Google: we don’t need the Chinese to distill our models, we can do it ourselves

paol_taja • yesterday at 11:58 PM

That pelican looks like it just sold a SaaS company and bought a bike because its therapist said it needed balance.

golfer • yesterday at 6:06 PM

Here's the benchmark scoreboard they published:

https://storage.googleapis.com/gweb-uniblog-publish-prod/ori...

Alifatisk • yesterday at 8:51 PM

The demo of the model in Antigravity automatically renaming and categorizing unstructured assets using vision was quite cool. It demonstrates that the IDE side panel can be used for more than just coding. I wonder if the harness in Antigravity is based on Gemini CLI or if they are completely different. Could Gemini CLI do the same task? Or is the vision feature an Antigravity thing?

razodactyl • today at 1:21 AM

Aw. The listen to article widget doesn't work properly on mobile Safari and when using the options button, the popup appears below the "In this article" dropdown occluding it.

At least it read the authors of the article to me.

I wish we would push more towards testing code. Agentic AI excels when it's engaged.

sbinnee • yesterday at 10:13 PM

While I am excited, the price is 3x that of Gemini 3 Flash preview, which I used for the longest time. Since DeepSeek V4 Flash arrived, I have been a happy DeepSeek user. We will see how long that reign lasts after I try this new Gemini.

ElenaDaibunny • today at 3:55 AM

but latency in real GUI workflows with 50+ steps is still the elephant in the room for cloud-based agents

golfer • yesterday at 7:04 PM

Arena.ai:

> Gemini 3.5 Flash’s pricing shifts the Pareto frontier in Text. 8 models from GoogleDeepMind dominate the Text Arena Pareto curve where only 4 labs are represented for top performance in their price tiers.

https://x.com/arena/status/2056793180998361233

merb • yesterday at 6:49 PM

Still no new processor version for Document AI https://docs.cloud.google.com/document-ai/docs/release-notes which is so weird. (Custom extractor)

It’s not possible to uptrain on preview releases and it did not get that much love for a while.

aliljet • yesterday at 6:20 PM

Is there a good benchmark tracking hallucinations? The models are all incredibly good now, even the open ones, and my hope is that the rate of hallucinations is something that's falling off in concert with larger and larger context lengths.

sigbeta • today at 2:22 AM

I am interested to see how they will serve demand with the TPU monopoly they have.

jonnyasmar • yesterday at 11:29 PM

The $1.50/$9.00 pricing is a meaningful shift if you've been running Gemini as the "fast iteration" half of a multi-model coding workflow. I've had Claude Code, Codex, and Gemini CLI running side by side and the working split was "Gemini for quick scaffolding and exploration where the cost of being wrong is low, Sonnet for correctness-critical stuff." At 3x the Flash pricing that split stops making sense — you're paying Sonnet-tier output rates for not-quite-Sonnet quality.

For pure chat that's annoying but tolerable. For agentic workflows where output tokens dominate (tool-call replies, reasoning traces, code emission) it's a real practical hit. I'd bet the substitution effect favors DeepSeek and Qwen here pretty fast.

eis • yesterday at 6:26 PM

3.5 Flash was more expensive than 3.1 Pro to run the Artificial Analysis test suite: $1551 for 3.5 Flash [0] vs $892 for 3.1 Pro [1]. That's 74% more cost while ranking lower. It's 2.5x as fast, but I don't think the bang for the buck is there anymore like it was with 3.0 Flash. I'm a bit bummed out to be honest.

I did not expect such a huge (3x) price increase from 3.0 Flash, and I bet many people will not just blindly upgrade, as the value proposition is wildly different.

One interesting point to note is that Google marked the model as Stable in contrast to nearly everything else being perpetually set as Preview.

[0] https://artificialanalysis.ai/models/gemini-3-5-flash [1] https://artificialanalysis.ai/models/gemini-3-1-pro-preview

mchusma • today at 1:29 AM

I have thought about this and I think overall, this was a disappointing release from Google. I'm not sure what the general sentiment is, but this feels like a miss.

What they did do in the keynote was spend a lot of time talking about their distribution advantage, and how they can own the consumer in search. But not a lot that will benefit partners or developers.

Basically, they released something broadly competitive with Sonnet 4.6, plus a new Omni model that seems interesting but where the picture is still unclear. They have completely ceded the frontier to OpenAI / Anthropic, and are saying "look for pro next month".

The best release since nano banana pro from Google has been Gemma.

bredren • yesterday at 8:15 PM

Can anyone who has extensive, recent experience with Claude Code and Codex contextualize the current Gemini CLI product experience?

mixtureoftakes • yesterday at 6:11 PM

benchmarks look REALLY good, the price hike is big but it also beats sonnet 4.6 in every discipline?

paperwork360 • yesterday at 8:04 PM

Google also updated Antigravity. Version 2.0 is more for conversation with an agent. The previous VS Code-like IDE was much better.

pqdbr • yesterday at 9:59 PM

In my tests, in real production use cases, it's a hard pass.

It's actually 10-15% slower and also more expensive than Gemini 3.1 Pro, because it thinks more than 2.5x as much as Gemini 3.1 Pro.

So that thinking verbosity nullifies the speed and cost gains.

AND the quality is worse than 3.1 Pro for our use cases, making mistakes Pro doesn't make.
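
A back-of-the-envelope version of that cost argument, as a sketch only. It assumes output (thinking) tokens dominate the bill and uses the $9 vs $12 per-million output prices quoted elsewhere in this thread. The 2.5x multiplier is the figure above; the function names are made up for illustration.

```python
def relative_cost(token_multiplier, flash_out_price=9.0, pro_out_price=12.0):
    """Flash cost relative to Pro when Flash emits `token_multiplier` times the output tokens."""
    return token_multiplier * flash_out_price / pro_out_price

def break_even_multiplier(flash_out_price=9.0, pro_out_price=12.0):
    """Output-token multiplier at which Flash and Pro cost the same."""
    return pro_out_price / flash_out_price

print(relative_cost(2.5))                 # 1.875
print(round(break_even_multiplier(), 2))  # 1.33
```

Under these assumptions, the lower per-token price only pays off while Flash uses less than about 1.33x the output tokens of Pro.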

ErystelaThevale • today at 12:15 AM

Gemini has been too agreeable to be useful for actual debate. Curious if 3.5 changes that, or just the benchmarks

MASNeo • yesterday at 7:29 PM

Well, available for Gemini means these days that half the time they are “Receiving a lot of requests right now.” and so sorry they couldn’t complete the task. Luckily the model supports long time horizons because that’s what’s needed. /me likes Gemini a lot just wishing Google would add the compute!

x3cca • yesterday at 8:00 PM

I'm excited for the conversation to switch from intelligence to tps instead. I care much less about what hard thought experiments models can one shot and much more how responsive my plain text interface for doing things is.

mackross • yesterday at 7:36 PM

The antigravity teamwork-preview doesn't work for me -- upgraded to ultra, installed antigravity 2, ran teamwork-preview, keeps failing: "You have exhausted your capacity on this model. Your quota will reset after 0s."

noelsusman • yesterday at 6:46 PM

The Artificial Analysis benchmark results are pretty underwhelming. Roughly the same "intelligence" as MiMo-V2.5-Pro for over 3x the cost. We'll have to see how that translates to actual usage but it's not a great sign.

amelius • yesterday at 9:06 PM

Gemini, please block all ads in my search engine.

swe_dima • yesterday at 6:03 PM

Flash family but costs like a Pro. $9 vs $12 for output.

alexdns • yesterday at 5:59 PM

It's Gemini 3.5 Flash
