Hacker News

marcd35 · yesterday at 6:01 PM

I'm no expert, and I actually asked Google Gemini a similar question yesterday: "how much more energy is consumed by running every query through Gemini AI versus traditional search?" It turns out that the AI result is on par with, if not more efficient (power-wise) than, traditional search. I think it said it's the equivalent power of watching 5 seconds of TV per search.
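For what it's worth, the arithmetic behind that claim is easy to check. A minimal sketch in Python; the ~100 W TV draw is my own assumption, since Gemini didn't cite its numbers:

    # Sanity check on the "5 seconds of TV" claim. The 100 W draw for a
    # typical LED TV is an assumption, not a figure Gemini provided.
    TV_WATTS = 100
    SECONDS_OF_TV = 5

    energy_joules = TV_WATTS * SECONDS_OF_TV  # 500 J per search, if the claim holds
    print(f"~{energy_joules} J per query ({energy_joules / 3600:.2f} Wh)")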

I also asked Perplexity for a report of the most notable arXiv papers. This one was at the top of the list:

"The most consequential intellectual development on arXiv is Sara Hooker's "On the Slow Death of Scaling," which systematically dismantles the decade-long consensus that computational scale drives progress. Hooker demonstrates that smaller models—Llama-3 8B and Aya 23 8B—now routinely outperform models with orders of magnitude more parameters, such as Falcon 180B and BLOOM 176B. This inversion suggests that the future of AI development will be determined not by raw compute, but by algorithmic innovations: instruction finetuning, model distillation, chain-of-thought reasoning, preference training, and retrieval-augmented generation. The implications are profound—progress is no longer the exclusive domain of well-capitalized labs, and academia can meaningfully compete again."


Replies

roughly · yesterday at 7:28 PM

I’m… deeply suspicious of Gemini’s ability to make that assessment.

I do broadly agree that smaller, better-tuned models are likely to be the future, if only because the economics of the large models seem somewhat suspect right now, and because the ability to run models on cheaper hardware is likely to expand their usability and the range of use cases they can profitably address.

827a · yesterday at 7:44 PM

Conceptually, the training process is like building a massive, highly compressed index of all known results. You can't outright ignore the power used to build this index, but once you have it, traversing it could in theory be more efficient than the competing indexes that power Google search. It's a data structure perfectly tailored to semantic processing.

Though, once the LLM has to engage a hypothetical "google search" or "web search" tool to supplement its own internal knowledge, I think the efficiency obviously goes out the window. I suspect that Google is doing this every time you engage with Gemini in Search's AI Mode.
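Here's a minimal sketch of that amortization argument, with placeholder numbers: the training energy is entirely made up, the 6,706 J inference figure comes from the MIT piece cited below, and the ~0.3 Wh per traditional search is Google's oft-cited 2009 figure:

    # Hypothetical amortization of the one-time "index build" (training)
    # over lifetime queries. TRAIN_J is a made-up placeholder.
    TRAIN_J = 1e15          # assumed one-time training energy
    LLM_QUERY_J = 6_706     # per-response inference energy (Llama 3.1 405B)
    SEARCH_QUERY_J = 1_080  # ~0.3 Wh, Google's 2009 per-search figure

    def llm_joules_per_query(total_queries: int) -> float:
        """Training energy spread across all queries, plus per-query inference."""
        return TRAIN_J / total_queries + LLM_QUERY_J

    # The build cost washes out at scale; past that, only inference matters.
    for n in (10**6, 10**9, 10**12):
        print(f"{n:.0e} queries: {llm_joules_per_query(n):,.0f} J/query "
              f"vs {SEARCH_QUERY_J} J for a traditional search")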

ainch · today at 12:42 AM

It's a good paper by Hooker, but that specific comparison is shoddy. Llama and Aya were both trained by significantly more competent labs, on different datasets, than Falcon and BLOOM were. The takeaway there is "it doesn't matter if you have loads of parameters if you don't know what you're doing."

If we compare apples to apples, e.g. across Claude models, the larger Opus still happily outperforms its smaller counterparts.

lelandbatey · yesterday at 8:32 PM

Some external context on those approximate claims (see the conversion sketch after the source link):

- Run a 1500W USA microwave for 10 seconds: 15,000 joules

- Llama 3.1 405B generating one text response: on average 6,706 joules

- Stable Diffusion 3 Medium generating a 1024 x 1024 pixel image w/ 50 diffusion steps: about 4,402 joules

[1] MIT Technology Review, 2025-05-20: https://www.technologyreview.com/2025/05/20/1116327/ai-energ...
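To put those figures in the same units as the first bullet, here's the straight conversion to seconds of microwave run time (nothing assumed beyond the numbers above):

    # Express each figure as run time on the same 1500 W microwave.
    MICROWAVE_WATTS = 1500

    for label, joules in [
        ("Llama 3.1 405B response", 6_706),
        ("SD3 Medium 1024x1024 image", 4_402),
    ]:
        print(f"{label}: {joules} J ~= {joules / MICROWAVE_WATTS:.1f} s of microwave time")

So one LLM response is roughly 4.5 seconds of microwave, and one image is roughly 2.9 seconds.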
