Gemini Omni

292 points • by meetpateltech • yesterday at 5:46 PM • 123 comments

Comments

In my day job I program real-time rigid body behaviour, amongst other simulations. I think rigid body contact is hard to learn because it is inherently discontinuous, something you discover when trying to code a solver.

As such I always use this prompt as a test: "A video of a jenga brick tower falling over as a brick is removed. The physics of each brick must be realistic."

It gave me a video where bricks suddenly disappear or morph into others[1]. The linked video is after 2-3 iterations of me insisting on realistic physics. If you are just glancing at this, you would believe it is realistic.

That said, this is still very impressive and one more step towards... IDK what. But I am a bit reassured that at least my job won't be fully replaced with AI :)

[1] https://streamable.com/2em1r3
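
The discontinuity mentioned at the top of this comment can be illustrated with a minimal sketch. It is not the commenter's solver: it assumes a single point mass bouncing on a floor at y = 0, and the names `BodyState`, `step`, `simulate`, `largestVelocityJump` and the restitution value are all hypothetical. Under gravity alone the velocity changes by a small, predictable amount each step; at contact it flips sign within a single step.

```typescript
// Hypothetical 1D example: a point mass bouncing on a floor at y = 0.
interface BodyState {
  y: number;  // height above the floor, in metres
  vy: number; // vertical velocity, in m/s
}

const GRAVITY = -9.81;   // m/s^2
const RESTITUTION = 0.5; // assumed coefficient of restitution

function step(s: BodyState, dt: number): BodyState {
  // Smooth part: semi-implicit Euler integration under gravity.
  let vy = s.vy + GRAVITY * dt;
  let y = s.y + vy * dt;
  // Non-smooth part: on penetration, clamp to the floor and reflect
  // the velocity instantly. This is where the discontinuity appears.
  if (y < 0 && vy < 0) {
    y = 0;
    vy = -RESTITUTION * vy;
  }
  return { y, vy };
}

function simulate(initial: BodyState, dt: number, steps: number): BodyState[] {
  const out: BodyState[] = [initial];
  for (let i = 0; i < steps; i++) {
    out.push(step(out[out.length - 1], dt));
  }
  return out;
}

// Largest change in velocity between two consecutive steps.
function largestVelocityJump(traj: BodyState[]): number {
  let maxJump = 0;
  for (let i = 1; i < traj.length; i++) {
    maxJump = Math.max(maxJump, Math.abs(traj[i].vy - traj[i - 1].vy));
  }
  return maxJump;
}
```

Dropping the mass from 1 m with `simulate({ y: 1, vy: 0 }, 0.01, 100)` gives a per-step velocity change of about 0.1 m/s in free fall, while the step containing the impact changes it by several m/s. A model that only ever sees smooth motion has little to interpolate from at that instant.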

jackson_mile • today at 4:43 AM

To be honest, I think the performance of Gemini Omni Flash is still not as good as Seedance 2.0. You can try using both models on this platform. https://omnivideoai.co

torginus • yesterday at 9:42 PM

While at a cursory glance it looks as impressive as always, subtle spatial errors and geometry that changes as it goes out of sight and comes back again hint at the fact that Google has yet to solve the problem of deep spatial understanding.

Which, considering just how pretty and detailed this whole thing looks, imo points to a fundamental issue in how these things are trained: it's as if there's no structure to its knowledge and training. An artist learning to draw would first try to understand simple 2D composition, then perspective, then light and shadow, mastering each concept and gradually building up a hierarchical understanding, whereas this seems like it's trying to learn everything at once.

I would rather see an AI model that I could give a floorplan of a building and it would generate an accurate flythrough on any path, even if it looked like butt.

I'm not just talking out of my arse: I did work for a while in data science/engineering, and one of the big lessons people needed to be reminded of is to clean/downsample the data. A dataset consisting of a million samples could very well take 1000x as long to process as the same data downsampled to just a couple of thousand samples, and we could reach the same conclusions with a fraction of the time and effort.

I'm sure there's a similar logic in RL: if you dump a trillion samples into a datacenter that consumes as much power as a city, what the model learns is what it could've learned with a much more curated training set and a more directed approach.
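
The downsampling point earlier in this comment can be sketched with a toy case: estimating a mean from a uniform random subsample instead of the full dataset. Everything here is an assumption for illustration (synthetic uniform data, the helper names `lehmer`, `mean` and `downsample`, the sample sizes); it says nothing about how model training data should be curated. For simple random sampling, the error of the estimate depends mainly on the sample size rather than on the size of the full dataset.

```typescript
// Small deterministic Lehmer (Park-Miller) generator so the sketch is
// reproducible. Returns values strictly between 0 and 1.
function lehmer(seed: number): () => number {
  let state = seed % 2147483647;
  if (state <= 0) state += 2147483646;
  return () => {
    state = (state * 48271) % 2147483647;
    return state / 2147483647;
  };
}

function mean(xs: number[]): number {
  let sum = 0;
  for (const x of xs) sum += x;
  return sum / xs.length;
}

// Uniform random subsample without replacement (partial Fisher-Yates shuffle).
function downsample(xs: number[], k: number, rand: () => number): number[] {
  const copy = xs.slice();
  const n = copy.length;
  const take = Math.min(k, n);
  for (let i = 0; i < take; i++) {
    const j = i + Math.floor(rand() * (n - i));
    const tmp = copy[i];
    copy[i] = copy[j];
    copy[j] = tmp;
  }
  return copy.slice(0, take);
}

// Synthetic data: one million values spread uniformly over (0, 100).
const rand = lehmer(42);
const full: number[] = [];
for (let i = 0; i < 1000000; i++) full.push(rand() * 100);

const sample = downsample(full, 2000, rand);
const fullMean = mean(full);
const sampleMean = mean(sample);
console.log(fullMean.toFixed(2), sampleMean.toFixed(2));
```

The two printed means should land close together even though the second one touches 500 times fewer values. Whether the same holds for a given analysis depends on what is being estimated: rare events and heavy tails are exactly what a small uniform sample tends to miss.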

adenta • yesterday at 6:51 PM

At first use I'm not impressed. I've probably spent a couple grand on Seedance 2 to date, and I can't find anything Gemini Omni Flash does better than Seedance after running a handful of samples through the system. You can find some of the videos I've made in my HN bio link.

enragedcacti • yesterday at 6:59 PM

> Prompt: Make it look like the weird shape of my hand hole super zooms and magnifies the ground it's looking at in sharper quality.

There's got to be a reason this is phrased so insanely, right?

randomthoughts5 • today at 4:35 AM

What's the end goal of video generation? It feels unnecessary. Text generation leads to AI that can replace workers. Video generation is bad and only for video content generation, like movie and tv show production?

raincole • yesterday at 7:07 PM

At the bottom there is a "Try in Youtube Shorts" button.

Oh god...

baq • yesterday at 8:11 PM

We could be solving fusion power and instead we’re generating videos of birds in space or something. The market is a harsh mistress sometimes.

kenjackson • yesterday at 7:20 PM

I'm an AI optimist. But AI video is probably the one thing that does depress me. Seeing that we can make anything visually, there's nothing that impresses me visually. I watch a video that two years ago I would've thought was really cool, and now my first thought is, "Yawn, is this AI?".

Video, more than anything else, is the place where I really care if something is AI or not. If I could get a TikTok that had no AI usage -- I'd be in. Which is weird for me, because I'm typically the guy who is all-in on AI.

kermatt • today at 1:29 AM

> I can create more videos as soon as your limit resets. Check your usage in Settings.

I have not used Gemini in a month.

meetpateltech • yesterday at 6:01 PM

blog post: https://blog.google/innovation-and-ai/models-and-research/ge...

model card: https://deepmind.google/models/model-cards/gemini-omni-flash...

franze • yesterday at 6:35 PM

> I can create more videos as soon as your limit resets. Check your usage in Settings

I did not create any videos yet.

Google, building great AI that nobody can try out.

But thx for the press release.

throw03172019 • yesterday at 6:42 PM

Browser crashes while scrolling because of all the auto playing videos. Please use IntersectionObserver to pause the video when not in display.
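
A minimal sketch of that suggestion, assuming the page uses plain `<video>` elements: the helper names `syncPlayback` and `observeAutoplayVideos`, the two small interfaces, and the 25% visibility threshold are hypothetical, and nothing here reflects how the actual page is built. The playback logic is kept separate from the browser wiring so it can be exercised without a DOM.

```typescript
// Minimal structural types covering only what the sketch needs from
// HTMLVideoElement and IntersectionObserverEntry.
interface PlayableVideo {
  play(): Promise<void>;
  pause(): void;
}

interface VisibilityEntry {
  isIntersecting: boolean;
  target: PlayableVideo;
}

// Play videos that are on screen and pause the ones that are not.
function syncPlayback(entries: VisibilityEntry[]): void {
  for (const entry of entries) {
    if (entry.isIntersecting) {
      // play() returns a promise that can reject (for example under an
      // autoplay policy), so the rejection is swallowed here.
      entry.target.play().catch(() => {});
    } else {
      entry.target.pause();
    }
  }
}

// Browser wiring. Does nothing when IntersectionObserver or document
// is unavailable (for example when run outside a browser).
function observeAutoplayVideos(): void {
  const g = globalThis as any;
  if (!g.IntersectionObserver || !g.document) return;
  const observer = new g.IntersectionObserver(
    (entries: VisibilityEntry[]) => syncPlayback(entries),
    { threshold: 0.25 } // assumed: count a video as visible at 25% on screen
  );
  g.document.querySelectorAll("video").forEach((video: unknown) => {
    observer.observe(video);
  });
}
```

Calling `observeAutoplayVideos()` once after the page's videos exist would keep only the visible ones decoding; videos added later would need their own `observer.observe(...)` call.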

blt • yesterday at 11:52 PM

It's funny how they specifically use the phrase "output that follows real-world physics" to describe the marble rolling video. At the end of the zigzag track, the marble jumps up for no reason. In a couple of other places it speeds up with no apparent energy source. It's still an amazing result, but they could have picked a better example for this claim!

amelius • yesterday at 10:12 PM

What I'm hoping/waiting for is IMDB users creating alternative endings of movies.

It could make the comments section even more fun.

clapthewind • yesterday at 5:54 PM

I think Hollywood is in for a rough era. The disruption is happening at breakneck speed.

nl • today at 12:26 AM

Interestingly, the `o` in GPT-4o stood for Omni too (which I never realized until yesterday, when reading random third-party documentation).

az226 • today at 12:52 AM

https://media1.giphy.com/media/SxB0S9MgHo4ZoNrDRk/200w.gif

dwa3592 • yesterday at 8:23 PM

Even though I don't have words to express how impressive this capability looks, I am genuinely scared of the harmful use cases of this.

dsign • yesterday at 6:47 PM

So it's really good, and we have reason to never again believe anything that happens in a video. Unless there's a super-product somewhere to authenticate footage?

vldszn • today at 3:04 AM

When I click the link, the website crashes on my iPhone 13 iOS Chrome lol

andrewstuart • yesterday at 6:58 PM

Who is creative enough to drive this in any meaningful way?

Certainly not me - you have to be a great artist/designer to even imagine what to do with it.

uejfiweun • yesterday at 9:07 PM

Does anyone else feel like Google is just always a dollar short and a day late here? Maybe not a dollar short, but it's like they've consistently been focused on the wrong thing. First they missed chatbots, now they're missing coding agents while they double down on chatbots and video gen (which OpenAI has already basically abandoned). Maybe this strategy is actually genius and I'm too stupid to grasp it.

King-Aaron • today at 1:26 AM

The people that think this output looks good are the same people that "don't get" art.

From a technical perspective, it's very impressive, no doubt. But from an artistic perspective I think all of the examples on the site look bad.
