Learnings from 4 months of Image-Video VAE experiments

73 points • by schopra909 • last Tuesday at 6:59 PM • 12 comments

Comments

It’s been a while, but I’m pretty sure the original deepfake used a VAE as well. Super powerful idea and architecture.

As someone currently working on their own VAE, your reasoning for going with WAN 2.1 and your learnings about what you think you did wrong really resonated with me, specifically:

> Looking back, we should have just filtered out these samples from the dataset and moved on.

I hadn't even considered checking whether poor data quality was causing the failure to reconstruct. This is a good gotcha to look out for. Appreciate the deep dive here!
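For anyone wanting to run that kind of check, here is a minimal sketch of one way to flag samples a VAE reconstructs unusually badly. It is not the authors' method: the `flag_outliers` helper, the median-plus-MAD threshold, and the error values are all hypothetical, and the per-sample errors are assumed to have been computed already (for example, MSE between each input and its reconstruction).

```python
from statistics import median


def flag_outliers(errors, k=5.0):
    """Return indices of samples whose reconstruction error is unusually high.

    Hypothetical helper: the threshold is median + k * MAD (median absolute
    deviation), which is robust to the very outliers we are looking for.
    If MAD is 0, anything strictly above the median gets flagged.
    """
    med = median(errors)
    mad = median(abs(e - med) for e in errors)
    threshold = med + k * mad
    return [i for i, e in enumerate(errors) if e > threshold]


# Made-up per-sample reconstruction errors, for illustration only.
errors = [0.010, 0.012, 0.011, 0.009, 0.250, 0.013, 0.010]
suspect = flag_outliers(errors)
print(suspect)  # indices to inspect by hand before dropping anything
```

The flagged indices are candidates for manual inspection rather than automatic removal, since a high error can also point to a model weakness rather than a bad sample.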

schopra909 • last Tuesday at 7:00 PM

Hi HN, I’m one of the two authors of the post and the Linum v2 text-to-video model (https://news.ycombinator.com/item?id=46721488). We're releasing our Image-Video VAE (open weights) and a deep dive on how we built it. Happy to answer questions about the work!


greatgib • yesterday at 11:13 PM

Very nice, well-written article!

The kind that I like so much on HN. It tickles your mind but is still clear enough for an advanced beginner.

asaiacai • yesterday at 11:20 PM

It's cool to see the iterative improvements to your model laid out, but for everything that worked, I imagine there were at least a million other things you tried that didn't work out. What's your process for trying these different techniques/architectures? Do you just wait for one experiment to finish and visually inspect the results every time? That seems hard, since these take a while to train. How do you shorten the feedback loop in this space?


lastdong • yesterday at 10:13 PM

This seems like a great model for experimenting with fine-tuning on original art, given that it’s relatively small and has an open license. Is that a fair assessment?

Thanks for the great write-up and for making it available to us all.


DonThomasitos • yesterday at 10:47 PM

Nice summary! I missed the mention of EQ-VAE when it comes to generation quality. Tiny trick, huge impact! Have you tried it?


pwillia7 • yesterday at 11:41 PM

This is very cool, thanks for sharing.

