Imagine having virtually unlimited compute and programming resources, and silly little slop videos are the result.
Fabulous.
Do you find the video understanding work there also to be 'silly little slop', or did you only look at the gifs on the page and not read about the understanding work in a 3B model?
This is not ground-breaking by any means, but achieving this in a 3B model and sharing the approach + weights advances engineering and is certainly more of a contribution than 'silly little slop videos', imo.
Not that surprising if the reason you have virtually unlimited compute and programming resources is that you work at the leading short-form video app company. They could also have chosen not to open source it.