
nee1r last Monday at 5:10 PM

Hey guys! I’m Neel. I've been holed up in our South Park office for the past year working on model training. Excited to share our research!

This is a preview of a very different type of computer-use model: we train on the internet. Specifically, we have 11 million hours of computer video on our storage cluster (previously shared: https://news.ycombinator.com/item?id=45438496) and the model can run at 30 FPS. Since we match the fundamental form factor of computer use, we can get our model to do CAD, browse websites, and even drive a car using arrow keys. I’m super excited to see what our model can do as we scale further. It's a fun frontier to work on (not language models :) ).

The team and I will be online responding to the comments, so drop any questions.


Replies

ilaksh today at 12:25 AM

How do I access this? Any HF or API coming?

Any benchmark comparisons to Fara-7B, Sonnet 4.6, Qwen 3.5, etc.?

dangoodmanUT yesterday at 11:15 PM

11 million hours of data is a lot. Did you have to synthesize any of it, or was it purely collected?

arkmm today at 12:06 AM

Get ready for the acquisition offers.

AndrewKemendo yesterday at 11:14 PM

This looks like a really promising approach.

In particular, the Forward rollout module is very important. It aligns what is effectively your world model with what it expects from the world, and I think keeping those in sync gives this the power it needs to generate the state-action pairs for continuous semi-supervised training.