What sort of fine tuning data was needed to allow the model to self-drive? One hour of video of someone driving, or extra labeling?
relevant note: we finetuned by having the human also drive with arrow keys, which keeps the data in-distribution but is also slower to collect
i actually drove the car (with arrow keys) around south park for ~45 minutes as finetuning data, no extra labelling other than that. i think the car line graph is super cool because you actually see the videogame prior working