
ACCount37 · today at 1:18 PM

Em-dashes aside, I favor "one model that can do everything" in principle because scaling laws and distillation exist, and in practice because "one model that you can point at any problem" is a massive operational advantage.

If you can get 5 specialist models that can use the same robot body, you can also get 1 generalist model with more capacity and fold the specialists into it. If you have the in-house training pipelines that made those specialists, apply them to the generalist instead, the way we give general-purpose AIs coding-specific training. If you don't, take the specialists as-is and distill from them.
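Distilling specialists into one generalist amounts to training the student on the teachers' softened output distributions. A minimal sketch with toy linear policy heads in NumPy (every name, shape, and hyperparameter here is illustrative, not taken from any particular robotics stack):

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def avg_kl(p, q):
    """Mean KL(p || q) over a batch of distributions."""
    return float(np.mean(np.sum(p * (np.log(p + 1e-9) - np.log(q + 1e-9)), axis=-1)))

D, A = 8, 4                       # observation dim, discrete action count (toy)
teachers = [rng.normal(size=(D, A)) for _ in range(2)]  # frozen specialist heads
W = np.zeros((D, A))              # generalist (student) head, trained from scratch
T, lr = 2.0, 0.5                  # distillation temperature, learning rate
X = rng.normal(size=(256, D))     # a batch of observations

# Target: the averaged, softened distribution of the specialist ensemble.
target = np.mean([softmax(X @ Wt, T) for Wt in teachers], axis=0)
kl_before = avg_kl(target, softmax(X @ W, T))

for _ in range(500):
    pred = softmax(X @ W, T)
    # Gradient of cross-entropy against soft targets w.r.t. logits is
    # (pred - target) / T; the constant 1/T factor is folded into lr here.
    W -= lr * (X.T @ (pred - target)) / len(X)

kl_after = avg_kl(target, softmax(X @ W, T))
print(kl_after < kl_before)  # True: the student moved toward the teachers' ensemble
```

In a real system the student would of course be a larger network than any teacher, which is the point: one model with more capacity absorbs all the specialists' behavior.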

If you do it right, transfer learning might even give you a model that generalizes better and beats the specialists at their own game: your "special" tasks have partial subtask overlap, so the shared subtasks get stronger training, and each task contributes to the diversity of environments. Robotics AI is training-data starved as a rule.

Same kind of lesson we learned with LLM specialists - invest in a specialist model and watch the next-gen generalists, with better data and training, crush it.