Hacker News

volkercraig yesterday at 4:01 PM

I don't understand. Surely training an LSTM on sensor input is a more practical and reasonable approach than trying to get a text generator to speak commands to a drone.


Replies

encrux yesterday at 4:28 PM

Very much depends on what you want to do.

The fact that a language model can "reason" (in the LLM-slang meaning of the term) about 3D space is an interesting property.

If you give a model a text description of a scene and ask it to have a robot perform a peg-in-hole task, modern models can solve it fairly easily by composing movement primitives. I implemented this on a UR robot arm back in 2023.
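As a rough illustration of the text-to-primitives setup: a minimal sketch, assuming the model replies with a short program of primitive calls that is parsed and checked against a whitelist before anything reaches the arm. The primitive names (`move_to`, `grasp`, `release`), the `SimArm` class, the scene coordinates and the hard-coded model reply are all hypothetical; a real setup would send the prompt to an LLM and drive the arm through its actual controller interface.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class SimArm:
    """Toy stand-in for an arm controller (hypothetical, not a real UR driver API)."""
    pos: tuple = (0.0, 0.0, 0.3)  # end-effector position in metres
    gripper_closed: bool = False
    log: list = field(default_factory=list)

    def move_to(self, x, y, z):
        self.pos = (float(x), float(y), float(z))
        self.log.append(("move_to", self.pos))

    def grasp(self):
        self.gripper_closed = True
        self.log.append(("grasp",))

    def release(self):
        self.gripper_closed = False
        self.log.append(("release",))


# Only these primitives may be called by model-generated programs.
ALLOWED_PRIMITIVES = {"move_to", "grasp", "release"}


def run_primitive_program(source: str, arm: SimArm) -> None:
    """Run one primitive call per statement, parsing with ast instead of exec()."""
    for stmt in ast.parse(source).body:
        call = stmt.value if isinstance(stmt, ast.Expr) else None
        if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
            raise ValueError(f"unsupported statement: {ast.dump(stmt)}")
        if call.func.id not in ALLOWED_PRIMITIVES or call.keywords:
            raise ValueError(f"disallowed call: {call.func.id}")
        # literal_eval only accepts constants such as 0.4 or -0.05
        args = [ast.literal_eval(arg) for arg in call.args]
        getattr(arm, call.func.id)(*args)


# Hypothetical scene description and prompt.
SCENE = "Peg at (0.40, 0.10, 0.02). Hole at (0.55, -0.05, 0.00). Table at z=0."
prompt = (
    f"Scene: {SCENE}\n"
    "Using only move_to(x, y, z), grasp() and release(), "
    "write a program that inserts the peg into the hole."
)

# A real setup would send `prompt` to an LLM; here the reply is hard-coded.
MODEL_OUTPUT = """
move_to(0.40, 0.10, 0.10)
move_to(0.40, 0.10, 0.02)
grasp()
move_to(0.40, 0.10, 0.10)
move_to(0.55, -0.05, 0.10)
move_to(0.55, -0.05, 0.00)
release()
"""

arm = SimArm()
run_primitive_program(MODEL_OUTPUT, arm)
print(arm.pos, arm.gripper_closed)  # (0.55, -0.05, 0.0) False
```

Parsing with `ast` rather than calling `exec()` keeps the model confined to the whitelisted primitives with literal arguments, so a malformed or unexpected reply fails loudly instead of running arbitrary code.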

The next logical step is to have the model output tokens in action space instead of text (code that calls movement primitives). This is what models like pi0 are doing.
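To make "tokens in action space" concrete: one simple way to turn continuous robot actions into tokens is uniform binning, where each action dimension is clipped to a fixed range and mapped to one of `n_bins` discrete ids. This is a minimal sketch of that generic technique, not necessarily the scheme pi0 uses; the function names, the [-1, 1] range and the 256-bin default are assumptions for illustration.

```python
from typing import List, Sequence


def tokenize_action(action: Sequence[float], n_bins: int = 256,
                    low: float = -1.0, high: float = 1.0) -> List[int]:
    """Map each continuous action dimension to a token id in [0, n_bins - 1]."""
    width = (high - low) / n_bins
    tokens = []
    for a in action:
        a = min(max(a, low), high)            # clip to the action range
        idx = int((a - low) / width)
        tokens.append(min(idx, n_bins - 1))   # a == high lands in the last bin
    return tokens


def detokenize_action(tokens: Sequence[int], n_bins: int = 256,
                      low: float = -1.0, high: float = 1.0) -> List[float]:
    """Map token ids back to the centre of their bins."""
    width = (high - low) / n_bins
    return [low + (t + 0.5) * width for t in tokens]


action = [0.0, -1.0, 1.0, 0.5]  # e.g. normalised joint or end-effector deltas
tokens = tokenize_action(action)
print(tokens)                     # [128, 0, 255, 192]
print(detokenize_action(tokens))
```

Finer bins shrink the quantization error (at most half a bin width per dimension) at the cost of a larger action vocabulary the model has to predict over.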
