derac • today at 2:36 PM • 1 reply

Look at the table of supported modalities. It can take image/video/text/actions as input and produce image/video/text/actions as output.

causal • today at 3:22 PM

That just raises more questions. What kind of "observation or action" does an image input generate? What is an action output if it's not text?
