The article says this misses important details, e.g. data that might be in the image.

efavdb • today at 1:16 AM

breadislove • today at 1:26 AM

very bad take. with most modern multimodal models you get way better performance than going to text first
