0xbadcafebee • today at 1:44 PM • 2 replies

Configure a subagent in your coding harness to spin up a new sub-session with any vision model for those tasks, and feed the result back to the main model. No need for "one model that does everything".
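
A minimal sketch of that delegation pattern, assuming both models can be treated as plain functions. Every name here (`VisionSubagent`, `main_model_turn`, `fake_vision_model`) is hypothetical and not taken from any particular harness; the fake model only stands in for a real vision model call.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: (image bytes, instruction) -> text description.
VisionModel = Callable[[bytes, str], str]


@dataclass
class VisionSubagent:
    """Runs each image task in isolation and returns only text."""

    vision_model: VisionModel

    def run(self, image: bytes, instruction: str) -> str:
        # One call per task stands in for "a new sub-session":
        # nothing from the main model's context is shared here.
        return self.vision_model(image, instruction)


def main_model_turn(prompt: str, image: bytes, subagent: VisionSubagent) -> str:
    # The text-only main model never sees pixels, only the description
    # that the subagent feeds back.
    description = subagent.run(image, "Describe this image for a coding task.")
    return f"{prompt}\n\n[image description]\n{description}"


def fake_vision_model(image: bytes, instruction: str) -> str:
    # Stand-in for a real vision model; returns a canned description.
    return f"({len(image)} bytes) screenshot of a login form"


subagent = VisionSubagent(fake_vision_model)
augmented_prompt = main_model_turn("Fix the layout bug.", b"\x89PNG", subagent)
```

The main model only ever receives `augmented_prompt`, so the vision model can be swapped without touching the rest of the harness.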

Replies

ricardobeat • today at 8:00 PM

That doesn’t work well in a lot of scenarios. The text LLM doesn’t know what to look for in an image before it sees a description, so you might need multiple rounds of back and forth.
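
A rough sketch of what those rounds could look like, assuming the text model can be asked for a follow-up question given the notes so far. The function names and the `max_rounds` cap are made up for illustration, and both models are faked.

```python
from typing import Callable, List, Optional


def ask_until_satisfied(
    image: bytes,
    vision_model: Callable[[bytes, str], str],
    next_question: Callable[[List[str]], Optional[str]],
    max_rounds: int = 3,
) -> List[str]:
    """Alternate between text-model questions and vision-model answers."""
    # Round 0: a generic description, since the text model does not yet
    # know what to look for.
    notes = [vision_model(image, "Describe this image.")]
    for _ in range(max_rounds):
        question = next_question(notes)  # text model decides what is missing
        if question is None:  # nothing left to ask
            break
        notes.append(vision_model(image, question))
    return notes


def fake_vision_model(image: bytes, question: str) -> str:
    # Stand-in for a real vision model call.
    return f"answer to: {question}"


def fake_next_question(notes: List[str]) -> Optional[str]:
    # Stand-in for the text model: asks one follow-up, then stops.
    return "What color is the submit button?" if len(notes) < 2 else None


notes = ask_until_satisfied(b"img", fake_vision_model, fake_next_question)
```

Each extra round is another vision model call, which is the cost being pointed at here.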


WASDx • today at 4:19 PM

Are you suggesting it should summarize the image in text, generate it in HTML, or something else?
