"GLM-5.2 hit a problem here, because it can't read images. It isn't multimodal. So in...

js4ever • today at 10:58 AM • 1 reply

"GLM-5.2 hit a problem here, because it can't read images. It isn't multimodal. So instead of looking at a screenshot, it fell back on a hacky workaround: it wrote scripts to read the raw pixel data and check whether the colors came out roughly as expected."
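For anyone wondering what that kind of workaround looks like, here is a minimal sketch, not the model's actual script. It assumes the screenshot has already been decoded into rows of (R, G, B) tuples; the names `mean_color`, `roughly_matches`, and the tolerance of 16 are all made up for illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # rows of (R, G, B) tuples; hypothetical in-memory format


def mean_color(image: Image, left: int, top: int, right: int, bottom: int) -> Tuple[float, ...]:
    """Average each RGB channel over the half-open box [left, right) x [top, bottom)."""
    pixels = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
    if not pixels:
        raise ValueError("empty region")
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def roughly_matches(actual, expected, tolerance: float = 16.0) -> bool:
    """True when every channel is within `tolerance` of the expected value."""
    return all(abs(a - e) <= tolerance for a, e in zip(actual, expected))


# Synthetic 4x4 "screenshot": left half near-red, right half near-blue.
screenshot: Image = [[(250, 5, 5)] * 2 + [(5, 5, 250)] * 2 for _ in range(4)]

# Check whether each half came out roughly red.
left_ok = roughly_matches(mean_color(screenshot, 0, 0, 2, 4), (255, 0, 0))
right_ok = roughly_matches(mean_color(screenshot, 2, 0, 4, 4), (255, 0, 0))
print(left_ok, right_ok)  # True False
```

A check like this only tells you that a region has about the right color. It says nothing about layout, text, or whether the thing actually looks right.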

A better way would be to use https://github.com/openbmb/MiniCPM-V

Replies

twobitshifter • today at 11:10 AM

Right, just give the text LLM access to a vision-specific agent and that problem is solved. Or, if you really want, let it call Opus with an image; it seems like you'd still save money.
