janalsncm • yesterday at 8:40 PM • 1 reply

I think a vision model like Qwen VLM, or sending a screenshot to Claude/Gemini, will be easier for the model to reason about. Pictures encode spatial info much more naturally than JSON.

Replies

aed • yesterday at 8:46 PM

There's an endpoint for that!

https://api.hallucinatingsplines.com/reference#tag/cities/GE...

You can also pull the map tiles as an array: https://api.hallucinatingsplines.com/reference#tag/cities/GE...

Would be interesting to have two agents with the same instructions do a "face off", where each only has access to one type of map.
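A minimal sketch of how the inputs for such a face-off could be prepared from a single tile array, assuming the tiles arrive as a rectangular list of lists of small integer codes. The tile codes, symbols and colours below are hypothetical, not taken from the API. The same array is rendered once as a text grid (for the text-only agent) and once as a binary PPM image (for the vision agent), so both agents see identical map content.

```python
# Sketch: derive a "text view" and an "image view" from one tile array.
# ASSUMPTIONS (hypothetical, not the Hallucinating Splines API):
#   - tiles is a rectangular list of lists of small integer tile codes
#   - the code -> symbol and code -> colour tables below are made up
from typing import Dict, List, Tuple

Tiles = List[List[int]]

# Hypothetical tile codes: 0 = land, 1 = water, 2 = road, 3 = building
SYMBOLS: Dict[int, str] = {0: ".", 1: "~", 2: "#", 3: "B"}
COLOURS: Dict[int, Tuple[int, int, int]] = {
    0: (120, 180, 80),  # land: green
    1: (40, 90, 200),   # water: blue
    2: (90, 90, 90),    # road: grey
    3: (200, 60, 60),   # building: red
}


def to_text_grid(tiles: Tiles) -> str:
    """One line of symbols per map row; unknown codes become '?'."""
    return "\n".join("".join(SYMBOLS.get(t, "?") for t in row) for row in tiles)


def to_ppm(tiles: Tiles, scale: int = 4) -> bytes:
    """Binary PPM (P6) image where each tile is a scale x scale block of pixels."""
    height, width = len(tiles), len(tiles[0])
    header = f"P6 {width * scale} {height * scale} 255\n".encode("ascii")
    pixels = bytearray()
    for row in tiles:
        line = bytearray()
        for t in row:
            # Repeat the tile's RGB triple horizontally; unknown codes are black.
            line.extend(bytes(COLOURS.get(t, (0, 0, 0))) * scale)
        # Repeat the finished pixel row vertically.
        pixels.extend(bytes(line) * scale)
    return header + bytes(pixels)


if __name__ == "__main__":
    demo = [[0, 0, 1], [2, 2, 1], [3, 0, 1]]
    print(to_text_grid(demo))
    print(len(to_ppm(demo)), "bytes of PPM image data")
```

PPM is used here only because it needs no third-party imaging library; in practice the image would likely be converted to PNG before being sent to a vision model.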
