Hacker News

janalsncm, yesterday at 8:40 PM

I think using a vision model like Qwen VLM, or sending a screenshot to Claude/Gemini, would make the map easier for the model to reason about. Pictures encode spatial info much more naturally than JSON.


Replies

aed, yesterday at 8:46 PM

There's an endpoint for that!

https://api.hallucinatingsplines.com/reference#tag/cities/GE...

You can also pull the map tiles as an array: https://api.hallucinatingsplines.com/reference#tag/cities/GE...

Would be interesting to have two agents with the same instructions do a "face off", where each only has access to one type of map.
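
For anyone wondering what the array side of that comparison might involve, here is a rough sketch. It assumes a flat, row-major list of tile IDs with a known width; the grid size, tile values, and helper names are all made up for illustration, and the real endpoint's response shape is not shown in this thread.

```python
# Hypothetical data: a 4x3 map flattened into a row-major list of tile IDs.
# The tile codes and dimensions are invented, not taken from the real API.
WIDTH, HEIGHT = 4, 3
tiles = [
    0, 0, 1, 1,
    0, 2, 2, 1,
    0, 0, 2, 0,
]

# Offsets for the four directly adjacent cells (y grows downward).
DIRECTIONS = {"north": (0, -1), "south": (0, 1), "west": (-1, 0), "east": (1, 0)}


def tile_at(tiles, width, x, y):
    """Return the tile ID at column x, row y of a row-major flat list."""
    return tiles[y * width + x]


def neighbors(tiles, width, height, x, y):
    """Return the tile IDs of the in-bounds 4-connected neighbors of (x, y)."""
    result = {}
    for name, (dx, dy) in DIRECTIONS.items():
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height:
            result[name] = tile_at(tiles, width, nx, ny)
    return result


def render_rows(tiles, width):
    """Lay the flat list back out as one text line per map row."""
    return [
        " ".join(str(t) for t in tiles[i:i + width])
        for i in range(0, len(tiles), width)
    ]


if __name__ == "__main__":
    print(neighbors(tiles, WIDTH, HEIGHT, 1, 1))
    print("\n".join(render_rows(tiles, WIDTH)))
```

With a flat array, "what is next to this tile" takes index arithmetic and bounds checks, which is the kind of work an image makes implicit. Re-laying the list out as rows before handing it to a text-only agent might narrow that gap.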