logoalt Hacker News

fortytoday at 6:46 AM2 repliesview on HN

Moving to something else that JSON for this kind of thing is reasonable given the issues with parsing JSON which can cause 2 implementation to interpret it in 2 different ways.

https://seriot.ch/security/parsing_json.html


Replies

throwaway2037today at 11:24 AM

This is a great blog and an incredible piece of research. Re-reading again today, makes me think of Emil Stenström's recently effort to use an LLM to write an HTML5 parser in pure Python [1] using the official HTML5 spec and their test cases. [2] Later, Simon Willison used an LLM to convert the pure Python source to JavaScript. [3] It seems reasonable to ask an LLM to write a "perfect" JSON parser given the RFC spec and massive test pack from seriot.ch. Regarding the "minefield" of JSON parsing, I used to lean on Google's Gson (Java) a lot in my early days. I thought Jackson FasterXML was "too complex". Later, I realised the mind-boggling number of configuration options was weirdly more sustainable (but more complex!), because I could carefully control each JSON parser/generator edge case.

[1] https://github.com/EmilStenstrom/justhtml

[2] https://simonwillison.net/2025/Dec/14/justhtml/

[3] https://simonwillison.net/2025/Dec/15/porting-justhtml/

camgunztoday at 7:44 AM

CBOR has other ways it's unsuitable; the spec has a whole section about it: https://datatracker.ietf.org/doc/html/rfc9052#name-cbor-enco...

show 1 reply