Hacker News

t-3 · today at 11:17 AM

Yeah, a parser has no need to understand what a string or glyph is, let alone ASCII or UTF-8. The point is to take a stream of arbitrary data and process it into something that can be reasoned about. Unless you know your input stream is regular in some way, processing it at the finest level of granularity (usually bytes) is probably the only thing to do.
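Here is a minimal sketch of that idea, assuming a comma-separated input. The field splitter below never decodes anything. It compares raw byte values against one ASCII separator. This is safe for UTF-8 input because every byte of a multi-byte UTF-8 sequence is >= 0x80, so an ASCII separator byte can never occur inside an encoded non-ASCII character. The function name `split_fields` is hypothetical and exists only for illustration.

```python
from typing import List


def split_fields(data: bytes, sep: int = ord(",")) -> List[bytes]:
    """Split a raw byte stream on a single-byte ASCII separator.

    Works on undecoded bytes: in UTF-8, every byte of a multi-byte
    sequence is >= 0x80, so it can never equal an ASCII separator.
    """
    fields = []
    start = 0
    for i, b in enumerate(data):  # iterating over bytes yields ints
        if b == sep:
            fields.append(data[start:i])
            start = i + 1
    fields.append(data[start:])  # trailing field (possibly empty)
    return fields


raw = "naïve,café,日本".encode("utf-8")
fields = split_fields(raw)
print([f.decode("utf-8") for f in fields])  # → ['naïve', 'café', '日本']
```

This does the same thing as `raw.split(b",")`. The explicit loop just makes the byte-level processing visible. The guarantee depends on the encoding. In UTF-16, for example, the byte 0x2C can appear as half of a code unit for a non-comma character, so this kind of splitting would break.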


Replies

paulddraper · today at 5:08 PM

Well, it depends on whether you're parsing binary (a byte stream) or text (a character stream).
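A rough sketch of the difference follows. A binary format is specified in bytes, so you parse the buffer directly. A text format is specified in characters, so you decode first and then parse. The record layout here (a 4-byte big-endian length followed by that many payload bytes) and both function names are hypothetical examples, not any particular format.

```python
import struct
from typing import List, Tuple


def parse_record(buf: bytes) -> Tuple[bytes, bytes]:
    """Binary: hypothetical record = 4-byte big-endian length + payload.

    Returns (payload, remaining_bytes). Offsets are byte offsets.
    """
    (length,) = struct.unpack_from(">I", buf, 0)
    payload = buf[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated record")
    return payload, buf[4 + length:]


def parse_lines(buf: bytes, encoding: str = "utf-8") -> List[str]:
    """Text: decode to characters first, then split into lines.

    Note that str.splitlines() also treats Unicode separators such
    as U+2028 as line breaks, which is a character-level rule.
    """
    return buf.decode(encoding).splitlines()


record = struct.pack(">I", 3) + b"abc" + b"rest"
print(parse_record(record))                        # → (b'abc', b'rest')
print(parse_lines("héllo\nwörld".encode("utf-8")))  # → ['héllo', 'wörld']
```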

In practice, lots of text formats (JSON, XML) embed or hint at their character encoding within the data itself.
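XML is the clearest case. A document can begin with a byte order mark or an `<?xml ... encoding="..."?>` declaration, and if it has neither it must be UTF-8. JSON has no in-band declaration, and RFC 8259 requires UTF-8 for JSON exchanged between systems that aren't part of a closed ecosystem. Below is a simplified sketch in the spirit of XML 1.0's encoding autodetection (Appendix F). It is not a complete implementation: it ignores UTF-32 and EBCDIC, and `sniff_xml_encoding` is a hypothetical name.

```python
import codecs
import re

# Matches an ASCII-compatible XML declaration with an encoding pseudo-attribute.
# EncName grammar from the XML spec: [A-Za-z] ([A-Za-z0-9._] | '-')*
_DECL = re.compile(
    rb'^<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess an XML document's encoding from its first bytes (simplified)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Python's "utf-16" codec reads the BOM to pick the byte order.
    # (A UTF-32 LE BOM also starts with FF FE; not handled here.)
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"
    m = _DECL.match(data)
    if m:
        return m.group(1).decode("ascii").lower()
    # No BOM and no declaration: the XML spec requires UTF-8.
    return "utf-8"


doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><a>caf\xe9</a>'
enc = sniff_xml_encoding(doc)
print(enc)              # → iso-8859-1
print(doc.decode(enc))  # → <?xml version="1.0" encoding="ISO-8859-1"?><a>café</a>
```

Once the encoding is known, the parser can continue at the character level. Until then it has to work with bytes, which brings us back to the parent comment's point.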