That's assuming the text is not corrupted or maliciously modified. There were (are) _numerous_ vulnerabilities due to parsing/escaping of invalid UTF-8 sequences.
Quick googling (not all of them are on-topic tho):
https://www.rapid7.com/blog/post/2025/02/13/cve-2025-1094-po...
I was just wondering a similar thing: If 10 implies start of character, doesn't that require 10 to never occur inside the other bits of a character?