It's not uncommon when you want variable length encodings to write the number of extension bytes used in unary encoding
https://en.wikipedia.org/wiki/Unary_numeral_system
and also use whatever bits are left over encoding the length (which could be in 8 bit blocks so you write 1111/1111 10xx/xxxx to code 8 extension bytes) to encode the number. This is covered in this CS classic
https://archive.org/details/managinggigabyte0000witt
together with other methods that let you compress a text + a full text index for the text into less room than text and not even have to use a stopword list. As you say, UTF-8 does something similar in spirit but ASCII compatible and capable of fast synchronization if data is corrupted or truncated.