logoalt Hacker News

jeroenhdtoday at 2:22 PM1 replyview on HN

Go's docs put it like this: Path names are UTF-8-encoded, unrooted, slash-separated sequences of path elements, like “x/y/z”. If you operate on a path that's a non-UTF-8 string, then Go will do... something to make the string work with UTF-8 when passed back to standard file methods, but it likely won't end up operating on the same file.

Rust has OsStr to represent strings like paths, with a lossy/fallible conversion step instead.

Go's approach is fine for 99% of cases, and you're pretty screwed if your application falls for the 1% issue. Go has a lot of those decisions, often to simplify the standard library for most use cases most people usually run into (like their awful, lossy, incomplete conversion between Unix and Windows when it comes to permissions/read-only flags/etc.).


Replies

kbolinotoday at 3:33 PM

> Path names are UTF-8-encoded, unrooted, slash-separated sequences of path elements, like “x/y/z”

This is only for the "io/fs" package and its generic filesystem abstractions. The "os" package, which always operates on the real filesystem, doesn't actually specify how paths are encoded, nor does its associated helper package "path/filepath".

In practice, non-UTF-8 already wasn't an issue on Unix-like systems, where file paths are natively just byte sequences. You do need to be aware of this possibility to avoid mangling the paths yourself, though. The real problem was Windows, where paths are actually WTF-16, i.e. UTF-16 with unpaired surrogates. Go has addressed this issue by accepting WTF-8 paths since Go 1.21: https://github.com/golang/go/issues/32334#issuecomment-15500...