note for all on character encodings

25 Oct 2004


I've just got a minor note that everyone who might deal with character 
encodings (probably most of us) needs to be aware of.
We're basically dealing with three different string encodings (not just 
two).
1) The system/GTK+ encoding, which is UTF-8
2) The locale encoding
3) The filesystem encoding
Now, though the latter two are often set to the same encoding, they do 
not have to be.
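
A quick way to see whether the last two actually differ on a given machine is to ask the runtime. This is only a sketch, in Python purely for illustration; the helper name describe_encodings is made up for the example, and the "internal" entry simply restates the UTF-8 convention from item 1.

```python
import locale
import sys


def describe_encodings():
    """Report the three encodings a program has to keep apart."""
    return {
        # 1) What the toolkit and our own strings use (fixed by convention).
        "internal": "utf-8",
        # 2) What the user's locale settings ask for.
        "locale": locale.getpreferredencoding(False),
        # 3) What the runtime uses to encode and decode file names.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for kind, name in describe_encodings().items():
        print(f"{kind:>10}: {name}")
```

On many systems the last two lines print the same value, which is exactly why code that mixes them up can go unnoticed for a long time.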
I'm going through cleaning up some file operations, and it seems that 
there could be a few subtle bugs hiding here and there, depending on how 
each piece of code understood the encodings and the differences between 
them.
For the most part, we want the strings we deal with to be UTF-8 as much 
as possible. When strings come in or go out through calls that might 
generate or need something other than UTF-8, we should translate at that 
point. So we end up needing UTF-8 for everything except some filename 
I/O.
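
The "translate at the boundary" rule can be sketched as a pair of small helpers. This is an illustration only, written in Python rather than taken from any existing code; the function names and the ISO-8859-1 filesystem encoding in the usage lines are assumptions chosen to make the difference visible.

```python
def filename_from_utf8(utf8_name: bytes, fs_encoding: str) -> bytes:
    """Convert an internal UTF-8 name to the bytes the filesystem expects."""
    return utf8_name.decode("utf-8").encode(fs_encoding)


def filename_to_utf8(raw_name: bytes, fs_encoding: str) -> bytes:
    """Convert a name read from the filesystem back to internal UTF-8."""
    return raw_name.decode(fs_encoding).encode("utf-8")


if __name__ == "__main__":
    # Internal string, UTF-8 as everywhere else in the program.
    internal = "café.svg".encode("utf-8")

    # Going out: convert only at the point of the file call.
    on_disk = filename_from_utf8(internal, "iso-8859-1")
    print(on_disk)  # b'caf\xe9.svg'

    # Coming in: convert back immediately after reading the name.
    print(filename_to_utf8(on_disk, "iso-8859-1") == internal)  # True
```

Everything between those two calls stays UTF-8. For C code using GLib, g_filename_from_utf8() and g_filename_to_utf8() cover the same two directions.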

Jon A. Cruz
