I was tracking down bug #340451 and it turns out to be more pervasive than I thought:
https://bugs.launchpad.net/inkscape/+bug/340451
The problem is that code is casting 16-bit Unicode values to bytes, which throws away everything but the low 8 bits.
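To make the failure mode concrete, here is a minimal sketch in Python (not the actual Inkscape code; the helper name truncate_to_byte is made up for illustration). It mimics the narrowing cast by masking to the low 8 bits and compares that with a real encoding step:

```python
def truncate_to_byte(code_unit: int) -> int:
    """Mimic casting a 16-bit value to a byte: keep only the low 8 bits."""
    return code_unit & 0xFF


text = "€"                           # U+20AC EURO SIGN
code_unit = ord(text)                # 0x20AC
lossy = truncate_to_byte(code_unit)  # high byte 0x20 is discarded
proper = text.encode("utf-8")        # explicit character-to-bytes conversion

print(hex(lossy))    # 0xac
print(proper.hex())  # e282ac
```

Anything in the ASCII range comes through the cast unchanged, which might be part of why this has not shown up more often.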
I'm going to be working on fixing this, but it is such a fundamental issue that I'm surprised we're not seeing more bug reports from it. I'll also need to test a few more things to make sure I don't break more than I fix.
In the Java world they address this by having a very explicit difference between characters and bytes. Bytes are 'byte' and are handled by "OutputStream"s, whereas characters are 'char' and are handled by "Writer"s.
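The same split can be sketched in Python, used here only as an analogy: io.BytesIO plays roughly the role of an OutputStream and io.TextIOWrapper roughly that of a Writer. The point is that the encoding is named once, at the boundary between the two layers, and nothing narrows a character by hand:

```python
import io

# Byte sink: only accepts bytes (roughly an OutputStream).
raw = io.BytesIO()

# Character sink layered on top (roughly a Writer); the encoding is chosen here.
writer = io.TextIOWrapper(raw, encoding="utf-8", newline="")

writer.write("€")  # write characters, never raw casts
writer.flush()     # push the encoded bytes down to the byte sink

encoded = raw.getvalue()
print(encoded.hex())  # e282ac
```

Writing a string straight to the byte sink raises a TypeError, so mixing the two layers fails loudly instead of silently dropping bits.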
So I'll just get some tests fixed and then clean up with proper transformations.
If anyone can look for things that might be a problem (related bugs, things I should test, etc.), that would help.
Thanks.