Problem with our stream writing
I was tracking down bug #340451 and it turns out to be more pervasive than I thought:
https://bugs.launchpad.net/inkscape/+bug/340451
The problem is that things are casting 16-bit Unicode values to bytes by chopping all but the last 8 bits.
I'm going to be working on fixing this, but it is such a fundamental issue that I'm surprised we're not seeing more issues. Also I'll need to test a few more things and make sure I don't break more than I fix.
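To make the truncation concrete, here is a minimal C++ sketch. It is not the actual Inkscape code; the names `truncate_to_byte` and `encode_utf8` are made up for illustration. It contrasts the lossy cast with a proper UTF-8 encoding of the same value.

```cpp
#include <string>

// Lossy conversion (hypothetical name): keeps only the low 8 bits,
// which is the kind of cast described above.
unsigned char truncate_to_byte(char32_t cp) {
    return static_cast<unsigned char>(cp & 0xFF);
}

// Encodes one Unicode code point as UTF-8 (1 to 4 bytes).
// Surrogates and values above U+10FFFF are replaced with U+FFFD.
std::string encode_utf8(char32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        cp = 0xFFFD;
    }
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+0144 (ń), `truncate_to_byte` yields 0x44, which is the letter "D", while `encode_utf8` yields the two bytes C5 84. Plain ASCII survives the truncation unchanged, which may be part of why the problem has gone unnoticed.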
In the Java world they address this by having a very explicit difference between characters and bytes. Bytes are 'byte' and are handled by "OutputStream"s, whereas characters are 'char' and are handled by "Writer"s.
So I'll just get some tests fixed and then clean up with proper transformations.
If anyone can take a look for things that might be a problem, related bugs, things I should test, etc. that would help.
Thanks.
Jon A. Cruz wrote:
> The problem is that things are casting 16-bit Unicode values to bytes by chopping all but the last 8 bits.
> I'm going to be working on fixing this, but it is such a fundamental issue that I'm surprised we're not seeing more issues. Also I'll need to test a few more things and make sure I don't break more than I fix.
That's funny, I remember that the person who wrote the streams complained that GIO is broken because it doesn't handle UTF-8 natively, only byte streams.
The real in-depth solution would be to migrate to GIO streams everywhere, and code custom streams for things that we need. However, some of our devs don't have Glib 2.16 yet. Another issue is that the streams would have to be in plain GObject or use our custom bindings, because the existing C++ bindings don't expose the virtual functions (giomm devs thought that nobody would want to use them - actual answer from http://bugzilla.gnome.org/show_bug.cgi?id=572471).
A more short-term solution is of course fixing the existing streams.
Regards, Krzysztof Kosiński.
On Sat, 2009-03-14 at 12:48 -0700, Krzysztof Kosiński wrote:
> That's funny, I remember that the person who wrote the streams complained that GIO is broken because it doesn't handle UTF-8 natively, only byte streams.
> The real in-depth solution would be to migrate to GIO streams everywhere, and code custom streams for things that we need. However, some of our devs don't have Glib 2.16 yet. Another issue is that the streams would have to be in plain GObject or use our custom bindings, because the existing C++ bindings don't expose the virtual functions (giomm devs thought that nobody would want to use them - actual answer from http://bugzilla.gnome.org/show_bug.cgi?id=572471).
> A more short-term solution is of course fixing the existing streams.
No, actually moving to GIO does nothing to address this specific problem. It does solve other issues, but nothing on character vs. byte issues.
Traditionally C and C++ programmers are very poor at handling character vs. byte issues, since among other things the "byte" type is named "char". However, Java programmers have to deal with this, as Java strings are made up of "char" values, which are 16-bit Unicode (UTF-16) code units, and any non-trivial file or network handling has to address the conversions properly.
The existing stream implementation there does have "OutputStream" and "Writer" type classes. We really don't want to throw the baby out with the bathwater and start over, especially since we are close to a good implementation. All we really need to do is properly handle stream vs. writer and all will be fixed. Even if we switched to something GIO based we would have to recreate all that and then do the same work on top of the new code.
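The stream vs. writer split can be sketched roughly as follows. This is only an illustration under assumptions: `ByteSink`, `VectorSink` and `Utf8Writer` are hypothetical names, not the classes in the Inkscape tree, and UTF-8 is assumed as the output encoding. The point is that the byte side never sees characters, and the writer is the single place where code points are turned into bytes.

```cpp
#include <string>
#include <vector>

// Hypothetical byte-level interface: knows nothing about characters.
class ByteSink {
public:
    virtual ~ByteSink() = default;
    virtual void put(unsigned char byte) = 0;
};

// Collects bytes in memory; convenient for tests.
class VectorSink : public ByteSink {
public:
    void put(unsigned char byte) override { bytes.push_back(byte); }
    std::vector<unsigned char> bytes;
};

// Hypothetical character-level writer: accepts code points and is the
// only place where encoding (assumed here to be UTF-8) happens.
class Utf8Writer {
public:
    explicit Utf8Writer(ByteSink &sink) : sink_(sink) {}

    void put(char32_t cp) {
        // Replace invalid values with U+FFFD instead of truncating them.
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            emit(cp);
        } else if (cp < 0x800) {
            emit(0xC0 | (cp >> 6));
            emit(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            emit(0xE0 | (cp >> 12));
            emit(0x80 | ((cp >> 6) & 0x3F));
            emit(0x80 | (cp & 0x3F));
        } else {
            emit(0xF0 | (cp >> 18));
            emit(0x80 | ((cp >> 12) & 0x3F));
            emit(0x80 | ((cp >> 6) & 0x3F));
            emit(0x80 | (cp & 0x3F));
        }
    }

    void write(const std::u32string &text) {
        for (char32_t cp : text) {
            put(cp);
        }
    }

private:
    // Every value passed here is already below 0x100 by construction.
    void emit(char32_t value) { sink_.put(static_cast<unsigned char>(value)); }

    ByteSink &sink_;
};
```

With this shape, writing the one-character string U+0144 through a `Utf8Writer` wrapped around a `VectorSink` leaves the two bytes C5 84 in the sink, and no caller ever casts a character to a byte directly.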
Using GIO could still make some issues easier to solve and understand for others, since it operates on raw byte streams and doesn't even pretend to do formatted IO. The distinction between characters and bytes is pretty clear. In essence we could adapt Writers to GIO streams. They would probably be faster since our current streams write only one per function call, while GIO can write any number.
My real motivation though is just to push some code out of Inkscape. If we want to integrate properly with Gnome we have to use GIO, so we might use it for our other stream needs as well.
Regards, Krzysztof Kosiński
On Sat, 2009-03-14 at 13:20 -0700, Krzysztof Kosiński wrote:
> Using GIO could still make some issues easier to solve and understand for others, since it operates on raw byte streams and doesn't even pretend to do formatted IO. The distinction between characters and bytes is pretty clear. In essence we could adapt Writers to GIO streams. They would probably be faster since our current streams write only one per function call, while GIO can write any number.
Again, none of that is specific to GIO.
If we want, we can convert things to use more than one per function call. That's very easy to do, and something I was looking at addressing this weekend even.
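One possible way to allow more than one item per call without disturbing existing stream classes is a bulk method with a per-item fallback. The sketch below is an assumption about how that could look, not the planned change; `ByteOutput` and `StringOutput` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical byte stream base class with a bulk write method.
class ByteOutput {
public:
    virtual ~ByteOutput() = default;

    // Existing single-byte entry point.
    virtual void put(unsigned char byte) = 0;

    // Bulk entry point. The default falls back to put(), so subclasses
    // that only implement put() keep working unchanged.
    virtual void write(const unsigned char *data, std::size_t len) {
        for (std::size_t i = 0; i < len; ++i) {
            put(data[i]);
        }
    }
};

// A stream that overrides write() to handle a whole buffer in one call.
class StringOutput : public ByteOutput {
public:
    void put(unsigned char byte) override {
        buffer.push_back(static_cast<char>(byte));
        ++calls;
    }

    void write(const unsigned char *data, std::size_t len) override {
        buffer.append(reinterpret_cast<const char *>(data), len);
        ++calls;
    }

    std::string buffer;
    std::size_t calls = 0;  // counts calls, to show the difference
};
```

Here `calls` exists only to make the effect visible: writing a three-byte buffer through `write` counts as one call instead of three.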
> My real motivation though is just to push some code out of Inkscape. If we want to integrate properly with Gnome we have to use GIO, so we might use it for our other stream needs as well.
Ah, here we have a main point.
A *KEY* issue is to *NOT* be dependent on GNOME. So where we can do things in a way that works nicely with GNOME, that is fine. However, if it will cause problems for native Windows or native OS X (which is actually close to being usable) we don't want to do it.
Remember, OS X and MS Windows need to stay first-class citizens for Inkscape support.
If GTK+ and glib do not have it, then we need to have it in Inkscape.
Oh, and remember. I'm not saying we can't go to GIO. I'm just saying that switching to GIO would not help with these specific problems... in fact it would probably complicate things somewhat.