Re: [Inkscape-devel] URI is EVIL!!!

29 Sep 2009


      On Sep 29, 2009, at 11:46 AM, Krzysztof Kosiński wrote:
...
> The problem I see with keeping everything in UTF-8 is that it's a
> convention unsupported by Glib. Glib has functions to work with native
> filenames and URIs, but not with native filenames converted to UTF-8.
> We should just use URIs everywhere; this would add network
> transparency via GIO, as we wouldn't be limited to local files.

No, you're missing the point.

The only sane way to craft an application is to have a consistent data  
approach internally. The alternative is to have each individual string  
also marked with its encoding and do multiple conversions on the fly  
as things are processed. (Look to both early Windows 95 APIs and older  
Perl versions for examples of this).
The simple way is to say "All data inside the program is Unicode".
Voilà! Problem solved.
Now Unicode data can be encoded in many ways (here we hit the  
important difference of "encoding" vs. "character set"). Unicode can  
be UTF-8, UTF-7, UTF-16BE, UTF-16LE, UTF-32, etc. Some data can be  
UCS-2, but that is *not* the same as UTF-16 and can burn people who  
don't realize it.
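
To make the encoding distinction concrete, here is a minimal sketch in Python (chosen only for illustration; it is not Inkscape code). The sample string and the helper names `code_units` and `fits_in_ucs2` are assumptions made for this example. It counts the code units the same three characters need in UTF-8, UTF-16 and UTF-32, and checks whether a string stays within the 16-bit range that UCS-2 can represent.

```python
# Three characters: 'A', 'é' (U+00E9), and U+1D11E MUSICAL SYMBOL G CLEF,
# which lies outside the Basic Multilingual Plane.
text = "A\u00e9\U0001D11E"


def code_units(s: str, encoding: str, unit_size: int) -> int:
    """Number of code units of `unit_size` bytes needed to store `s`."""
    return len(s.encode(encoding)) // unit_size


def fits_in_ucs2(s: str) -> bool:
    """UCS-2 has no surrogate pairs, so it only covers U+0000..U+FFFF."""
    return all(ord(ch) <= 0xFFFF for ch in s)


utf8_units = code_units(text, "utf-8", 1)       # 1 + 2 + 4 bytes
utf16_units = code_units(text, "utf-16-le", 2)  # 1 + 1 + 2 (surrogate pair)
utf32_units = code_units(text, "utf-32-le", 4)  # one unit per character

print(utf8_units, utf16_units, utf32_units)
print(fits_in_ucs2("A\u00e9"), fits_in_ucs2(text))
```

The third character takes two 16-bit units in UTF-16, and that is the case where code treating UTF-16 data as UCS-2 goes wrong.
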
On MS Windows, they went with UTF-16 as the standard. They also  
defined wchar_t as 16-bit when everyone else in the world followed the  
standard's recommendation and implemented it as 32-bit. Java also uses  
UTF-16 as far as the programmer can tell. IBM's ICU also uses UTF-16.
On Linux, however, and with GTK+ the base encoding is UTF-8.  
Everything we do on the UI *must* be in UTF-8. Therefore it makes  
sense for Inkscape to use UTF-8 as its Unicode encoding of choice.
You mention filename encoding, but miss some issues. First and  
foremost is that as far as Inkscape is concerned each and every  
filename *must* be presentable in the UI anyway (MRU lists, titlebars,  
media tracker list, etc.). Therefore we have to be able to handle  
UTF-8 for those. There are also safe round-trip conversions for *most*  
of the user scenarios. Therefore it vastly simplifies our code to just  
keep internal data in a single consistent format - Unicode.
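
The round trip described above can be sketched roughly as follows. This is Python for illustration only, not the GLib or Inkscape API; the helper names and the choice of ISO-8859-2 as the on-disk encoding are assumptions made for the example. Conversion happens once at the boundary, and everything in between works with Unicode text.

```python
from typing import Optional


def filename_to_unicode(raw: bytes, fs_encoding: str) -> Optional[str]:
    """Decode an on-disk filename once, at the boundary (hypothetical helper).

    Returns None when the bytes are not valid in `fs_encoding`; that is the
    case the "most" in "most user scenarios" leaves out.
    """
    try:
        return raw.decode(fs_encoding)
    except UnicodeDecodeError:
        return None


def unicode_to_filename(name: str, fs_encoding: str) -> bytes:
    """Encode back to the on-disk form only when touching the filesystem."""
    return name.encode(fs_encoding)


# A Polish filename stored in a legacy ISO-8859-2 locale (assumed for the demo).
raw = "żółw.svg".encode("iso-8859-2")

name = filename_to_unicode(raw, "iso-8859-2")          # internal Unicode text
title_bytes = name.encode("utf-8")                     # what a UTF-8 UI would receive
round_tripped = unicode_to_filename(name, "iso-8859-2")

# Bytes that are not valid UTF-8 cannot be decoded as UTF-8.
undecodable = filename_to_unicode(b"\xff\xfe.svg", "utf-8")
```
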
And conceptually URIs don't actually even support Unicode. What the
authors of those APIs you cite have done, though, is follow RFC 3987
and implement their URIs *as* IRIs. Thus we are 100% compatible with
those APIs you care so much about as long as we properly support IRIs.
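
As a rough illustration of how an IRI relates to a URI, the sketch below percent-encodes the UTF-8 bytes of the non-ASCII characters in a path, which is the core of the IRI-to-URI mapping in RFC 3987. It is Python using only the standard `urllib.parse` module. The helper names and the sample path are assumptions, and the sketch covers the path component only (no host names, and `quote` also escapes some ASCII characters such as spaces).

```python
from urllib.parse import quote, unquote


def iri_path_to_uri_path(iri_path: str) -> str:
    """Percent-encode the UTF-8 bytes of non-ASCII characters (hypothetical helper)."""
    # quote() encodes to UTF-8 by default; "/" is kept so path separators survive.
    return quote(iri_path, safe="/")


def uri_path_to_iri_path(uri_path: str) -> str:
    """Reverse the mapping: decode percent-escapes as UTF-8 (hypothetical helper)."""
    return unquote(uri_path)


iri_path = "/home/user/rysunki/żółw.svg"     # Unicode form, fit for the UI
uri_path = iri_path_to_uri_path(iri_path)    # pure-ASCII form, fit for a URI

print(uri_path)
print(uri_path_to_iri_path(uri_path) == iri_path)
```

The ASCII form and the Unicode form carry the same information, so keeping the Unicode form internally loses nothing when a URI-based API needs the escaped one.
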
...
> By the way, there is some code that deals with versions of Windows
> that do not have the wide versions of Win32 API functions (Windows 95,
> Windows 98 and Windows ME) - this is totally superfluous since the
> version of Glib we depend on does not work on such systems.

For years this code was not superfluous. It only became redundant  
later on. Thus it now can be safely dropped, but only as long as care  
is taken in cutting it out.
Also, early on Inkscape implemented code that did not exist in GLib and
that was later adopted by the GLib maintainers because of us.

Jon A. Cruz