
On Mar 3, 2014, at 2:37 PM, Martin Owens wrote:
On Mon, 2014-03-03 at 22:56 +0100, Johan Engelen wrote:
Perhaps you can create a helper/string.h file with all the string functions you need?
If I get more than 2, I'll do that.
(Just thought of it now, but it might be complete nonsense:) shouldn't the path stuff be implemented with Glib::ustring, for paths that contain, say, Japanese characters?
KK believes differently:
On Fri, 2014-02-07 at 15:40 +0100, Krzysztof Kosiński wrote:
Another thing to note is that std::string should only be used for paths. For UTF-8 strings, we should use Glib::ustring, which has a character-based index operator instead of a byte-based one.
On Fri, 2014-01-24 at 22:16 +0100, Krzysztof Kosiński wrote:
Eventually, all paths should be stored in std::string and all XML content, user-facing strings, and so on in Glib::ustring, but these tricks should help you get by in the meantime.
Jon Cruz: What say you, std::string or Glib::ustring for paths?
Some of it depends on where in the program the strings are going to be used.
In general we want to keep the core of our software purely Unicode. In our case, with GTK+ and GLib, that means UTF-8.
User data *may* come in and out of certain points in the locale encoding. For those cases we need to convert appropriately.
Then file I/O will be in the filesystem encoding. There too we probably want to convert to UTF-8 data fairly soon, and as we go out to the file system we'll have to convert back. Certain systems may have paths that don't convert cleanly to UTF-8. However, those files and/or directories won't show up properly in file browsers, the GNOME desktop, etc., so it might be fine to punt on those.
For more advanced use, we might add a system that maps untranslatable file-system paths to a corresponding UI string. And/or we could create a complex data type that included the filesystem string and the 'user/UI' string. These might be extreme measures, though.
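To make the "complex data type" idea concrete, here is a rough sketch using only the standard library. Everything in it is hypothetical: the `DualPath` name, the `make_dual_path` helper, and the choice to escape undecodable bytes as `\xNN` are illustration only, not existing Inkscape or GLib API. The UTF-8 check is also simplified (it does not reject every overlong or surrogate form).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical type: keeps the bytes the filesystem gave us next to a
// string that is always safe to show in the UI.
struct DualPath {
    std::string fs_bytes;  // untouched filesystem bytes, used for real file I/O
    std::string ui_utf8;   // valid UTF-8 for display; may be a lossy rendering
};

// Simplified well-formedness check for UTF-8 (assumption: good enough for a
// sketch; it checks lead-byte ranges and continuation bytes only).
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        if (c < 0x80) len = 1;
        else if (c >= 0xC2 && c <= 0xDF) len = 2;
        else if (c >= 0xE0 && c <= 0xEF) len = 3;
        else if (c >= 0xF0 && c <= 0xF4) len = 4;
        else return false;               // stray continuation or invalid lead byte
        if (i + len > n) return false;   // sequence truncated at end of string
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char t = static_cast<unsigned char>(s[i + k]);
            if ((t & 0xC0) != 0x80) return false;  // not a continuation byte
        }
        i += len;
    }
    return true;
}

// Builds a DualPath. If the raw bytes are not UTF-8, every non-ASCII byte is
// shown as an ASCII escape so the UI string stays valid.
DualPath make_dual_path(std::string raw) {
    DualPath p;
    if (is_valid_utf8(raw)) {
        p.ui_utf8 = raw;
    } else {
        static const char hex[] = "0123456789ABCDEF";
        for (unsigned char c : raw) {
            if (c < 0x80) {
                p.ui_utf8 += static_cast<char>(c);
            } else {
                p.ui_utf8 += "\\x";
                p.ui_utf8 += hex[c >> 4];
                p.ui_utf8 += hex[c & 0x0F];
            }
        }
    }
    assert(is_valid_utf8(p.ui_utf8));  // the display string is always valid UTF-8
    p.fs_bytes = std::move(raw);
    return p;
}
```

The point of the design is that file operations only ever touch `fs_bytes`, so a path that does not convert cleanly can still be opened, while the UI only ever sees `ui_utf8`.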
Another big gotcha is that URIs are tricky, and the standard allows only a very limited character set. This then leads to most of the URI support in GTK+/GLib being unusable for our needs. Technically I think we actually might need IRIs.
So... the next step is probably to make a decision on where your path-type operations will fit: are they low-level, working on arbitrary bytes in the filesystem encoding, or are they slightly higher, working on UTF-8 data?
If you do decide to operate at the lower level, then you have to take care to only ever walk strings from the very beginning: never search for path separators, but walk to them instead, etc. UTF-8 strings, such as in Glib::ustring, can be searched for '/', '.', etc., since UTF-8 ensures we will never have problems with lead-byte/trail-byte ASCII mismatches and such.
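A small sketch of the difference, using only the standard library. Shift-JIS is picked here purely as an example of a legacy filesystem encoding (my assumption, not something from the thread's code), and both function names are hypothetical. In Shift-JIS the character 表 is the byte pair 0x95 0x5C, and 0x5C is also ASCII '\', so a plain byte search finds a separator that is not there. In UTF-8 every byte of a multi-byte character is 0x80 or above, so a byte search for '/' cannot hit the middle of a character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// UTF-8 case: a plain byte search for '/' is safe.
std::vector<std::string> split_utf8_path(const std::string& path) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (start <= path.size()) {
        std::size_t end = path.find('/', start);
        if (end == std::string::npos) end = path.size();
        if (end > start) parts.push_back(path.substr(start, end - start));
        start = end + 1;
    }
    return parts;
}

// Legacy-encoding case: walk from the very beginning and skip the trail byte
// of every double-byte character, so a 0x5C trail byte is never mistaken
// for a separator. Returns npos if there is no real backslash.
std::size_t find_backslash_sjis(const std::string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        const bool lead = (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
        if (lead) {
            ++i;        // step over the trail byte as well
            continue;
        }
        if (c == '\\') return i;
    }
    return std::string::npos;
}
```

For the Shift-JIS bytes 0x95 0x5C followed by a real backslash, `std::string::find('\\')` reports index 1 (inside 表), while the walking version reports index 2.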
And finally... a completely different approach might be used by leveraging Boost's filesystem lib.