GString or standard C++ library strings?
To mitigate overflow issues with fixed-size buffers in the Inkboard code for message transmission, I'm going to convert the Inkboard messaging code to use strings. However, I'm not sure which string implementation the Inkscape team would prefer. Since a lot of Inkboard and Inkscape code uses glib facilities, GString has that advantage, but since we're using C++...
I don't see this explicitly covered in the Developer's Manual or on the InkscapeJanitor wiki pages, so I'm guessing it hasn't been brought up before; if it is listed somewhere, I apologize for the duplicate message.
-- David
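As a rough illustration of the change David describes, the sketch below contrasts formatting a message into a fixed-size buffer with building it in a std::string. The function names and the `to=... body=...` message format are hypothetical, not Inkboard's actual code or wire format. std::snprintf at least never writes past the buffer, but it silently truncates, so every caller has to check its return value; the std::string version simply grows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical message format: "to=<recipient> body=<body>".

// Fixed-size buffer version. std::snprintf never writes more than `size`
// bytes (including the terminating NUL), but it silently truncates, so the
// return value must be checked. Returns true only if the whole message fit.
bool build_message_fixed(char *buf, std::size_t size,
                         const char *recipient, const char *body)
{
    assert(buf != nullptr && size > 0);
    int needed = std::snprintf(buf, size, "to=%s body=%s", recipient, body);
    // snprintf returns the length the complete output would have had.
    return needed >= 0 && static_cast<std::size_t>(needed) < size;
}

// std::string version: the string grows as needed, so there is nothing to
// overflow or truncate, and its storage is freed automatically when the
// returned object is destroyed.
std::string build_message(const std::string &recipient, const std::string &body)
{
    return "to=" + recipient + " body=" + body;
}
```

Because the returned std::string owns its storage, there is nothing to free afterwards; a GString built with g_string_new() would instead need a matching g_string_free().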
On Wed, Jun 29, 2005 at 04:19:09AM -0500, David Yip wrote:
> To mitigate overflow issues with fixed-size buffers in the Inkboard code for message transmission, I'm going to convert the Inkboard messaging code to use strings. However, I'm not sure which string implementation the Inkscape team would prefer. Since a lot of Inkboard and Inkscape code uses glib facilities, GString has that advantage, but since we're using C++...
> I don't see this explicitly covered in the Developer's Manual or on the InkscapeJanitor wiki pages, so I'm guessing it hasn't been brought up before; if it is listed somewhere, I apologize for the duplicate message.
Using std::string or Glib::ustring has the advantage that the string's memory is freed automatically on destruction, which is convenient for "automatic"/stack variables.
Using Glib::ustring has the disadvantage that its operator[] runs in time linear in its argument, rather than in constant time as one might expect. Similarly, I believe even its length() method takes linear time.
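To make the cost concrete: in UTF-8 a code point takes one to four bytes, so locating the nth one means scanning from the start. A minimal sketch over a plain std::string, assuming valid UTF-8 input; the helper names are hypothetical, not Glib API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool is_continuation(unsigned char b) { return (b & 0xC0) == 0x80; }

// Byte offset of code point n (0-based) in a valid UTF-8 string, or
// std::string::npos if the string has fewer than n + 1 code points.
// Sequences are 1 to 4 bytes long, so the only way to find code point n is to
// walk from the start: the cost grows linearly with n, which is the cost a
// code-point-indexed operator[] has to pay.
std::size_t utf8_offset(const std::string &s, std::size_t n)
{
    // Precondition: s starts on a code-point boundary.
    assert(s.empty() || !is_continuation(static_cast<unsigned char>(s[0])));
    std::size_t i = 0;
    while (i < s.size()) {
        if (n == 0)
            return i;
        --n;
        ++i;  // step past the lead byte...
        while (i < s.size() && is_continuation(static_cast<unsigned char>(s[i])))
            ++i;  // ...and any continuation bytes
    }
    return std::string::npos;
}

// Counting code points is likewise a full scan (cf. length()):
// every byte that is not a continuation byte starts a new code point.
std::size_t utf8_length(const std::string &s)
{
    std::size_t count = 0;
    for (unsigned char b : s)
        if (!is_continuation(b))
            ++count;
    return count;
}
```

For example, in "na\xC3\xAFve" ("naïve"), code point 3 ('v') starts at byte offset 4, and utf8_length() returns 5 while size() returns 6.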
I'm inclined to say that Glib::ustring is misdesigned: that it should have provided iterator functions for convenient gunichar-at-a-time iteration for the rare cases when that's better than byte-at-a-time. (Note that gunichar-at-a-time isn't the same as character-at-a-time anyway, once you accept the Unicode spec's interpretation of combining diacritics.)
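The gunichar-at-a-time iteration Peter has in mind can be sketched as a decoder that advances a byte index past one code point per call. This is a hypothetical stand-alone helper, assuming valid UTF-8 and using char32_t in place of gunichar; it is not Glib::ustring's own iterator interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decode the code point that starts at s[i] and advance i past it.
// Hypothetical helper: assumes s is valid UTF-8 (no error handling) and uses
// char32_t as a stand-in for gunichar.
char32_t next_code_point(const std::string &s, std::size_t &i)
{
    assert(i < s.size());
    const unsigned char b = static_cast<unsigned char>(s[i++]);
    int extra;     // number of continuation bytes that follow the lead byte
    char32_t cp;   // bits decoded so far
    if (b < 0x80)      { cp = b;        extra = 0; }
    else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
    else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
    else               { cp = b & 0x07; extra = 3; }
    while (extra-- > 0)   // each continuation byte adds 6 payload bits
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// One pass over the string decodes every code point, touching each byte once:
// O(size()) overall, whereas calling a code-point-indexed operator[] for
// positions 0, 1, 2, ... would rescan from the start each time (quadratic).
std::vector<char32_t> code_points(const std::string &s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size(); )
        out.push_back(next_code_point(s, i));
    return out;
}
```

Decoding "a\xC3\xA9\xE2\x82\xAC" yields U+0061, U+00E9 and U+20AC; each call does a bounded amount of work, so the whole pass is linear.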
That said, using Glib::ustring does at least offer runtime-checked documentation that the string is valid UTF-8.
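Glib exposes the underlying check as g_utf8_validate() (and Glib::ustring::validate()). The sketch below is a hypothetical stand-alone equivalent for std::string, following the byte ranges of RFC 3629: it rejects truncated sequences, overlong forms, UTF-16 surrogates and values above U+10FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-alone UTF-8 validator (Glib's equivalent is
// g_utf8_validate()). Follows the byte ranges of RFC 3629, section 4.
bool is_valid_utf8(const std::string &s)
{
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        if (b < 0x80) {          // ASCII: a single byte
            ++i;
            continue;
        }
        std::size_t len = 0;
        unsigned char lo = 0x80, hi = 0xBF;  // allowed range of the 2nd byte
        if (b >= 0xC2 && b <= 0xDF) {
            len = 2;
        } else if (b >= 0xE0 && b <= 0xEF) {
            len = 3;
            if (b == 0xE0) lo = 0xA0;        // reject overlong encodings
            if (b == 0xED) hi = 0x9F;        // reject UTF-16 surrogates
        } else if (b >= 0xF0 && b <= 0xF4) {
            len = 4;
            if (b == 0xF0) lo = 0x90;        // reject overlong encodings
            if (b == 0xF4) hi = 0x8F;        // reject values above U+10FFFF
        } else {
            return false;  // stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF
        }
        if (n - i < len)
            return false;  // sequence cut off by the end of the string
        const unsigned char b1 = static_cast<unsigned char>(s[i + 1]);
        if (b1 < lo || b1 > hi)
            return false;
        for (std::size_t k = 2; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if (c < 0x80 || c > 0xBF)
                return false;
        }
        i += len;
    }
    return true;
}
```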
GString can be useful for interfacing with non-gtkmm gtk stuff.
pjrm.
On Wed, 2005-06-29 at 21:13 +1000, Peter Moulder wrote:
> On Wed, Jun 29, 2005 at 04:19:09AM -0500, David Yip wrote:
>> To mitigate overflow issues with fixed-size buffers in the Inkboard code for message transmission, I'm going to convert the Inkboard messaging code to use strings. However, I'm not sure which string implementation the Inkscape team would prefer. Since a lot of Inkboard and Inkscape code uses glib facilities, GString has that advantage, but since we're using C++...
>> I don't see this explicitly covered in the Developer's Manual or on the InkscapeJanitor wiki pages, so I'm guessing it hasn't been brought up before; if it is listed somewhere, I apologize for the duplicate message.
> Using std::string or Glib::ustring has the advantage that the string's memory is freed automatically on destruction, which is convenient for "automatic"/stack variables.
> Using Glib::ustring has the disadvantage that its operator[] runs in time linear in its argument, rather than in constant time as one might expect. Similarly, I believe even its length() method takes linear time.
That's a problem with UTF-8, not ustring. GString doesn't offer any way to reach the nth UTF-8 character in less than linear time either.
> I'm inclined to say that Glib::ustring is misdesigned: that it should have provided iterator functions for convenient gunichar-at-a-time iteration for the rare cases when that's better than byte-at-a-time.
Feel free to submit a patch, though I'm not sure when that's useful. You can also convert to std::string if that's useful.
> (Note that gunichar-at-a-time isn't the same as character-at-a-time anyway, once you accept the Unicode spec's interpretation of combining diacritics.)
> That said, using Glib::ustring does at least offer runtime-checked documentation that the string is valid UTF-8.
> GString can be useful for interfacing with non-gtkmm gtk stuff.
Is there any part of the GTK+ API that takes/provides a GString? Glib::ustring::c_str() is usually all you need.
On Wed, Jun 29, 2005 at 01:53:35PM +0200, Murray Cumming wrote:
>> Using Glib::ustring has the disadvantage that its operator[] runs in time linear in its argument, rather than in constant time as one might expect. Similarly, I believe even its length() method takes linear time.
> That's a problem with UTF-8, not ustring. GString doesn't offer any way to reach the nth UTF-8 character in less than linear time either.
Just to clarify: the main issue is that programmers don't expect operator[] to run in time proportional to its argument. What is operator[] useful for? When would one ever want to know what the 25th gunichar of a string is? In C++, iterators are the more appropriate access method. Which data structures in the C++ standard library provide an operator[] that runs in time proportional to its argument? Cf. the std::list documentation in GNU libstdc++-6-4.0:
# Unlike std::vector and std::deque, random-access iterators are not
# provided, so subscripting ( [] ) access is not allowed.
or the footnote in the [last publicly-visible draft of the] C++ standard:
# ... the operator[] and at member functions ...[2]
#
# [2:] These member functions are only provided by containers whose
# iterators are random access iterators.
Thus, either operator[] should have been omitted from Glib::ustring, or it should index into bytes rather than gunichars, or Glib::ustring should have chosen a representation that allows random access.
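The rule in the standard's footnote can be checked at compile time: the containers that offer operator[] are those whose iterators are random access. The trait helper and nth() below are hypothetical illustrations; std::string qualifies because its operator[] indexes bytes, while reaching element n of a std::list costs n steps.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <type_traits>
#include <vector>

// True if Container's iterators are random access, i.e. the kind of container
// for which the standard provides operator[] and at().
template <typename Container>
constexpr bool has_random_access_iterators()
{
    using Category = typename std::iterator_traits<
        typename Container::iterator>::iterator_category;
    return std::is_base_of<std::random_access_iterator_tag, Category>::value;
}

static_assert(has_random_access_iterators<std::vector<int>>(), "vector has O(1) operator[]");
static_assert(has_random_access_iterators<std::string>(), "string has O(1) operator[] over bytes");
static_assert(!has_random_access_iterators<std::list<int>>(), "list has no operator[]");

// Without operator[], std::list makes the linear cost explicit: std::next on a
// bidirectional iterator advances one node at a time, so this is O(n).
template <typename T>
const T &nth(const std::list<T> &l, std::size_t n)
{
    assert(n < l.size());
    return *std::next(l.begin(),
                      static_cast<typename std::list<T>::difference_type>(n));
}
```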
> Feel free to submit a patch,
Yes, you're right, inkscape-devel isn't the right place for this; please excuse the rant.
pjrm.
On Jun 29, 2005, at 2:19 AM, David Yip wrote:
> To mitigate overflow issues with fixed-size buffers in the Inkboard code for message transmission, I'm going to convert the Inkboard messaging code to use strings. However, I'm not sure which string implementation the Inkscape team would prefer. Since a lot of Inkboard and Inkscape code uses glib facilities, GString has that advantage, but since we're using C++...
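One of the main issues is that std::string is not UTF-8 safe: it treats text as raw bytes, so size(), operator[] and substr() count bytes rather than characters and can split a multi-byte sequence.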
Glib::ustring, on the other hand, is.
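Jon's point can be seen with a plain std::string holding UTF-8: everything works in bytes, so cutting at a byte count can leave half a character behind. A minimal sketch of one workaround, with a hypothetical helper name and assuming the input is valid UTF-8, backs the cut up to the start of a code point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: shorten s to at most max_bytes bytes without splitting
// a multi-byte UTF-8 sequence. Assumes s is valid UTF-8.
std::string truncate_utf8(const std::string &s, std::size_t max_bytes)
{
    if (s.size() <= max_bytes)
        return s;
    std::size_t cut = max_bytes;
    // s[cut] is the first byte that would be dropped. While it is a
    // continuation byte (10xxxxxx), the cut falls inside a sequence, so back up.
    while (cut > 0 && (static_cast<unsigned char>(s[cut]) & 0xC0) == 0x80)
        --cut;
    return s.substr(0, cut);
}
```

With s = "caf\xC3\xA9" ("café"), s.substr(0, 4) ends in a lone 0xC3 lead byte, whereas truncate_utf8(s, 4) returns "caf".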
Quoting David Yip <yipdw@...635...>:
> To mitigate overflow issues with fixed-size buffers in the Inkboard code for message transmission, I'm going to convert the Inkboard messaging code to use strings. However, I'm not sure which string implementation the Inkscape team would prefer. Since a lot of Inkboard and Inkscape code uses glib facilities, GString has that advantage, but since we're using C++...
Glib::ustring is the preferred representation for mutable UTF-8 strings.
-mental
participants (5)
- unknown@example.com
- David Yip
- Jon A. Cruz
- Murray Cumming
- Peter Moulder