Hi Martin, I'm sorry, I will revert commit 13045.
1. It contains a memleak realpath allocates new memory if the second argument is NULL. In the added code, this memory is never freed.
2. Breaks build on Windows (realpath(...) not found)
3. Please don't add functions that use C-strings unnecessarily. Simply rewrite parseDataUri to parseDataUri(const std::string &uri), and use C++'s string methods. ("#include <cstring.h>" is a huge red flag upon code review)
4. Please read this about "realpath" http://insanecoding.blogspot.ch/2007/11/pathmax-simply-isnt.html
regards, Johan
PS. (not related to commit) uri.cpp has a lot of "throw(BadURIException)", you can remove those as "throw(...)" don't add anything and will become obsolete in C++11.
On Thu, 2014-02-20 at 18:58 +0100, Johan Engelen wrote:
Hi Martin, I'm sorry, I will revert commit 13045.
Why be sorry, everything I've written recently has been reverted. While I might be annoyed, it's not with your code review, just with myself.
- It contains a memleak
realpath allocates new memory if the second argument is NULL. In the added code, this memory is never freed.
Realpath is a serious concern for me. I spent two hours researching path normalisation in c++ only to discover that it's impossible because glib is insufficient. :-/
If it doesn't start behaving I will fork inkscape and rewrite it a proper language like python. ;-)
If you happen to know about a C++ path normalisation function, do let me know as it will save me an awful lot of time.
Martin,
On 20-2-2014 20:12, Martin Owens wrote:
On Thu, 2014-02-20 at 18:58 +0100, Johan Engelen wrote:
Hi Martin, I'm sorry, I will revert commit 13045.
Why be sorry, everything I've written recently has been reverted. While I might be annoyed, it's not with your code review, just with myself.
- It contains a memleak
realpath allocates new memory if the second argument is NULL. In the added code, this memory is never freed.
Realpath is a serious concern for me. I spent two hours researching path normalisation in c++ only to discover that it's impossible because glib is insufficient. :-/
If it doesn't start behaving I will fork inkscape and rewrite it a proper language like python. ;-)
If you happen to know about a C++ path normalisation function, do let me know as it will save me an awful lot of time.
I only briefly glanced over the link in my earlier message. But it seems quite elaborate indeed. Perhaps you can use the Boost function suggested there. (we already have Boost as a dependency) If you still want to use realpath (I think it's called fullpath on Windows), at least make sure you clean up memory afterwards.
About memory clean-up: Your " preformed = ""; // g_free? " , indeed should not use g_free. What you are doing is having a local (!) pointer point to something else. You should not free the memory pointed to by the pointer passed to the function. So what you are doing is correct: you have the local (!) argument pointer point to another string instead. No memleak there.
cheers, Johan
On Thu, 2014-02-20 at 18:58 +0100, Johan Engelen wrote:
- Please don't add functions that use C-strings unnecessarily. Simply rewrite parseDataUri to parseDataUri(const std::string
&uri),
I have a question about this revert. The logic is being copied from image.cpp (where it uses c-strings) and I figured calling std::string( uri ) on a massive embedded image uri would be bad.
So, should I rewrite uri.cpp to only use std::string or accept the string copy that might result in lots of member use?
Martin,
On 20-2-2014 20:25, Martin Owens wrote:
On Thu, 2014-02-20 at 18:58 +0100, Johan Engelen wrote:
- Please don't add functions that use C-strings unnecessarily. Simply rewrite parseDataUri to parseDataUri(const std::string
&uri),
I have a question about this revert. The logic is being copied from image.cpp (where it uses c-strings) and I figured calling std::string( uri ) on a massive embedded image uri would be bad.
So, should I rewrite uri.cpp to only use std::string or accept the string copy that might result in lots of member use?
I don't fully understand your question, but eventually, pretty much everything of Inkscape should use std::string. So it's good not to add any C-string stuff, to make the refactoring easier. I agree that copying a massive data structure, only to discard it after does not make sense. I'm sorry, I didn't think enough about the function's use cases. Perhaps in this case, indeed it is better to use char*, if that is how the uri was stored in the first place. So you'd have to look at who will call parseDataUri and if they have the uri in a std::string or char*. (if so, you could add a brief comment of why you think char* is preferred above std::string)
cheers, Johan
On Thu, 2014-02-20 at 23:34 +0100, Johan Engelen wrote:
I don't fully understand your question, but eventually, pretty much everything of Inkscape should use std::string. So it's good not to add any C-string stuff, to make the refactoring easier.
I agree. Although it's good to make sure that these things are being done right.
I've put everything in a new branch and I'm going to be asking for a full review when it's ready. Might take a while to get image.cpp done though.
use char*, if that is how the uri was stored in the first place. So you'd have to look at who will call parseDataUri and if they have the uri in a std::string or char*
parseDataUri is a private function for uri.cpp to store data from a data uri instead of passing the uri to libxml's uri handling which doesn't handle data uris.
So in the case of normal urls, the data gets copied by libxml. In the case of data uris, the fewer copies the better. So, instead of using cstring, how to say:
const char *preformed = "data:..........."
if preformed[:5] == "data:": // this is a data uri and we're going to parse it
Like this daft conditional:?
if(p[0]=='d' && p[1] =='a' && p[2] == 't' && p[3] =='a' && p[4]==':'){ ... }
I was also going to store the const char offsets to the data and only feed back the conversion from base64 when needed as part of a stream pattern which would feed gdk image creation.
1. Is this safe, the char is owned by the same object that creates the URI object. Technically they should both be still in use? 2. Would this work?
Martin,
On 20-2-2014 23:55, Martin Owens wrote:
On Thu, 2014-02-20 at 23:34 +0100, Johan Engelen wrote:
I don't fully understand your question, but eventually, pretty much everything of Inkscape should use std::string. So it's good not to add any C-string stuff, to make the refactoring easier.
I agree. Although it's good to make sure that these things are being done right.
I've put everything in a new branch and I'm going to be asking for a full review when it's ready. Might take a while to get image.cpp done though.
use char*, if that is how the uri was stored in the first place. So you'd have to look at who will call parseDataUri and if they have the uri in a std::string or char*
parseDataUri is a private function for uri.cpp to store data from a data uri instead of passing the uri to libxml's uri handling which doesn't handle data uris.
So in the case of normal urls, the data gets copied by libxml. In the case of data uris, the fewer copies the better. So, instead of using cstring, how to say:
const char *preformed = "data:..........."
if preformed[:5] == "data:": // this is a data uri and we're going to parse it
Like this daft conditional:?
if(p[0]=='d' && p[1] =='a' && p[2] == 't' && p[3] =='a' && p[4]==':'){ ... }
I was also going to store the const char offsets to the data and only feed back the conversion from base64 when needed as part of a stream pattern which would feed gdk image creation.
- Is this safe, the char is owned by the same object that creates the
URI object. Technically they should both be still in use? 2. Would this work?
It's ok to use <cstring.h> if needed, it's just that it puts up a red flag for me is all. So you probably indeed will need strcmp or a sibling, that's fine I think. When you work with C-strings, you have to realize that you are working with raw data, and pointers to raw data. So memleaks and exception-safety issues become easy traps to fall in to. And so you have to make sure that somewhere, someone calls free/delete on your data. You simply have to be way more careful when programming with C-strings, and so it's best to keep the code confined and internal to your class.
-Johan
On Feb 20, 2014, at 2:34 PM, Johan Engelen wrote:
On 20-2-2014 20:25, Martin Owens wrote:
On Thu, 2014-02-20 at 18:58 +0100, Johan Engelen wrote:
- Please don't add functions that use C-strings unnecessarily. Simply rewrite parseDataUri to parseDataUri(const std::string
&uri),
I have a question about this revert. The logic is being copied from image.cpp (where it uses c-strings) and I figured calling std::string( uri ) on a massive embedded image uri would be bad.
So, should I rewrite uri.cpp to only use std::string or accept the string copy that might result in lots of member use?
I don't fully understand your question, but eventually, pretty much everything of Inkscape should use std::string. So it's good not to add any C-string stuff, to make the refactoring easier. I agree that copying a massive data structure, only to discard it after does not make sense. I'm sorry, I didn't think enough about the function's use cases. Perhaps in this case, indeed it is better to use char*, if that is how the uri was stored in the first place. So you'd have to look at who will call parseDataUri and if they have the uri in a std::string or char*. (if so, you could add a brief comment of why you think char* is preferred above std::string)
In general we should look to using either std::string or a std::vector<...> of some proper type. (The latter comes into play where code uses char * as "random array of bytes" instead of "null-terminated 8-bit string"). For using and passing a string we can use 'std::string const &inputData' to avoid duplication (passing a reference to a const string so that the string is not duplicated).
In this specific case we might benefit from something more. A data URI might be representing some binary blob that we could store in non-string form. Accessing a std::string as a c-string is fairly simple, but not vice-versa. I'll take a look to see what might be appropriate for the code in question. Also there are quite a few smart pointers in both Boost and C++ we can look into (but carefully).
Oh, and yes, creating a std::string from a large DataUI can be problematic.
2014-02-20 18:58 GMT+01:00 Johan Engelen <jbc.engelen@...2592...>:
- Please read this about "realpath"
http://insanecoding.blogspot.ch/2007/11/pathmax-simply-isnt.html
The POSIX.1-2001 version of realpath() is broken, but the POSIX.1-2008 version allows you to pass NULL as the second argument, which will allocate a new buffer with malloc() in the same manner as strdup(). This is supported on both Linux and OSX.
On Windows (MinGW), realpath() is not available at all and should be reimplemented with _wfullpath + g_utf16_to_utf8.
However, I'm not really sure why we would need to use realpath() or other means of path canonicalization at all. If the user wants to access an external document through a symlink, we should not interfere with that.
Regards, Krzysztof
On Fri, 2014-02-21 at 00:49 +0100, Krzysztof Kosiński wrote:
However, I'm not really sure why we would need to use realpath() or other means of path canonicalization at all. If the user wants to access an external document through a symlink, we should not interfere with that.
If you pass a non-normalised path to f_file_test then it fails. The code in question is a filtered output rather than a property which we would save. The local and non-normalised path would continue to be used in the toString function and only file existence checking and opening would use getFullPath.
Martin,
2014-02-21 2:29 GMT+01:00 Martin Owens <doctormo@...400...>:
On Fri, 2014-02-21 at 00:49 +0100, Krzysztof Kosiński wrote:
However, I'm not really sure why we would need to use realpath() or other means of path canonicalization at all. If the user wants to access an external document through a symlink, we should not interfere with that.
If you pass a non-normalised path to f_file_test then it fails. The code in question is a filtered output rather than a property which we would save. The local and non-normalised path would continue to be used in the toString function and only file existence checking and opening would use getFullPath.
I really doubt this is the case.
#include <glib.h> #include <stdio.h>
int main() { char const *path = "../junk/filetest.c"; if (g_file_test(path, G_FILE_TEST_EXISTS)) { printf("%s exists\n", path); } else { printf("%s does not exist\n", path); } return 0; }
When I save this as /home/tweenk/src/junk/filetest.c, compile and run it in that directory, it prints "../junk/filetest.c exists". (This is on Ubuntu 13.10.) Are there problems with g_file_test on Windows?
Regards, Krzysztof
On Fri, 2014-02-21 at 02:37 +0100, Krzysztof Kosiński wrote:
When I save this as /home/tweenk/src/junk/filetest.c, compile and run it in that directory, it prints "../junk/filetest.c exists". (This is on Ubuntu 13.10.) Are there problems with g_file_test on Windows?
Just tested with your example in /tmp
"/tmp/filetest.c" - Works as expected "/usr/../tmp/filetest.c" - Works as expected "/foo/filetest.c" - Fails as expected "/foo/../tmp/filetest.c" - Fails unexpectedly
It's checking the existence of every directory, even if it would be normalised out. That's a more subtle error than I first thought it was and probably not worth hassling over. I'll leave realpath out.
There's also a libxml variant xmlNormalizeURIPath "Applies the 5 normalization steps to a path string--that is, RFC 2396 Section 5.2, steps 6.c through 6.g. Normalization occurs directly on the string, no new allocation is done"
Martin,
participants (4)
-
Johan Engelen
-
Jon Cruz
-
Krzysztof Kosiński
-
Martin Owens