Solving bug #166371 - SAX parser
Hello
After fixing the XXE vulnerability, Inkscape can no longer read SVGs that use entities in URLs. Notable cases include SVGs saved from Adobe Illustrator. There does not seem to be a way to replace these entities without reintroducing the XXE vulnerability. https://bugs.launchpad.net/inkscape/+bug/166371
I just wanted to point out that the best way to solve this would be to use libxml2's SAX parser or the newer XmlTextReader parser to construct the tree of Inkscape::XML::Node objects directly, instead of postprocessing the XML tree obtained from the DOM-like parser. With this approach, we could intelligently substitute only string entities and ignore entities that point to URLs or external files. An additional bonus would be a reduction in memory use and load time, since the libxml2 DOM tree would not be created. http://www.xmlsoft.org/xmlreader.html
Regards, Krzysztof
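The approach proposed above can be sketched as follows. This is only an illustration under stated assumptions, not Inkscape code: it uses Python's standard expat binding rather than libxml2's SAX or XmlTextReader API, and the `Node` class, `build_tree` function and `SAMPLE` document are hypothetical names made up for the example. It builds the node tree directly from parser events, lets internal string entities (including ones whose value is a URL string, as in Illustrator files) expand normally, and skips entities that point to external files.

```python
import xml.parsers.expat


class Node:
    """Hypothetical stand-in for Inkscape::XML::Node (illustration only)."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.text = ""
        self.children = []


def build_tree(data):
    """Build a Node tree directly from parser events, with no intermediate DOM.

    Returns (root, skipped), where skipped lists the system IDs of the
    external entities that were referenced but deliberately not loaded.
    """
    parser = xml.parsers.expat.ParserCreate()
    stack = []      # currently open elements, innermost last
    roots = []      # will hold the single root node
    skipped = []    # system IDs of external entities we refused to load

    def start_element(name, attrs):
        # Internal string entities in attribute values are already expanded here.
        node = Node(name, attrs)
        if stack:
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)

    def end_element(name):
        stack.pop()

    def character_data(text):
        if stack:
            stack[-1].text += text

    def external_entity_ref(context, base, system_id, public_id):
        # Never open the external resource; record it and continue parsing.
        skipped.append(system_id)
        return 1

    parser.StartElementHandler = start_element
    parser.EndElementHandler = end_element
    parser.CharacterDataHandler = character_data
    parser.ExternalEntityRefHandler = external_entity_ref
    parser.Parse(data, True)
    return (roots[0] if roots else None), skipped


# Hypothetical input: one internal string entity and one external file entity.
SAMPLE = b"""<?xml version="1.0"?>
<!DOCTYPE svg [
  <!ENTITY ns_svg "http://www.w3.org/2000/svg">
  <!ENTITY secret SYSTEM "file:///etc/passwd">
]>
<svg xmlns="&ns_svg;"><desc>&secret;</desc></svg>"""

if __name__ == "__main__":
    root, skipped = build_tree(SAMPLE)
    print(root.name, root.attrs, skipped)
```

The point of the design is that the decision about each entity is made while the tree is being built, so there is no second pass over a DOM tree and nothing external is ever read.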
On Wed, 2013-08-21 at 03:48 +0200, Krzysztof Kosiński wrote:
use libxml2's SAX parser
This sounds like the right direction to take, considering the bug and the pickle we're in between the vulnerability and support for the XML produced by other applications.
I only have three questions:
1. What difference (if any) in compile and/or run-time requirements?
2. How messy would a transition be from the dom parser to libxml2?
3. How long would it take to do the transition?
Martin,
On Aug 20, 2013, at 7:02 PM, Martin Owens wrote:
On Wed, 2013-08-21 at 03:48 +0200, Krzysztof Kosiński wrote:
use libxml2's SAX parser
This sounds like the right direction to take, considering the bug and the pickle we're in between the vulnerability and support for the XML produced by other applications.
I only have three questions:
- What difference (if any) in compile and/or run-time requirements?
- How messy would a transition be from the dom parser to libxml2?
- How long would it take to do the transition?
There might be some lingering code that expects libxml's dom nodes to be present... but that is something we should look into anyway.
In general, if switching parsers would be a good thing, then the conversion can be done without *too* much work. Or more specifically, I can probably address that. I've used SAX with Java, libxml and others for some time now, and can get most of what we need covered. So...
1. Compile-time and run-time requirements should be the same. Both APIs are part of core libxml2.
2. Depends on whether or not our code depends on the libxml DOM tree to remain present after initial parsing.
3. If the existing DOM parser is just used for throw-away input, then I can probably get it done in a few days... so call it three weeks. :-)
Of course, if we do have the DOM tree left over, we most likely want to purge that anyway. In the long run I definitely see shifting to a SAX parser as a good step. It will bring in other benefits also, such as making it easier to support multiple versions of SVG, broken SVG, etc.
2013/8/21 Jon Cruz <jon@...18...>:
- Depends on whether or not our code depends on the libxml DOM tree to remain present after initial parsing.
I looked into this some time ago. There is no dependency on the libxml2 DOM tree once the document is ready; it is just recursively converted to Inkscape::XML::Node objects and then discarded.
Regards, Krzysztof
participants (3)
- Jon Cruz
- Krzysztof Kosiński
- Martin Owens