2013/9/12 Martin Owens <doctormo@...400...>:
> On Thu, 2013-09-12 at 17:30 +0200, Krzysztof Kosiński wrote:
>> This patch will break all Adobe SVG files that contain the string "SYSTEM" e.g. in text objects or IDs.
> Yep. Are you saying that's an unacceptable trade-off? Because it is a hypothetically tiny subset of documents.
This subset is not tiny by any reasonable standard. You don't need to try very hard to create a document that contains the word "SYSTEM"; in fact, you can trivially create one in Inkscape without even editing the XML. Breaking all such documents is unacceptable. BTW, it would also break all embedded images whose base64 encoding contains the string "SYSTEM", as well as any paths that contain the string "SYSTEM".
This problem could be solved correctly by writing a SAX parser; it was discussed some time ago on the list. However, that involves writing a nontrivial amount of code. A better quick fix is to use g_regex_replace to remove only system entity declarations, which would break only very exotic documents:
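    /* Match <!ENTITY name SYSTEM "uri"> only; backslashes and quotes are escaped for the C string literal */
    GRegex *entity_regex = g_regex_new(
        "<!ENTITY\\s+[^>\\s]+\\s+SYSTEM\\s+\"[^>\"]+\"\\s*>",
        G_REGEX_CASELESS, 0, NULL);
    /* fixed_buffer is newly allocated; release it with g_free() when done */
    gchar *fixed_buffer = g_regex_replace(entity_regex, buffer, len, 0, "", 0, NULL);
    g_regex_unref(entity_regex);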
The above code will break documents that contain a system entity declaration enclosed in CDATA (not XML-encoded) as the content of a text element, but since Inkscape never produces this kind of XML, I guess we can live with that for now.
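For anyone who wants to try the pattern outside Inkscape, here is a minimal standalone sketch of the same idea. It is not the proposed patch: it swaps GRegex for POSIX regex.h so that it builds without GLib, it spells \s as [[:space:]] because POSIX extended syntax has no \s, and the function name strip_system_entities is made up for illustration. Like the pattern above, it only handles double-quoted system identifiers.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of Inkscape): returns a newly malloc'ed copy
 * of `input` with every <!ENTITY name SYSTEM "uri"> declaration removed, or
 * NULL on failure. The caller frees the result. */
char *strip_system_entities(const char *input)
{
    /* Same shape as the GRegex pattern, written as a POSIX extended regex. */
    static const char *pattern =
        "<!ENTITY[[:space:]]+[^>[:space:]]+[[:space:]]+SYSTEM[[:space:]]+"
        "\"[^>\"]+\"[[:space:]]*>";
    regex_t re;

    /* REG_ICASE mirrors G_REGEX_CASELESS in the GLib version. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return NULL;

    /* Removing text never grows the string, so strlen(input) + 1 is enough. */
    char *out = malloc(strlen(input) + 1);
    if (out == NULL) {
        regfree(&re);
        return NULL;
    }

    char *dst = out;
    const char *src = input;
    regmatch_t m;

    /* regexec reports only the leftmost match, so loop over the remainder. */
    while (regexec(&re, src, 1, &m, 0) == 0) {
        memcpy(dst, src, (size_t)m.rm_so);  /* keep the text before the match */
        dst += m.rm_so;
        src += m.rm_eo;                     /* skip the declaration itself */
    }
    strcpy(dst, src);                       /* keep whatever follows the last match */

    regfree(&re);
    return out;
}
```

Feeding it a document whose only occurrences of "SYSTEM" are in an ID or in text content should return an identical copy, while a DOCTYPE internal subset that declares a system entity should come back with just that declaration removed.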
Regards, Krzysztof