Re: [Inkscape-devel] patch: merge pdf import via poppler-cairo into native importer
On 17-Jun-2014 11:55, Josh Andler wrote:
The handful of random files I had around here as well as a couple I found in the tracker are importing the glyphs as outlines/symbols... here are a couple examples: https://bugs.launchpad.net/inkscape/+bug/429709/+attachment/718312/+files/%5... from https://bugs.launchpad.net/inkscape/+bug/429709
Ah, I see now what you are talking about. After a poppler import, <defs> is full of symbols like this:
  <symbol overflow="visible" id="glyph12-48">
    <path style="stroke:none;" d="M 2.765625 (PATH EDITED OUT) Z " id="path4079" />
  </symbol>
and they are placed in the drawing with
  <g style="fill:rgb(0%,0%,0%);fill-opacity:1;" id="g10226">
    <use xlink:href="#glyph12-48" x="183.553346" y="712.58976" id="use10228" />
  </g>
The original PDF did have selectable text when opened in a PDF viewer, but after poppler import into Inkscape all of the text has been replaced with drawing operations. It looks like the poppler import essentially treats everything as a draw operation and has no concept of <text> or <tspan>. The resulting SVG would not be easy to convert back into text, since the glyph IDs give no clue as to the original font or Unicode value. The part of the id before the dash probably corresponds to the font (glyph1 might be "Arial", glyph2 "Courier", and so forth), but the part after the dash is just numbered sequentially within each "glyph*" variant.
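For anyone who wants to check that guess about the id structure on their own files, here is a minimal Python sketch. It assumes the ids always follow the "glyphN-M" pattern seen above; the function name summarize_glyphs, the second symbol (glyph12-49), and the "M 0 0 Z" path data in the sample are made up for illustration and are not taken from any real import.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"
# Assumed id pattern, based only on the "glyph12-48" example in this thread.
GLYPH_ID = re.compile(r"^glyph(\d+)-(\d+)$")


def summarize_glyphs(svg_text):
    """Group glyph symbols by the number before the dash and count <use> references."""
    root = ET.fromstring(svg_text)

    # Number before the dash -> list of numbers after the dash.
    groups = defaultdict(list)
    for sym in root.iter(f"{{{SVG_NS}}}symbol"):
        m = GLYPH_ID.match(sym.get("id", ""))
        if m:
            groups[int(m.group(1))].append(int(m.group(2)))

    # Glyph id -> how many times it is placed in the drawing.
    uses = defaultdict(int)
    for use in root.iter(f"{{{SVG_NS}}}use"):
        target = use.get(f"{{{XLINK_NS}}}href", "").lstrip("#")
        if GLYPH_ID.match(target):
            uses[target] += 1

    return {k: sorted(v) for k, v in groups.items()}, dict(uses)


# Hypothetical sample modelled on the snippets above (path data is a placeholder).
SAMPLE = """<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <defs>
    <symbol overflow="visible" id="glyph12-48">
      <path style="stroke:none;" d="M 0 0 Z"/>
    </symbol>
    <symbol overflow="visible" id="glyph12-49">
      <path style="stroke:none;" d="M 0 0 Z"/>
    </symbol>
  </defs>
  <g style="fill:rgb(0%,0%,0%);fill-opacity:1;">
    <use xlink:href="#glyph12-48" x="183.553346" y="712.58976"/>
  </g>
</svg>"""

if __name__ == "__main__":
    groups, uses = summarize_glyphs(SAMPLE)
    print(groups)
    print(uses)
```

If the number of distinct prefixes matches the number of fonts the PDF viewer reports for the file, that would support the font guess, but it still would not recover the font names or the Unicode values.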
Whoever is working on this might have a look at the code for "pdftotext", which comes with poppler. If that program uses the same import code it will show at least one way to retain text as text. (Or it might be completely separate from poppler's usual import and not be of any use at all. I have never looked at it.)
https://bugs.launchpad.net/inkscape/+bug/275655/+attachment/362795/+files/ma... from https://bugs.launchpad.net/inkscape/+bug/275655
That one is odd: the text doesn't import properly in any mode on the Inkscape versions I tried, and it is also lost going into LibreOffice Draw.
Regards,
David Mathog mathog@...1176... Manager, Sequence Analysis Facility, Biology Division, Caltech