Hello David,
You're right, it involves using both split lines and remove manual kerns, in that order. If you have, for instance, two columns of text, the import collects them into one <text> object with several <tspan> elements and places every single character using dx and dy offsets.
To keep the whole thing in place, you need to split the lines first, then remove the manual kerns. If you remove the manual kerns first, you end up with a single column of text with tspans. The benefit is that you can edit the text more easily, but it doesn't look remotely like the PDF document.
A function that would collect grouped text and place it into a single text with tspans in a similar way would be extremely useful, even if it doesn't allow for editing that well. Text search and retrieval would become a lot easier with it.
I think, however, that a smarter algorithm, one that collects the characters of the same string and places them on a baseline defined by the tspan element and its position relative to the text tag, would be a giant step towards more usable PDF text import. It seems the same goes for PS text.
BTW, if the PDF import comes across unknown characters, it treats them as escape codes and creates a new tspan rather than using an undefined glyph to fill the hole. (I get that when "()" are in the original text generated from a Chinese version of Windows, for instance. Probably some GB2312-to-Unicode conversion bug.)
Anyway, keep up the good work. Typesetting hasn't traditionally been Inkscape's strong point, so any work on it is greatly appreciated.
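To make the idea a bit more concrete, here is a rough Python sketch of the kind of grouping I mean. It is not Inkscape code: the Glyph class, the group_into_runs function, and the y_tolerance and max_gap values are all made up for illustration, and it assumes the absolute position of each character has already been worked out from the dx/dy values.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """One imported character with its absolute position (hypothetical type)."""
    char: str
    x: float
    y: float


def group_into_runs(glyphs, y_tolerance=0.5, max_gap=20.0):
    """Collect glyphs into strings that share a baseline.

    Glyphs whose y values agree within y_tolerance are treated as one row.
    Within a row, a horizontal jump larger than max_gap starts a new run,
    so two columns on the same baseline stay separate.
    Both thresholds are illustrative guesses, not tuned values.
    Returns a list of (x, y, text) tuples, one per run.
    """
    rows = []
    for g in sorted(glyphs, key=lambda g: (g.y, g.x)):
        # Compare against the first glyph of the current row.
        if rows and abs(g.y - rows[-1][0].y) <= y_tolerance:
            rows[-1].append(g)
        else:
            rows.append([g])

    runs = []
    for row in rows:
        row.sort(key=lambda g: g.x)
        current = [row[0]]
        for prev, g in zip(row, row[1:]):
            if g.x - prev.x > max_gap:
                runs.append(current)
                current = [g]
            else:
                current.append(g)
        runs.append(current)

    # Each run could become one tspan positioned at (x, y).
    return [(r[0].x, r[0].y, "".join(g.char for g in r)) for r in runs]
```

Each returned tuple would map to one tspan with a single x/y position instead of a dx/dy pair per character. A real implementation would also have to deal with rotated text, font changes and right-to-left scripts, which this ignores.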
Cheers
Jelle
On Mon, 03 Dec 2012 10:44:31 +0800, inkscape-devel-request@lists.sourceforge.net wrote:
Split text goes the other way - it breaks strings into smaller pieces. Maybe you meant text -> Remove Manual Kerns? AFAIK there is no function that merges two <text> elements, other than by doing it manually: cutting one and pasting it into the end of the other, which will generally move the second <text>. The "split text" extension also moves the component pieces, and does the entire <text>, not just a selected substring, but I guess I could modify it to be better behaved.

There is another problem with some PS files - they drop all the spaces, so that "this is text" becomes the character set {t,h,i,s,i,s,t,e,x,t}. The code I'm working on will have an option to try to reinsert the spaces based on the letter spacing.

Regards,
David Mathog
mathog@...1176...
Manager, Sequence Analysis Facility, Biology Division, Caltech
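
For what it's worth, reinserting spaces from the letter spacing could look roughly like the Python sketch below. This is only a guess at the general approach, not David's code: the reinsert_spaces name, the (char, x, width) tuple layout and the space_ratio threshold are assumptions made for the example.

```python
def reinsert_spaces(glyphs, font_size, space_ratio=0.2):
    """Rebuild a string from positioned glyphs, guessing where spaces were.

    glyphs: list of (char, x, width) tuples on one baseline, sorted by x
            (hypothetical layout; width is the glyph's advance).
    A space is inserted when the gap between the end of one glyph and the
    start of the next exceeds space_ratio * font_size. The 0.2 default is
    an illustrative guess, not a measured value.
    """
    if not glyphs:
        return ""
    out = [glyphs[0][0]]
    for (_, prev_x, prev_w), (ch, x, _) in zip(glyphs, glyphs[1:]):
        gap = x - (prev_x + prev_w)
        if gap > space_ratio * font_size:
            out.append(" ")
        out.append(ch)
    return "".join(out)
```

Fed the characters of "thisistext" with a slightly larger gap after the fourth and sixth glyphs, this gives back "this is text". Tight kerning or justified text would need a smarter threshold.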