Translations and r11895 bug in flow text
Hi Guys,
Just to report that r11895 has a rather ugly bug in flow text: after creating an initial text, I could no longer edit it. Unfortunately, Launchpad sits in the 3-second timeout zone for me here in Chongqing, China, and I've never been able to use it; otherwise I'd have filed the bug there. Please bear with me, my intentions are good.
And David,
If you've ever imported a PDF file with lots of text in layouts, you may have noticed that every single character gets a position. That's fine until you try to change the text in any way: the text string will change, but the positions string will not. In other words, you end up with garbage. The workaround is to use the split text extension, which cleans up all the manual kerns nicely.
Maybe that can lead you to solve your problem as well, as it seems related.
Cheers
Jelle
Message: 8
Date: Mon, 26 Nov 2012 13:08:16 -0800
From: mathog <mathog@...1176...>
Subject: Re: [Inkscape-devel] tspan Text Starts
To: inkscape-devel@lists.sourceforge.net
On 23-Nov-2012 19:48, Martin Owens wrote:
hey guys,
I'm developing an extension to manage translations (which I do via Launchpad and xml2po), but I'm having trouble with tspans.
The problem seems to be that Inkscape saves multiple values for the x attribute for some (not all) tspan sections. Specifying the letter placements is death to translations, as the number and size of letters is guaranteed to be different.
Is there any API way to strip out these bumbling attributes or, better, have them not appear in the first place?
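As an illustration of the stripping Martin asks about, here is a minimal sketch in Python using only the standard library. It is not an Inkscape API; the function name strip_glyph_positions is hypothetical, and it assumes that keeping the first value of a multi-valued x or y attribute (the start of the run) is acceptable for translation purposes.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def strip_glyph_positions(root):
    """Collapse per-glyph x/y lists on <text> and <tspan> to their first value.

    Hypothetical helper, not part of Inkscape. Returns the number of
    attributes that were changed.
    """
    changed = 0
    for tag in ("text", "tspan"):
        for elem in root.iter("{%s}%s" % (SVG_NS, tag)):
            for name in ("x", "y"):
                value = elem.get(name)
                if value is None:
                    continue
                # SVG allows comma- or whitespace-separated coordinate lists.
                parts = value.replace(",", " ").split()
                if len(parts) > 1:
                    # Keep only the starting position of the run.
                    elem.set(name, parts[0])
                    changed += 1
    return changed


if __name__ == "__main__":
    svg = (
        '<svg xmlns="http://www.w3.org/2000/svg">'
        '<text x="10" y="20"><tspan x="10 15.2 21 26.5" y="20">Text</tspan></text>'
        "</svg>"
    )
    doc = ET.fromstring(svg)
    print(strip_glyph_positions(doc), "attribute(s) collapsed")
```

Only x and y are touched here; dx, dy and rotate lists would need their own policy, since keeping just their first value changes the meaning.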
Hmm. Well, this may not be what you are after, but...
I have been working on code to reassemble formatted, editable text from component pieces. The idea is that something like this in Inkscape:
(E:bold)(=mc:no special formatting)(2:superscript)
when present in an EMF or PS file, for instance, is represented by 3 separately formatted text strings: {E,=mc,2}
These are currently read back into Inkscape as just those pieces. It looks exactly like the original, but the pieces are not assembled, so the whole is not editable. My code tries to reassemble the pieces from their positions, font information, etc. and makes <text><tspan> records to match. This work is not done, but the current version does pretty well at figuring out where paragraphs start and end, working out the justification and so forth, and generating editable Inkscape SVG. It works with rotated text, but at present it cannot figure out when the first sentence of a paragraph belongs with the remainder if the first is indented by starting it at an offset (as opposed to by using leading spaces).
For your purposes, would it be sufficient if, after reassembly, the formatting information was discarded and just the logical information retained? That would give you sentences and paragraphs (superscripts and subscripts would be problematic).
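To make the idea of keeping only the logical information concrete, here is a rough sketch in Python. It is not David's code: the Fragment type, the group_into_lines function and the 0.6 tolerance are all assumptions made for illustration. It groups separately positioned fragments into lines by baseline and discards their formatting.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One separately formatted piece of text (hypothetical structure)."""
    text: str
    x: float          # left edge
    y: float          # baseline
    font_size: float


def group_into_lines(fragments, tolerance=0.6):
    """Join fragments that share a baseline, within tolerance * font size.

    Larger fragments are placed first so that body text, not a superscript,
    defines each line's reference baseline.
    """
    lines = []  # each entry: (baseline, font_size, members)
    for frag in sorted(fragments, key=lambda f: -f.font_size):
        for baseline, size, members in lines:
            if abs(frag.y - baseline) <= tolerance * size:
                members.append(frag)
                break
        else:
            lines.append((frag.y, frag.font_size, [frag]))
    lines.sort(key=lambda line: line[0])
    return [
        "".join(f.text for f in sorted(members, key=lambda f: f.x))
        for _, _, members in lines
    ]


if __name__ == "__main__":
    pieces = [
        Fragment("2", 30.0, 95.0, 7.0),       # superscript, raised baseline
        Fragment("=mc", 8.0, 100.0, 12.0),
        Fragment("E", 0.0, 100.0, 12.0),
        Fragment("next", 0.0, 115.0, 12.0),   # following line
    ]
    print(group_into_lines(pieces))
```

The superscript ends up inline as plain text, which is the kind of loss mentioned above; rotated text and paragraph detection are outside this sketch.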
Regards,
David Mathog mathog@...1176... Manager, Sequence Analysis Facility, Biology Division, Caltech
On 27-Nov-2012 19:26, Jelle wrote:
And David,
If you've ever imported a PDF file with lots of text in layouts, you may have noticed that every single character gets a position. That's fine until you try to change the text in any way: the text string will change, but the positions string will not. In other words, you end up with garbage. The workaround is to use the split text extension, which cleans up all the manual kerns nicely.
Split text goes the other way: it breaks strings into smaller pieces. Maybe you meant Text > Remove Manual Kerns? AFAIK there is no function that merges two <text> elements, other than doing it manually by cutting one and pasting it into the end of the other, which will generally move the second <text>. The "split text" extension also moves the component pieces, and it operates on the entire <text>, not just a selected substring, but I guess I could modify it to be better behaved.
There is another problem with some PS files - they drop all the spaces. So that "this is text" becomes the character set {t,h,i,s,i,s,t,e,x,t}. The code I'm working on will have an option to try to reinsert the spaces based on the letter spacing.
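One way to reinsert spaces based on letter spacing can be sketched as follows. This is an illustrative guess, not the option David describes: the reinsert_spaces name, the (char, x, width) glyph tuples and the gap_factor threshold are assumptions. A space is inserted wherever the gap between consecutive glyphs exceeds a fraction of the mean glyph width.

```python
def reinsert_spaces(glyphs, gap_factor=0.3):
    """Rebuild a string from positioned glyphs on one baseline.

    glyphs: iterable of (char, x, width) tuples (hypothetical input format).
    A space is inserted when the horizontal gap between two neighbours is
    larger than gap_factor times the mean glyph width.
    """
    ordered = sorted(glyphs, key=lambda g: g[1])
    if not ordered:
        return ""
    mean_width = sum(g[2] for g in ordered) / len(ordered)
    threshold = gap_factor * mean_width
    out = [ordered[0][0]]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur[1] - (prev[1] + prev[2])
        if gap > threshold:
            out.append(" ")
        out.append(cur[0])
    return "".join(out)


if __name__ == "__main__":
    # "this is text" with the spaces dropped: 5-unit glyphs, 3-unit word gaps.
    glyphs = [
        ("t", 0, 5), ("h", 5, 5), ("i", 10, 5), ("s", 15, 5),
        ("i", 23, 5), ("s", 28, 5),
        ("t", 36, 5), ("e", 41, 5), ("x", 46, 5), ("t", 51, 5),
    ]
    print(reinsert_spaces(glyphs))
```

A fixed threshold like this would likely misfire on justified or heavily kerned text, so a real implementation would need something more robust.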
Regards,
David Mathog mathog@...1176... Manager, Sequence Analysis Facility, Biology Division, Caltech
On Wed, 2012-11-28 at 08:48 -0800, mathog wrote:
There is another problem with some PS files - they drop all the spaces. So that "this is text" becomes the character set {t,h,i,s,i,s,t,e,x,t}. The code I'm working on will have an option to try to reinsert the spaces based on the letter spacing.
For your next trick, try to work out where the text is starting and ending in its flow and construct a text box to contain it.
Martin,
On Thu, 29 Nov 2012 06:42:06 -0800, inkscape-devel.neophyte_rep@...2295... wrote:
Perhaps a collaboration with the authors of "Layout-aware text extraction from full-text PDF of scientific articles" < http://code.google.com/p/lapdftext/ > would be productive? It is reviewed here < http://www.scfbm.org/content/7/1/7 >.
participants (3): Jelle, Martin Owens, mathog