Right to left language support, is this intended or a bug?

Here is a test file:
http://saf.bio.caltech.edu/pub/pickup/hebrew_decorations1.svg
It starts with two Hebrew sentences (R->L) and ends with the English word "overline" (L->R). It is all overlined (text-decoration.) So the text is supposed to look like this:
__________________ --3--><--2--<--1--
The first Hebrew sentence is longer and the second one has two words of three characters each. (No browser I tried rendered this correctly, hence the long verbal description.)
Anyway, when an attempt was made to print this to an EMF file this is what was observed:
1. Each Hebrew character was sent as its own string.
2. In "style" both "direction" and "block_progression" computed values were set to 0.
3. The one English word was sent as a string (not character by character). Again, the direction and block_progression were set to 0.
Naively for this sentence I expected to see two calls to PrintEmf::text(), the first with all of the Hebrew and text direction nonzero, and the second all English and direction zero. Only the second part of that was correct. Before I try to "fix" it - is the current character by character behavior for R->L text intentional? (Not having tried to fix it yet, it could well be that this is a side effect of some of my changes.)
On a related note: elsewhere, for instance when rendering text to the screen, Inkscape leaves mixed L->R/R->L tspans in one piece, and then does some very strange things when one tries to navigate within them with the arrow keys to edit the text. Would it not be better to break up every mixed direction tspan that hits TNG compute into two or more tspans, each of which is either all R->L or all L->R characters and, for good measure, has direction="rtl" or direction="ltr" set to match? With the current approach, all sorts of other pieces of Inkscape have to know about bidirectional text. Without the direction being set, other code cannot in general determine a tspan's direction quickly: put 20 leading and trailing spaces around the core text, and the direction is not determined until something other than a space is encountered. If all such tspans were reduced to homogeneous pieces in the TNG compute code, the rest of Inkscape would only need to handle the extra direction information for each tspan, which is much simpler to deal with than bidirectional text.
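As a rough illustration of the splitting idea (this is not Inkscape code), here is a minimal Python sketch that cuts a string into single direction runs. It assumes a much simplified rule rather than the full Unicode bidirectional algorithm: only characters with a strong direction start a new run, and spaces, digits and punctuation simply stay with the run they follow. The function names are hypothetical.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}  # strong right-to-left bidi classes (Hebrew, Arabic)
LTR_CLASSES = {"L"}        # strong left-to-right bidi class


def strong_direction(ch):
    """Return 'rtl' or 'ltr' for strongly directional characters, else None."""
    bidi = unicodedata.bidirectional(ch)
    if bidi in RTL_CLASSES:
        return "rtl"
    if bidi in LTR_CLASSES:
        return "ltr"
    return None


def split_direction_runs(text):
    """Split text into (direction, substring) runs, kept in logical order.

    Simplifying assumption: characters with no strong direction never start
    a run. Leading ones join the first run, later ones join the run before.
    """
    runs = []
    current_dir = None
    start = 0
    for i, ch in enumerate(text):
        d = strong_direction(ch)
        if d is None:
            continue
        if current_dir is None:
            current_dir = d
        elif d != current_dir:
            runs.append((current_dir, text[start:i]))
            start = i
            current_dir = d
    if text:
        # Text with no strong characters at all defaults to 'ltr' here.
        runs.append((current_dir or "ltr", text[start:]))
    return runs
```

With leading spaces, two Hebrew words and the word "overline", this yields one "rtl" run followed by one "ltr" run, and each run carries its direction explicitly, so a caller never has to scan past the spaces to find it.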
Thanks,
David Mathog mathog@...1176... Manager, Sequence Analysis Facility, Biology Division, Caltech

mathog <mathog@...360...> writes:
Here is a test file:
http://saf.bio.caltech.edu/pub/pickup/hebrew_decorations1.svg
It starts with two Hebrew sentences (R->L) and ends with the English word "overline" (L->R). It is all overlined (text-decoration.)
Just two comments, unrelated to LTR/RTL behavior:
1. Inkscape currently doesn't display text decorations.
2. The font you're using, Agency FB, does not contain Hebrew characters.

This is definitely a bug (actually several), and it also demonstrates the problem I referred to earlier with Inkscape not reducing text to single direction blocks. What is happening here is that Compute Output is called with Block Progression set to R->L, but the Compute Output method relies in part on a value which is the sum of the glyph widths added to the starting position. This value is used to detect manual kerning in the string. Unfortunately that calculation is implicitly a L->R operation, so the code ends up thinking that every single R->L character is kerned and breaks them out into single character strings. When it hits the English word (L->R), that part of the code starts working properly.
Since Inkscape has not simplified the text into single direction spans, in order to work properly it ends up requiring another part of the code to understand Unicode bidirectional text (which, here, it does not). Even if the Compute Output section properly changed the + widths to - widths, it would still have broken when it hit the English word, and would have reduced that string to single character strings instead. The logic associated with bidirectional Unicode should live in a single location in the program, and that place is well before the generic print driver.
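The failure mode described above can be reproduced with a small Python sketch. It is not the actual Compute Output code, only a model of the check as described: each glyph is a hypothetical (char, x, width) tuple where x is the pen position at which the glyph starts in reading order, and a glyph whose x differs from the expected position is treated as manually kerned and starts a new string. The `sign` parameter is an assumption added here to show what flipping + widths to - widths would do.

```python
def split_on_kerning(glyphs, sign=+1, tol=1e-6):
    """Group glyphs into strings, breaking wherever kerning is 'detected'.

    glyphs: list of (char, x, width) tuples in logical order.
    sign:   +1 assumes the pen advances to the right (L->R),
            -1 assumes it advances to the left (R->L).
    """
    chunks = []
    current = ""
    expected_x = None
    for ch, x, width in glyphs:
        # A position mismatch is interpreted as manual kerning.
        if expected_x is not None and abs(x - expected_x) > tol:
            chunks.append(current)
            current = ""
        current += ch
        expected_x = x + sign * width
    if current:
        chunks.append(current)
    return chunks


# Three Hebrew glyphs laid out right to left, then "ab" laid out left to right.
hebrew = [("\u05d0", 100, 10), ("\u05d1", 90, 6), ("\u05d2", 84, 8)]
english = [("a", 0, 5), ("b", 5, 5)]

ltr_only = split_on_kerning(hebrew + english, sign=+1)
rtl_only = split_on_kerning(hebrew + english, sign=-1)
```

With `sign=+1` every Hebrew glyph ends up in its own string while "ab" stays whole; with `sign=-1` the Hebrew stays whole and "ab" is broken into single characters instead. Neither fixed sign handles the mixed string, which is the argument for splitting into single direction runs before this stage.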
Regards,
David Mathog mathog@...1176... Manager, Sequence Analysis Facility, Biology Division, Caltech
participants (2): mathog, michael grosberg