In one Hebrew test case, the word "Shalom" was split so that the first two characters were colored black and the second two green. This test case uses Hebrew vowel points, which are nonspacing marks (Unicode category Mn). Apart from the coloring, it looks like this:
http://upload.wikimedia.org/wikipedia/commons/7/73/Shalom.png
There are 2 such marks on the first character position (furthest right), and one on the third character position. After this SVG is read in and written back out, the 4th character (05dc) is eaten in the output. I enabled the debug method and it shows the problem (below): for the first span all 4 of the UTF codes are present, but there are only 3 glyphs where there should be 4. The second span is as it should be.
The layout code is migraine-inducing, but perhaps one of you can suggest at least approximately where within it this glitch might be found? In particular, is there by any chance a calculation somewhere that assumes there is at most one nonspacing mark (Mn) glyph per spacing glyph?
==== span 0
  in para 0 (direction=rtl)
  in source 0 (type=0, cookie=0xbacd900)
  in line 0 (baseline=51.748989, shape=0)
  in chunk 0 (x=347.181641, baselineshift=0.000000)
  font 'Ezra SIL SR' 40.000000 upright normalweight
  x_start = 197.656250, x_end = 143.437500
  line height: ascent 42.929688, descent 16.152344 leading 0.000000
  direction rtl, block-progression ttb
  ** characters:
    0: 'ש' 0x05e9 x=0.000000 flags=15b4 glyph=0
    1: 'ׁ' 0x05c1 x=0.000000 flags=000 glyph=0   <--- Mn
    2: 'ָ' 0x05b8 x=0.000000 flags=000 glyph=0   <--- Mn
    3: 'ל' 0x05dc x=-27.968750 flags=414 glyph=2
  ** glyphs:
    0: 262 (175.546875,0.000000) rot=0.000000 cx=0.000000 char=0
    1: 344 (169.687500,0.000000) rot=0.000000 cx=27.968750 char=0
       <-- missing a glyph here
    2: 287 (143.437500,0.000000) rot=0.000000 cx=26.250000 char=3
==== span 1
  in para 0 (direction=rtl)
  in source 1 (type=0, cookie=0xbacd970)
  in line 0 (baseline=51.748989, shape=0)
  in chunk 0 (x=347.181641, baselineshift=0.000000)
  font 'Ezra SIL SR' 40.000000 upright normalweight
  x_start = 143.437500, x_end = 103.515625
  line height: ascent 42.929688, descent 16.152344 leading 0.000000
  direction rtl, block-progression ttb
  ** characters:
    4: 'ו' 0x05d5 x=0.000000 flags=414 glyph=3
    5: 'ֺ' 0x05ba x=0.000000 flags=000 glyph=3
    6: 'ם' 0x05dd x=-13.515625 flags=414 glyph=5
  ** glyphs:
    3: 263 (133.828125,0.000000) rot=0.000000 cx=0.000000 char=4
    4: 280 (129.921875,0.000000) rot=0.000000 cx=13.515625 char=4
    5: 288 (103.515625,0.000000) rot=0.000000 cx=26.406250 char=6
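To make the suspicion concrete, here is a small Python sketch (not taken from the layout code) that groups the code points from the two spans above into clusters of one spacing character plus its Mn marks, then compares two glyph counts. It assumes the font maps each code point to exactly one glyph, which matches span 1 but might not hold if the shaper substitutes a combined glyph. The function names are made up for illustration, and glyphs_if_one_mark_max() is only a hypothetical model of the "at most one mark per spacing glyph" assumption asked about above.

```python
import unicodedata

# Code points copied from the "characters" lists in the debug output.
SPAN0 = [0x05E9, 0x05C1, 0x05B8, 0x05DC]
SPAN1 = [0x05D5, 0x05BA, 0x05DD]


def clusters(codepoints):
    """Group each spacing character with the Mn marks that follow it."""
    out = []
    for cp in codepoints:
        is_mark = unicodedata.category(chr(cp)) == "Mn"
        if is_mark and out:
            out[-1].append(cp)   # attach the mark to the preceding base
        else:
            out.append([cp])     # start a new cluster
    return out


def glyphs_one_per_codepoint(codepoints):
    """Expected glyph count, ASSUMING one glyph per code point."""
    return sum(len(c) for c in clusters(codepoints))


def glyphs_if_one_mark_max(codepoints):
    """HYPOTHETICAL faulty count: at most one mark kept per base."""
    return sum(min(len(c), 2) for c in clusters(codepoints))


for name, span in (("span 0", SPAN0), ("span 1", SPAN1)):
    print(name,
          "expected:", glyphs_one_per_codepoint(span),
          "with one-mark limit:", glyphs_if_one_mark_max(span))
```

Under those assumptions the one-mark limit gives 3 glyphs for span 0 instead of 4 and leaves span 1 at 3, which is the same pattern as in the dump. That is consistent with the suspicion but does not prove it.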
Thanks,
David Mathog mathog@...1176... Manager, Sequence Analysis Facility, Biology Division, Caltech