
On Mon, Sep 21, 2009 at 10:57:54AM +0530, Santhosh Thottingal wrote:
This is regarding a wishlist bug reported here: https://bugs.launchpad.net/inkscape/+bug/171140
I am writing an extension that hyphenates text when it is justified.
It is built on top of the Python hyphenation code written by Wilbert Berendsen. The hyphenation rules, also called patterns, are those of TeX or OpenOffice.
If we want it within Inkscape rather than as an extension, then we could use libhyphen, which is what OpenOffice itself uses.
A few more changes still need to be made: a) Making the extension language-independent: should it load all the patterns from a directory while initializing? Or is it okay to ask the user to select the language? As of now I am doing a Unicode range check to differentiate between Malayalam and English, but it will be buggy for other languages.
The right thing to do as far as SVG is concerned is to look at the xml:lang attribute (see http://www.w3.org/TR/xml/#sec-lang-tag).
Of course, that then requires that Inkscape have a GUI for selecting bits of text and saying what language they are.
In the absence of information from HTTP or MIME headers (not currently available to Inkscape; I suppose we should add a command-line option so that web browsers or mail clients can pass us any language information they encounter), and in the absence of that command-line option, I suppose we'd consult the locale.
(As you say, Unicode script range checking is also useful. In many cases, I'd guess that it should even take precedence over the xml:lang specification, if the xml:lang language doesn't use this script at all. Though I don't know how to find out which scripts a hyphenation dictionary covers.)
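To make the order I have in mind concrete, here is a rough Python sketch: nearest xml:lang on the element or an ancestor, then a language passed in by the caller, then the locale, with the script check overriding only when the chosen language is known not to use the script of the text. Everything in it is an assumption for illustration: the SCRIPT_RANGES table, the function names, and the cli_lang parameter (which stands in for the command-line option Inkscape doesn't have) are all made up, and it uses plain ElementTree rather than anything from Inkscape.

```python
import locale
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Illustrative table only: Unicode ranges for the scripts of two languages.
SCRIPT_RANGES = {
    "ml": [(0x0D00, 0x0D7F)],                    # Malayalam block
    "en": [(0x0041, 0x005A), (0x0061, 0x007A)],  # basic Latin letters
}


def in_script(ch, lang):
    """True if ch falls in one of the ranges listed for lang."""
    return any(lo <= ord(ch) <= hi for lo, hi in SCRIPT_RANGES.get(lang, []))


def declared_language(element, parents):
    """Return the nearest xml:lang value on element or its ancestors."""
    node = element
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang or None   # xml:lang="" means no language is declared
        node = parents.get(node)
    return None


def script_language(text):
    """Guess a language from the first character in a known range."""
    for ch in text:
        for lang in SCRIPT_RANGES:
            if in_script(ch, lang):
                return lang
    return None


def resolve_language(element, parents, text, cli_lang=None, locale_lang=None):
    """Pick a hyphenation language: xml:lang, then cli_lang, then the locale."""
    if locale_lang is None:
        name = locale.getlocale()[0]          # e.g. "en_US", or None
        locale_lang = name.split("_")[0] if name else None
    lang = declared_language(element, parents) or cli_lang or locale_lang
    base = lang.split("-")[0].lower() if lang else None
    guessed = script_language(text)
    # Override only if the chosen language's script is known and unused here.
    if guessed and base in SCRIPT_RANGES and not any(in_script(c, base) for c in text):
        return guessed
    return base or guessed
```

ElementTree keeps no parent pointers, so the caller would build the map once, e.g. `parents = {child: parent for parent in root.iter() for child in parent}`, and pass it in for each text element.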
Btw, I like the fact that it works by inserting soft hyphens into the text: that may help the result to be more reproducible across SVG renderers.
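For anyone following along, the soft-hyphen step amounts to something like the sketch below. It is not the extension's actual code: the function name is invented, and the break positions are assumed to come from whatever hyphenator is in use (they are simply passed in as character offsets).

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def insert_soft_hyphens(word, positions):
    """Insert U+00AD at each offset in positions (offsets into the original word).

    Offsets at or beyond the word boundaries are ignored, since a break
    before the first or after the last character is meaningless.
    """
    pieces = []
    last = 0
    for pos in sorted(set(positions)):
        if 0 < pos < len(word):
            pieces.append(word[last:pos])
            last = pos
    pieces.append(word[last:])
    return SOFT_HYPHEN.join(pieces)


# Example with hand-picked break points (after "hy" and after "phen").
hyphenated = insert_soft_hyphens("hyphenation", [2, 6])
```

Stripping the soft hyphens again with `hyphenated.replace("\u00ad", "")` gives back the original word, so the visible text is unchanged wherever no break is taken.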
Does Inkscape's "Convert to text" command do the right thing?
pjrm.