
On Thu, Sep 24, 2009 at 12:02:07PM +0200, Chris Lilley wrote:
On Monday, September 21, 2009, 6:19:38 PM, bulia wrote:
bb> But still, soft hyphen is a legal Unicode character and we must support it
bb> anyway. I'm just wondering if it is supposed to be treated differently
bb> depending on language; from what I read, it looks like it is always
bb> replaced with a visible hyphen when a break occurs. So it may be a
bb> workable solution for English and other languages but not for
bb> Malayalam :( Please correct me if I'm wrong.
No, you are right. A soft hyphen is inserted into text to help a layout system figure out where to break a word with a hyphen at the end of the line.
If Malayalam (and indeed other languages) don't use hyphens at the end of lines to indicate broken words, then clearly the users of that language will not normally be inserting the soft hyphens. But if by chance they do, well, the hyphen will be displayed if it falls at the end of a line.
If I understand the above correctly, it conflicts with section 5.4 of Unicode Standard Annex #14 (http://www.unicode.org/unicode/reports/tr14/#SoftHyphen):
“Depending on the language and the word, that may produce different visible results — for example:
* Simply inserting a hyphen glyph
* Inserting a hyphen glyph and changing spelling in the divided word parts
* Not showing any visible change and simply breaking at that point
* Inserting a hyphen glyph at the beginning of the new line”
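To make the difference concrete, here is a rough Python sketch of three of the four outcomes listed in the annex. It is only an illustration: the function name and the style names are made up for this message, the hyphen glyph is assumed to be a plain "-", and the spelling-change case is left out because it needs per-language rules.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def break_at_soft_hyphen(word, index, style="hyphen_at_end"):
    """Break `word` at its `index`-th soft hyphen (0-based).

    Returns (end_of_first_line, start_of_next_line). All soft hyphens
    are dropped from the output; only the chosen one may become visible.
    The style names are hypothetical labels for the annex's bullet points.
    """
    parts = word.split(SHY)
    if not 0 <= index < len(parts) - 1:
        raise ValueError("no soft hyphen at that index")
    head = "".join(parts[: index + 1])
    tail = "".join(parts[index + 1:])
    if style == "hyphen_at_end":       # simply inserting a hyphen glyph
        return head + "-", tail
    if style == "no_visible_change":   # break with nothing shown
        return head, tail
    if style == "hyphen_at_start":     # hyphen glyph opens the new line
        return head, "-" + tail
    raise ValueError("unknown style: " + style)
```

A layout engine that always used the first style would match what Bulia and Chris describe; the annex seems to leave the choice of style to the language.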
Bulia, Chris, what is the source of your information? Is there a conflict of standards, or were you going by an informal source?
pjrm.