[Fwd: Translation problems and bad strings]
FYI
This is an e-mail that Telsa wrote about translation issues. Some parts are GNOME specific, but I thought there were some good general things in here that at least I didn't think about.
--Ted
-----Forwarded Message-----
From: Telsa Gwynne <hobbit@...144...>
To: gnome-hackers@...45..., gnome-doc-list@...45..., gnome-i18n@...45...
Subject: Translation problems and bad strings
Date: 04 Jan 2004 11:10:44 +0000
Really sorry about cross-post: gnome-i18n know most of this but perhaps not the explanations for a couple of the strings below; gnome-doc-list need to know what chaos "Control" caused; and gnome-hackers is my attempt to catch the hackers.
Not at all sure where replies should go. Use your common sense :)
Some time ago I asked gnome-i18n for what they thought the worst strings to translate in Gnome were. And got a pile of answers which I then didn't summarise on-list. Reading the threads about "what problems do translators face", I am reminded that I should.
I know a load of hackers are familiar with this, but for those who aren't, here's a quick description of the translation process. There are loads of ways of translating: editing po files by hand, using a web interface, using gtranslator or KBabel; but they all revolve around a list of strings which go like this:
#: /module/path/path/filename
msgid "Original string here with occasional <b> or \n marks"
msgstr ""
...and the aim of the game is to fill in msgstr. You do not necessarily have any more context than that, and you have to guess what some strings mean. (For example, if you don't have a CD drive, you can't start rhythmbox to see whether a particular message appears when you stick a CD in..)
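To make the shape of that list concrete, here is a minimal sketch of reading such entries. It assumes single-line msgid/msgstr values only; real po files also have multi-line strings, plural forms and flags, which this ignores. The names parse_po and SAMPLE are made up for illustration.

```python
def _unquote(value):
    """Strip the surrounding double quotes; escape sequences are left as written."""
    return value.strip()[1:-1]


def parse_po(text):
    """Return a list of (reference, msgid, msgstr) tuples from simplified .po text.

    Assumption: every msgid and msgstr fits on one line.
    """
    entries = []
    reference, msgid = None, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#:"):
            # Source location: the only context a translator is sure to get.
            reference = line[2:].strip()
        elif line.startswith("msgid "):
            msgid = _unquote(line[len("msgid "):])
        elif line.startswith("msgstr ") and msgid is not None:
            entries.append((reference, msgid, _unquote(line[len("msgstr "):])))
            reference, msgid = None, None
    return entries


# Raw string, so the \n inside the messages stays a literal backslash + n.
SAMPLE = r'''
#: /module/path/path/filename
msgid "Original string here with occasional <b> or \n marks"
msgstr ""

#: aisleriot/golf.scm.h:3
msgid "bdc\n"
msgstr ""
'''

# Entries whose msgstr is still empty are the ones left to translate.
untranslated = [entry for entry in parse_po(SAMPLE) if entry[2] == ""]
```

Note that the tuple is everything the translator has to go on: a file path, the English string, and an empty slot.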
I begin with this one, which started the whole thing. A remark from one of the Arabic translators on IRC:
<olimar> joke of the month in a party: "Model column to search through when searching through code"
I'm still not sure what this means..
There are lots of things programmers can do to help here: particularly they can put comments by the strings in the code saying things like "Translators: this is seen by.." or "This refers to..". For some apps there is such specialised vocabulary that this can really help. Unfortunately, nearly everyone decides to translate gtk+ early on (it makes up half of the strings in developer-libs), and it's full of such strings as above and there is not really a lot you can do about it without a gigantic split into "messages for users" and "messages for developers". Having said that, anyone who finishes gtk is well set up to finish most of Gnome :)
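As a rough illustration of such a comment, here is a sketch in Python rather than C. It assumes the strings are extracted with xgettext run with --add-comments=Translators, which copies a comment that directly precedes the marked string into the po file as "#." lines. The function name volume_button_labels is hypothetical.

```python
import gettext

# With no catalog installed, gettext.gettext returns the msgid unchanged.
_ = gettext.gettext


def volume_button_labels():
    """Return the labels for a pair of volume buttons."""
    # Translators: the + and - refer to increasing and decreasing the volume.
    louder = _("+")
    # Translators: this is the "decrease the volume" button.
    quieter = _("-")
    return louder, quieter
```

The comment costs the programmer one line and saves every translation team from guessing what a bare "+" is for.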
So here are the sort of strings translators were dealing with in the summer. The aisleriot examples have gone, I think (yay Callum!): but the rest were all around in the summer.
Historical (ie, gone, to the best of my knowledge):
gnome-games/aisleriot:
msgid "borp"
Read it backwards: "prob"... (Abel Cheung -- who later figured it out, decided it was cute, and put "melborp" in somewhere else :))

gnome-games/aisleriot,
nautilus/somewhere I forget:
msgid " of "
This is things like "king of hearts" or "file 1 of 8". This flatly won't translate in some languages. Malayalam (.ml) needs to see "n of m" for numbers and change it to say "of m, n". Different words are used for "of" in Welsh (cy) for "1 of 10" and "the king of hearts". There's also a further complication for cy which I don't think I can explain in a single line, so you are spared.
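One common way around a bare " of " is to mark the whole phrase for translation and use named placeholders, so a translator can reorder them ("of m, n") and pick a different word for each use. A minimal sketch, with hypothetical function names; card names may still need one full string per card in languages where the words change form.

```python
import gettext

# With no catalog installed, gettext.gettext returns the msgid unchanged.
_ = gettext.gettext


def progress_label(current, total):
    """Build a progress message from one complete translatable phrase."""
    # Translators: progress counter, e.g. "File 1 of 8".
    # The placeholders can be moved to suit your language's word order.
    return _("File {current} of {total}").format(current=current, total=total)


def card_name(rank, suit):
    """Build a card name; a separate msgid from the counter above."""
    # Translators: a playing card, e.g. "king of hearts".
    return _("{rank} of {suit}").format(rank=rank, suit=suit)
```

Because the two phrases are separate msgids, Welsh can use one "of" for the counter and another for the card.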
What do these mean in _English_?
#: aisleriot/golf.scm.h:3
msgid "bdc\n"
msgstr ""
Debugging message referring to "button-double-clicked" subroutine. (Abel Cheung)
#: several places, apparently
msgid "Control"
msgstr ""
"That's just gorgeous - Is it a verb? A noun? What kind of noun? Where can I find it in the app?" (Stanislav Visnovsky)
"Control" crops up in strings all over the place. Months later I discovered that this is the Gnome Docs Project official word for "widget", because "widget" is thought not a good word to give to end-users. This is probably true. But translators do at least know "widget"; and it doesn't have another eight possible meanings. And at least some translators didn't know "control" was in the docs team's word list of Good Words.
#: gtk+
msgid "IM Preedit style"
msgstr ""
from Ole (dk), who noted that you can figure out that IM is input method from other entries (if you're working from the po file and not from a web interface), but preedit?
Some months later, Dave Malcolm explained on IRC, and I think I shall share for anyone who didn't know:
<DaveMalcolm> olimar: GTK has small "plugins" that handle text input in different ways; they take keyboard input and convert the keypresses into text being typed. If you right-click in an entry box or gedit you can select the method.
<DaveMalcolm> "pre-edit" is where a preview of your edit appears in the control; so for Japanese you might type the romanised form, and have that appear in grey as the preedit string, which might later get converted into hiragana/katakana/kanji characters depending on further input.
So now we know! Thanks, Dave.
Error messages:
gcalctool is a great example for this. There are over _forty_ strings which are error messages referring to the inner workings of the code. For example:
msgid ""
"*** B = %d ILLEGAL IN CALL TO MPCHK.\n"
"PERHAPS NOT SET BEFORE CALL TO AN MP ROUTINE ***\n"

msgid ""
"*** ERROR OCCURRED IN MPROOT, NEWTON ITERATION NOT CONVERGING PROPERLY ***\n"

msgid "*** ABS(X) NOT LESS THAN 1 IN CALL TO MPEXP1 ***\n"
Words that cause problems for different languages:
- Multiple languages distinguish between "key" as in "thing that turns in lock" and "thing on a keyboard". It's easy to guess when you're translating gconf itself or acme itself which it should be. In other files though, it's not so easy.
- "Package" and "packet" seem to be the same word in more than one language (Welsh, French)
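For a word like "key", one option is a message context: the same English msgid is stored once per context, so each meaning can get its own translation. The sketch below uses gettext.pgettext, which needs Python 3.8 or later; the context names "keyboard" and "gconf setting" and the function key_labels are invented for the example.

```python
import gettext


def key_labels():
    """Return the same English word marked with two different contexts."""
    # Context "keyboard": the thing you press on a keyboard.
    keyboard_key = gettext.pgettext("keyboard", "Key")
    # Context "gconf setting": the name of a configuration entry.
    setting_key = gettext.pgettext("gconf setting", "Key")
    # With no catalog installed, both calls return the msgid, "Key".
    return keyboard_key, setting_key
```

In the po file each of these shows up as its own entry with a msgctxt line above the msgid, so the translator sees which "key" is meant without opening the source.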
"Render":
Render is very hard to translate, at least for Danish. Sometimes it means "draw", sometimes "generate" or "create", sometimes "copy to screen". Usually it is some sort of combination.
-- Ole Laursen
Abel said it was the same for Chinese.
"Antialiasing":
Perhaps it's just in Welsh, but we in the cy team had endless trouble with this. Translating the parts of the word made no sense. Trying to make up a word which explained what the technique meant made no sense.
"Meta"
Anything involving the Greek prefix "Meta" makes olimar unhappy: trying to find an Arabic equivalent is apparently hard. Metafile, metadata.. okay, so it's a file about files, and data about data. Great: so metacity is... erm. No. Ow. There's also the meta key, but I don't actually remember seeing that in strings.
Long strings of "Noun noun noun noun":
Particularly horrible when one or more of the nouns can also be used as a verb, and common in tooltips and menus. Examples:
# gnome-terminal:
NB: totally undocumented feature which jrb explained to me recently.
msgid "S/Key Challenge Response"

# libbonobo:
msgid "generic factory 'new' moniker"
-- Ole again. Abel suggested "all CORBA keywords" as well :)

# libbonobo:
msgid "ORB IOR handling moniker"
from Andraz

# nautilus:
msgid "Image Properties content view component"
from Reinout van Schouwen (.nl)

#: Evolution-groupwise:
msgid "Evolution Calendar Groupwise backend"
I can't remember where it's from, but the record is five nouns (some of which might be verbs or instructions) in a row. Other "noun? verb? what?" words can even be "End" and "Finish". "-ing" words have similar problems. I think the technical term for the variety that isn't a verb is "gerund", but I can't think of a good example in the po files offhand (but they are there!)
Miscellany:
Strings that arrived without comment or which don't fit elsewhere.
- "Resident memory set"
- "Minimum Shared Memory Size"
- "Minimum Resident Memory Size"
- "Request obsoletes service's data"
- msgid "Error checking error; no exception"
- "MInternal Error: Weird value (%ld) in do_test\n"
- "Model column to search through when searching through code"
- "FALSE displays the "invisible char" instead of the actual text (password mode)"
- "Just because a crosswalk looks like a hopscotch board doesn't mean it is one"
Incidentally, I showed Alan the "FALSE displays.." one and he said "Makes perfect sense to me." Because he knows what it's talking about. Non-hacker translators don't.
Some translators make a point of filing every string with a problem in bugzilla. This takes _ages_ but it helps. But the problem then is that there is a very limited period when you can change them.
Most translation teams use the translation status tables which are at http://developer.gnome.org/projects/gtp/status/ to keep on top of things. The current 2.5 stuff for each language is at
http://developer.gnome.org/projects/gtp/status/gnome-2.6/XX/developer-libs/i...
http://developer.gnome.org/projects/gtp/status/gnome-2.6/XX/desktop/index.ht...
(put any language code in XX: sr, cy, de..)
When strings are changed at all, it upsets all the statistics. So you have to find a time when you _can_ change them, because those statistics do matter and do help you keep on top of things. It is really really disheartening to see your 100% app has suddenly gone to 81% because someone has altered all the tabs inside the strings; and even worse when it's a much more substantial change which requires you to do a lot more than just remove the fuzzy marker. And towards the end of a release cycle is not the time to do it.
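To show why a small change to the original strings moves the numbers, here is a rough sketch of how such a percentage could be computed. It assumes single-line msgid/msgstr pairs and that fuzzy entries carry a "#, fuzzy" flag line; it is not how the GNOME status pages are actually generated, and po_status, percent_translated and SAMPLE are made-up names.

```python
def po_status(text):
    """Count translated, fuzzy and untranslated entries in simplified .po text."""
    counts = {"translated": 0, "fuzzy": 0, "untranslated": 0}
    fuzzy = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#,") and "fuzzy" in line:
            # The flag applies to the next entry only.
            fuzzy = True
        elif line.startswith("msgstr "):
            if line == 'msgstr ""':
                counts["untranslated"] += 1
            elif fuzzy:
                counts["fuzzy"] += 1
            else:
                counts["translated"] += 1
            fuzzy = False
    return counts


def percent_translated(counts):
    """Whole-number percentage of entries that are translated and not fuzzy."""
    total = sum(counts.values())
    return 100 * counts["translated"] // total if total else 0


# Placeholder translations; the msgids are invented for the example.
SAMPLE = '''
msgid "Open"
msgstr "(translation)"

#, fuzzy
msgid "Close window"
msgstr "(old translation)"

msgid "Quit"
msgstr ""
'''

status = po_status(SAMPLE)
```

Only entries that are translated and not fuzzy count towards the percentage, which is why retabbing a batch of strings can knock a finished app well below 100%.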
But some of these strings really do have to be changed, or explanations appended in the comments next to the function that contains the things. For example, Epiphany goes to the appropriate language page on Google because of this comment:
#. Translators you should change these links to respect your locale.
#. * For instance in .nl these should be
#. * "http://www.google.nl" and "http://www.google.nl/search?q=%s"
Others found with a quick grep:
evolution/po/cy.po:#. This is a filename. Translators take note.
gnome-applets/po/cy.po:#. Translators - The + and - refer to increasing and decreasing the volume.
I don't have a complete checkout of all CVS, but I have quite a few modules out. But that's about all there is. A few more of "Translators: this 'plane' is not the sort that flies but instead a term used by Unicode" would be really nice. (Actually, that's a bad example, because apparently the place that appears is not a place you can put such a comment: but it's a good example of the sort of word that might need clarifying.)
There used to be a string review period in the release cycle. It concentrated on the English as far as I know. Making the English clearer certainly helps translators, but even then there can be problems. Most translators do the gnome-glossary early on as a sort of standardising terminology exercise, but even so we (cy) didn't realise that "control" was the approved term for "widget" when we met it later on in po files.
So there you are. I don't really know what the solution is, but I do know that in between 2.4 (which we had at 100% in Welsh) and now, we have acquired 1000 fuzzy strings and 750 untranslated in apps which we had done completely; and another 6000 strings to do from the list of "proposed" so far. That's on top of 16,000 strings which remained constant. That's a lot of strings, and I dread the changing of them in order to make them more intelligible to other teams.
But at some stage, some of these have to be fixed in the originals, which means they become "untranslated" or "fuzzy" in the files of every team which has done them already. It will make it easier for new teams. But I'm not looking forward to the process!
Telsa
gnome-hackers mailing list gnome-hackers@...45... http://mail.gnome.org/mailman/listinfo/gnome-hackers