Input language awareness problem

A reference to this page has been sent to the wm-spec mailing list in the hope of resolution.

The problem
Modern text rendering is extremely language dependent. Since the Unicode Consortium, in its infinite wisdom, assigned regional glyph variants to the same code points (Chinese Han vs Japanese Han aka Han unification, Arabic Arabic vs Farsi Arabic, Russian Cyrillic vs Balkan Cyrillic…), text layout libraries like Pango absolutely need to know what language is being rendered to make the right glyph choices.

It's not sufficient to pass them Unicode strings: the same Unicode code point is used for different regional variants of the same glyph, so language context is needed to select the right one.
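
To make this concrete, here is a minimal Python sketch. It takes U+76F4 (直), a commonly cited Han-unified character whose preferred shape differs between Chinese and Japanese typography, and tags it with a language using the `lang` attribute of Pango's `<span>` markup. The helper name `tag_language` is hypothetical, and the sketch only builds markup strings; it does not render anything.

```python
from xml.sax.saxutils import escape, quoteattr


def tag_language(text: str, lang: str) -> str:
    """Wrap text in a Pango markup <span> carrying a language tag.

    Hypothetical helper: it only builds the markup string.
    """
    return f"<span lang={quoteattr(lang)}>{escape(text)}</span>"


han = "\u76f4"  # a single code point shared by Chinese and Japanese

# The raw string is identical whatever language the writer had in mind...
code_points = [f"U+{ord(c):04X}" for c in han]

# ...so the language has to travel next to the text, not inside it.
as_japanese = tag_language(han, "ja")
as_chinese = tag_language(han, "zh-cn")

print(code_points)
print(as_japanese)
print(as_chinese)
```

Both spans contain exactly the same character; only the tag tells the layout library which regional glyph variant to prefer.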

This is not a problem for pre-existing text as most established modern text formats (XML, (X)HTML, ODF…) allow tagging text with language info.

It is a huge problem for text the users type themselves. Our applications have absolutely no way to know what language the user is currently typing. Therefore, they're not giving the right information to their text layout library, and it makes mistakes.

Remove conflicting fonts
If the system only includes fonts with the glyph variants needed for the user language, there is no risk of mis-selection, right?

… except that pruning fonts like this is a system installer nightmare: modern fonts include all variants in the same font file, users with different needs can share the same system, and any of them may read or write text in another regional language with conflicting glyph needs.

Just use the user locale
Well I'm typing this English text right now, and I can assure you my locale is not an English-speaking one.

Query the current keyboard layout or input method (IM)
X allows applications to query the current keyboard layout via XKB. Unfortunately there is no 1:1 relationship between layouts/IMs and languages. I'm currently typing English with a Latin layout not designed for English. Some layouts (like the latest Canadian one) are designed for multilingual use. Some countries use US layouts because it's cheaper than defining a different local standard. As a result, inferring language from layout or IM is extremely unreliable.
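
The ambiguity can be sketched in a few lines of Python. The table below is illustrative only: the layout names follow the usual XKB style, but the language sets are assumptions made for the example, not data queried from XKB, and `guess_language` is a hypothetical helper.

```python
from typing import Dict, Optional, Set

# Illustrative data only: layout name -> languages plausibly typed with it.
LAYOUT_LANGUAGES: Dict[str, Set[str]] = {
    "us": {"en", "nl"},  # US layout, also used where no local standard exists
    "ca": {"en", "fr"},  # a layout designed for multilingual use
    "fr": {"fr"},
}


def guess_language(layout: str) -> Optional[str]:
    """Return a language only when the layout maps to exactly one candidate."""
    candidates = LAYOUT_LANGUAGES.get(layout, set())
    if len(candidates) == 1:
        return next(iter(candidates))
    return None  # ambiguous or unknown: the layout alone cannot tell us


for layout in ("us", "ca", "fr"):
    print(layout, "->", guess_language(layout))
```

Even the single-candidate answer is only a guess: nothing stops a user from typing English on a layout designed for another language, which is exactly the situation described above.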

Add language toggles to every application that allows text input
I hope no one thinks that's a smart solution. It's a screen-wasting usability nightmare requiring deep surgery in every GUI application. It's the one OpenOffice.org chose, though.

Right solution
Windows solved this problem years ago. Instead of having a keyboard layout/IM switcher, you add an input language switcher to the desktop.

Switching input languages may require a layout/IM change, or not. When it requires a layout/IM change, the switcher behaves like the current GNOME or KDE keyboard applet. The smart trick is that even when no layout/IM change is needed, the user can still press the switcher hotkey to perform a layout/IM-neutral input language switch, and apps are informed that the language being typed has changed.

In other words, the switcher manages a list of (input language, layout/IM) tuples instead of a raw list of layouts/IMs. The user can re-use the same layout/IM in different list elements, and cycles through the list by input language. Applications are informed at every point what language is currently being typed.
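
The list model described above can be sketched as follows. This is a minimal illustration, not an existing desktop API: the class names `InputEntry` and `InputLanguageSwitcher`, the listener signature, and the language and layout identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class InputEntry:
    language: str  # language being typed, e.g. "en"
    layout: str    # layout/IM used to type it


@dataclass
class InputLanguageSwitcher:
    """Cycles through (input language, layout/IM) entries and notifies apps."""
    entries: List[InputEntry]
    index: int = 0
    listeners: List[Callable[[InputEntry, bool], None]] = field(default_factory=list)

    @property
    def current(self) -> InputEntry:
        return self.entries[self.index]

    def next(self) -> InputEntry:
        """Advance to the next entry; listeners hear about every switch."""
        previous = self.current
        self.index = (self.index + 1) % len(self.entries)
        layout_changed = self.current.layout != previous.layout
        for notify in self.listeners:
            notify(self.current, layout_changed)
        return self.current


# The same layout is re-used for two different input languages.
switcher = InputLanguageSwitcher([
    InputEntry("fr", "fr"),
    InputEntry("en", "fr"),
    InputEntry("ru", "ru"),
])
events: List[Tuple[str, bool]] = []
switcher.listeners.append(lambda entry, changed: events.append((entry.language, changed)))

switcher.next()  # fr -> en: language changes, layout stays the same
switcher.next()  # en -> ru: language and layout both change
print(events)
```

The first switch is the layout/IM-neutral case: the layout does not change, yet listeners still learn that the input language did.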


 * The problem, as pointed out by Sergey Udaltsov, is that the X protocol made no provision to relay language state, just layout state.
 * But according to Jim Gettys, "This is just about trivial to add by using X properties. It just requires agreement on a convention".

I'm writing this in the hope that our desktop people will agree to such a convention, so that the fonts the SIG packages can be rendered properly.

Other benefits
Correct text rendering is not the only use requiring live language change info. Every application that does spellchecking needs to know what dictionary to load when the user is typing.
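
As an illustration, a spellchecker could react to input language changes roughly as in the sketch below. The class, the `on_input_language_changed` callback name, and the toy word lists are assumptions made for the example; no real spellchecking library is involved.

```python
from typing import Dict, List, Optional, Set


class SpellChecker:
    """Toy spellchecker that swaps dictionaries when the input language changes."""

    def __init__(self, dictionaries: Dict[str, Set[str]]) -> None:
        self._dictionaries = dictionaries
        self._active: Optional[Set[str]] = None

    def on_input_language_changed(self, language: str) -> None:
        # No dictionary for this language: disable checking rather than
        # flag every word against the wrong word list.
        self._active = self._dictionaries.get(language)

    def misspelled(self, text: str) -> List[str]:
        if self._active is None:
            return []
        return [word for word in text.lower().split() if word not in self._active]


checker = SpellChecker({
    "en": {"hello", "world"},
    "fr": {"bonjour", "monde"},
})

checker.on_input_language_changed("en")
flagged_as_english = checker.misspelled("hello monde")

checker.on_input_language_changed("fr")
flagged_as_french = checker.misspelled("hello monde")

print(flagged_as_english, flagged_as_french)
```

The same text is flagged differently depending on the language the user declared, which is why the checker needs a live notification rather than a one-time locale lookup.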

Comments
MatthiasClasen:


 * "extremely language dependent" is a bit of an overstatement, I'd say. We are certainly talking mostly about fine typography aspects here. That is not to say that it is not worth getting those right, but it is not as if people can't read what they type unless the language is specified. If that were the case, I'd consider it a font bug (and I'm sure sufficient application of OpenType technology allows constructing such fonts).

''Unfortunately, the numerous Han unification or Arabic vs Farsi flamewars disagree with you. Users in affected locales do care a great deal that the right glyph variant is used. And the best font in the world can only provide every regional variant. If there's no language info available, the text rendering library won't know which one to choose.''


 * I don't necessarily agree that having input language selection inside the application is bad, it can allow for better ui integration and use of application-specific knowledge that is not available to a global mechanism. Of course, it would be good to have a desktop-wide mechanism available as a fallback for all the apps that don't implement an application-specific solution.

''All I'm asking for is a general input language state the app can query to do language-specific processing, and which is displayed in the DE language selector, so users know what the current state is. (With MPX you'd probably need to associate a state with each keyboard.) It would probably be a good idea to let applications request a change in this state if they feel the need to, but adding UI elements in-application that duplicate the DE input language switcher function serves no purpose. I'll let you desktop people decide whether an application input language request should be obeyed at all times or treated like a hint.''


 * What you should clarify is that "language" here only means "input language" - it is not applicable to translations or to rendering preexisting documents.

OK, done
