From Fedora Project Wiki

== Examples ==
* http://nkumar.fedorapeople.org/helloi18n/helloworld/
* http://nkumar.fedorapeople.org/helloi18n/gtkgettext/
* http://nkumar.fedorapeople.org/helloi18n/helloworldintld/
 
== Resources ==
* Gettext Manual: http://www.gnu.org/software/gettext/manual/gettext.html
* Format of PO files: http://www.gnu.org/software/gettext/manual/gettext.html#PO-Files
* Country codes: http://www.iso.org/iso/english_country_names_and_code_elements
* Language codes: http://www.loc.gov/standards/iso639-2/php/code_list.php
* man locale: http://linux.die.net/man/1/locale
* http://www.madboa.com/geek/utf8/
* http://translate.fedoraproject.org
* http://l10n.gnome.org
* http://i18n.kde.org
* http://i18n.xfce.org
* http://l10n.mozilla.org
* http://l10n.openoffice.org

Latest revision as of 09:51, 30 April 2010

Author: Naveen Kumar

Internationalization (i18n) refers to an application's or package's support for multiple languages. This support comes from generalizing the application/package so that it can be localized into different languages.

Localization (l10n) refers to the process of adapting, translating or customising that application/package for a particular locale.

A locale is a set of information corresponding to a given language & country. Locale information is used by a software application (or operating system) to exhibit localised behaviour: displaying the application's/package's text in the local language, or following other locale conventions such as localized date format, currency format, colour conventions, etc.
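
As a small illustration of locale information other than text, here is a minimal sketch that asks the C library for the currency symbol of a locale. The function name currency_symbol_for is my own, chosen for illustration; setlocale() and localeconv() are the standard C calls. Note that it changes the program's LC_MONETARY setting as a side effect.

```c
#include <locale.h>
#include <stddef.h>

/* Return the currency symbol used by the named locale, or NULL if that
 * locale is not installed. Pass "" to use the locale from the environment.
 * (Hypothetical helper name; it switches LC_MONETARY as a side effect.) */
const char *currency_symbol_for(const char *locale_name)
{
    if (setlocale(LC_MONETARY, locale_name) == NULL)
        return NULL;
    return localeconv()->currency_symbol;
}
```

In the default "C" locale the symbol is an empty string; for a locale such as hi_IN.UTF-8 the result depends on the locale data installed on your system.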

In this tutorial we will cover i18n & l10n only with respect to text i18n/l10n.

The gettext framework is one approach to text i18n. It is a collection of tools used to internationalize and localize an application/package. Apart from internationalization, these tools assist in translating the strings on menus, message boxes or icons of an application into the language the user is interested in.

For detailed information on text internationalization, refer to the Gettext manual: http://www.gnu.org/software/gettext/manual/gettext.html

We assume that you use Emacs. To open a terminal inside it, do the following:

  • Run emacs
  • Press ALT+x
  • Type ansi-term in the lower window
  • Press the return key twice


Development Environment

To internationalize an application we need a set of development tools. This is a one-time setup, done by running these commands as root:

yum install @development-tools
yum install @<langname>-support

The <langname> above refers to the name of your language. For Hindi I would write something like:

yum install @hindi-support

Hello World

Let us write our first Hello World program:

#include<stdio.h>

int main()
{
    printf("Hello World\n");
    return 0;
}


Internationalizing Hello World

The output of the last program is entirely in English. To make it localizable, we need to internationalize it so that, when a user selects a particular locale, the application switches its strings/output to the language described by that locale. For example, if I select the locale hi_IN.UTF-8 (hi -> Hindi; IN -> India; encoding -> UTF-8), the strings of this application should be displayed in Hindi.
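
To make the three parts of a locale name concrete, here is a minimal sketch that splits a name such as hi_IN.UTF-8 into language, territory and encoding. The struct and function names are my own, and modifiers such as "@euro" are deliberately not handled.

```c
#include <stdio.h>
#include <string.h>

/* The three parts of a locale name such as "hi_IN.UTF-8".
 * (Hypothetical type, for illustration only.) */
struct locale_parts {
    char language[16];
    char territory[16];
    char codeset[32];
};

/* Split a name of the form language[_territory][.codeset].
 * Missing parts are left as empty strings; over-long parts are truncated. */
struct locale_parts split_locale_name(const char *name)
{
    struct locale_parts p = { "", "", "" };
    const char *us = strchr(name, '_');
    const char *dot = strchr(name, '.');
    size_t lang_len = strcspn(name, "_.");

    snprintf(p.language, sizeof p.language, "%.*s", (int)lang_len, name);
    if (us != NULL && (dot == NULL || us < dot)) {
        size_t terr_len = strcspn(us + 1, ".");
        snprintf(p.territory, sizeof p.territory, "%.*s", (int)terr_len, us + 1);
    }
    if (dot != NULL)
        snprintf(p.codeset, sizeof p.codeset, "%s", dot + 1);
    return p;
}
```

Calling split_locale_name("hi_IN.UTF-8") fills the fields with "hi", "IN" and "UTF-8"; a bare "hi" leaves territory and codeset empty.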

This is done using the concept of a message catalog, which is a database of strings stored in a file. Gettext supports message catalogs in the form of MO files (.mo). An MO file is a binary file that stores the translated strings of one application for one locale. When the application runs, each string is looked up in the application's MO file for the current locale. This lookup at run time is done by the gettext functions provided by the gettext framework.
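
One property of this lookup is worth seeing in code: when no catalog (or no matching entry) is found, gettext simply hands back the original string, so an untranslated program keeps working. The following is a minimal sketch of that behaviour; the helper name translate is my own, while setlocale(), bindtextdomain() and dgettext() are the real library calls.

```c
#include <libintl.h>
#include <locale.h>

/* Look up msgid in the catalog for `domain` stored under `dir`.
 * When no matching catalog or entry exists, the original msgid is
 * returned unchanged. (Hypothetical helper, for illustration only.) */
const char *translate(const char *domain, const char *dir, const char *msgid)
{
    setlocale(LC_ALL, "");          /* adopt the locale from the environment */
    bindtextdomain(domain, dir);    /* tell gettext where this domain's catalogs are */
    return dgettext(domain, msgid); /* look up msgid in that domain */
}
```

With a directory that contains no catalog, translate("helloworld", dir, "Hello World\n") returns "Hello World\n" itself.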


Now if we internationalize our Hello World program using gettext, it will look something like this:

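#include<libintl.h>   /* gettext(), bindtextdomain(), textdomain() */
#include<locale.h>    /* setlocale() */
#include<stdio.h>

#define _(String) gettext (String)

int main()
{
    setlocale(LC_ALL,"");                              /* adopt the locale from the environment */
    bindtextdomain("helloworld","/usr/share/locale");  /* directory holding this domain's catalogs */
    textdomain("helloworld");                          /* domain used by gettext() from now on */
    printf(_("Hello World\n"));
    return 0;
}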

In this program, gettext() is responsible for finding, at run time, the translated equivalent of the string it is given: for "Hello World\n" it returns "नमस्कार दुनिया\n" when the locale is hi_IN.UTF-8 (Hindi as used in India). setlocale(LC_ALL, "") makes the program adopt the locale from the environment, bindtextdomain() tells gettext which directory holds the catalogs for the helloworld domain, and textdomain() selects that domain for subsequent gettext() calls. The general practice is to use a short macro like _("String") instead of gettext("String") to save typing.
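
If a translation does not show up, a first thing to check is whether the domain setup took effect. Here is a minimal sketch, assuming a helper name of my own (domain_is_configured), that sets the binding and then reads it back; passing NULL to bindtextdomain() or textdomain() queries the current value without changing it.

```c
#include <libintl.h>
#include <string.h>

/* Bind `domain` to `dir`, select it, then read both settings back.
 * Returns 1 when the runtime reports the values we just set.
 * (Hypothetical helper, for illustration only.) */
int domain_is_configured(const char *domain, const char *dir)
{
    if (bindtextdomain(domain, dir) == NULL || textdomain(domain) == NULL)
        return 0;

    /* Passing NULL queries the current value without changing it. */
    const char *bound_dir = bindtextdomain(domain, NULL);
    const char *current = textdomain(NULL);

    return bound_dir != NULL && current != NULL
        && strcmp(bound_dir, dir) == 0
        && strcmp(current, domain) == 0;
}
```

Note that this only confirms the settings; it does not check that the directory or a catalog actually exists.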

Localizing Hello World

Create a new directory named po/

mkdir po/

Extract the strings into a POT file (helloworld.pot) using the following command:

xgettext -d helloworld -o po/helloworld.pot -k_ -s helloworld.c

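A new file helloworld.pot will be created inside directory po/. In the command above, -d sets the default domain, -o names the output file, -k_ tells xgettext to treat _() as a translation keyword, and -s sorts the output.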

helloworld.pot:

# SOME DESCRIPTIVE TITLE.
# Copyright (C) YEAR THE PACKAGE'S COPYRIGHT HOLDER
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2010-04-27 17:42+0530\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"

#: helloworld.c:13
#, c-format
msgid "Hello World\n"
msgstr ""


A POT (.pot) file is a Portable Object Template. It contains a series of entries, each a pair of lines starting with the keywords msgid and msgstr. In the above example there is only one such entry besides the header: msgid is followed by the string in the source language, and msgstr on the next line is followed by an empty string, which is where the translation goes.
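
To illustrate the msgid/msgstr structure, here is a rough sketch that counts entries whose msgstr is still empty. It is a deliberately simplified scanner of my own, not a real PO parser: it only recognises a line that is exactly msgstr "", and it treats such a line as non-empty when quoted continuation lines follow it (which is how the header entry is written).

```c
#include <string.h>

/* Count entries whose msgstr is still empty in PO/POT text.
 * Simplified sketch: handles the one-line msgid/msgstr form and treats
 * 'msgstr ""' followed by quoted continuation lines as non-empty. */
int count_untranslated(const char *po_text)
{
    int count = 0;
    int pending = 0;            /* previous line was exactly: msgstr "" */
    const char *line = po_text;

    while (*line != '\0') {
        size_t len = strcspn(line, "\n");

        if (pending && !(len > 0 && line[0] == '"'))
            count++;            /* no continuation followed: truly empty */
        pending = (len == 9 && strncmp(line, "msgstr \"\"", 9) == 0);

        line += len;
        if (*line == '\n')
            line++;
    }
    return count + pending;     /* the text may end right after msgstr "" */
}
```

For the helloworld.pot shown above this gives 1: the header is skipped because its msgstr continues on the following lines, and the "Hello World\n" entry is still untranslated.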

Now in order to translate the application, the POT file is copied as a PO (.po) file into a folder for each language and then translated. Translating means filling in, for every msgid string, the corresponding msgstr with the translated string in the local script. For Hindi it will look something like this:

msgid "Hello World\n"
msgstr "नमस्कार दुनिया\n"


Now create a directory named after your language. The name should be the two-letter ISO 639-1 code for your language, or the three-letter ISO 639-2 code if no two-letter code exists. Use http://www.loc.gov/standards/iso639-2/php/code_list.php for reference. A directory with the same name should also exist under /usr/share/locale. For Hindi I would do this:

cd po/
mkdir hi/
cp helloworld.pot hi/helloworld.po
cd hi/

Open helloworld.po in an editor of your choice and translate it in the following manner:

# Hello World Localization.
# Copyright (C) 2010 Naveen Kumar
# This file is distributed under the same license as the PACKAGE package.
# Naveen Kumar <nkumar@redhat.com>, 2010.
#
msgid ""
msgstr ""
"Project-Id-Version: helloworld 1.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2010-04-27 18:31+0530\n"
"PO-Revision-Date: 2010-04-27 18:53+0530\n"
"Last-Translator: Naveen Kumar <nkumar@redhat.com>\n"
"Language-Team: Hindi <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: helloworld.c:13
#, c-format
msgid "Hello World\n"
msgstr "नमस्कार दुनिया\n"
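
The header above declares charset=UTF-8 for the catalog. By default gettext converts translations to the encoding of the current locale; if a program always wants UTF-8 output, it can request that with bind_textdomain_codeset(). A minimal sketch, with a wrapper name of my own:

```c
#include <libintl.h>

/* Ask gettext to return translations for `domain` encoded as UTF-8,
 * regardless of the locale's own encoding. Returns the codeset now in
 * effect, or NULL on failure. (Hypothetical wrapper, for illustration.) */
const char *force_utf8_output(const char *domain)
{
    return bind_textdomain_codeset(domain, "UTF-8");
}
```

This is optional for the Hello World program, which relies on the hi_IN.UTF-8 locale already being UTF-8.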

To type the translation, you can use an input method, which lets you use a standard keyboard to type in your native script. Commonly used input method frameworks are iBus, SCIM and UIM. You can install any of them using yum. For iBus I would do something like:

yum install ibus ibus-table*

And then select the script/language I want to type for translation.

Compiling and running a Localized Hello World

Create an MO (.mo) file using the following command:

msgfmt helloworld.po -o helloworld.mo
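
A quick way to confirm that msgfmt produced a binary catalog is to look at its first four bytes: GNU MO files start with the magic number 0x950412de (seen byte-swapped if the file was written for the opposite endianness). Here is a minimal sketch of such a check; the function names are my own.

```c
#include <stdint.h>
#include <stdio.h>

/* Magic number that starts every GNU MO file; it appears byte-swapped
 * when the file was written with the opposite endianness. */
#define MO_MAGIC         0x950412deu
#define MO_MAGIC_SWAPPED 0xde120495u

/* Return 1 if the four bytes look like the start of an MO file. */
int is_mo_header(const unsigned char b[4])
{
    /* Assemble the bytes as a little-endian 32-bit value. */
    uint32_t v = (uint32_t)b[0] | (uint32_t)b[1] << 8
               | (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
    return v == MO_MAGIC || v == MO_MAGIC_SWAPPED;
}

/* Return 1 if `path` can be opened and starts with the MO magic number. */
int looks_like_mo_file(const char *path)
{
    unsigned char b[4];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    size_t n = fread(b, 1, sizeof b, f);
    fclose(f);
    return n == sizeof b && is_mo_header(b);
}
```

Calling looks_like_mo_file("helloworld.mo") from the directory where you ran msgfmt should return 1, while the text file helloworld.po should give 0.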

As root, copy the MO file to the LC_MESSAGES directory for your language under /usr/share/locale. For Hindi I would do something like this:

cp helloworld.mo /usr/share/locale/hi/LC_MESSAGES/
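
The destination is not arbitrary: it combines the directory passed to bindtextdomain(), the language code, the LC_MESSAGES category and the domain name. The sketch below builds that path so the relationship is explicit; catalog_path is a name I made up for illustration, and the language-only directory is just one of the locations gettext tries for a locale like hi_IN.UTF-8.

```c
#include <stdio.h>

/* Build the path <base>/<lang>/LC_MESSAGES/<domain>.mo into `out`.
 * Returns 0 on success, -1 if `out` is too small.
 * (Hypothetical helper, for illustration only.) */
int catalog_path(char *out, size_t size,
                 const char *base, const char *lang, const char *domain)
{
    int n = snprintf(out, size, "%s/%s/LC_MESSAGES/%s.mo", base, lang, domain);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With base "/usr/share/locale", lang "hi" and domain "helloworld" this yields /usr/share/locale/hi/LC_MESSAGES/helloworld.mo, the file created by the cp command above.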

Compile your C file

cd ../../
gcc -o helloworld helloworld.c

Run something like

LANG=hi_IN
./helloworld
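
If the output stays in English, one possible cause is that the requested locale is not installed, in which case setlocale() fails and the program keeps the default "C" locale. Here is a minimal sketch of a check you could add while debugging; the function name is my own, and it changes the program's LC_MESSAGES setting as a side effect.

```c
#include <locale.h>
#include <stddef.h>

/* Return 1 if the C library can switch LC_MESSAGES to `name`
 * (for example "hi_IN.UTF-8"), 0 if it cannot.
 * (Hypothetical helper; it changes LC_MESSAGES as a side effect.) */
int locale_is_available(const char *name)
{
    return setlocale(LC_MESSAGES, name) != NULL;
}
```

On the command line, locale -a lists the locales that are installed.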

You should see the message (Hello World) appear in your local language:

[nkumar@localhost]$ LANG=hi_IN
[nkumar@localhost]$ ./helloworld 
नमस्कार दुनिया

Examples

Resources