Internationalization in Linux

What do you do when you need to use your Linux programs in a language other than English? How about writing in non-Latin scripts? In many modern Linux distros, multilingual functionality already works out of the box, but in this howto, we explore how to get it working manually.

Using Linux in your native language
 The C library
Input methods
 IBus – Intelligent Input Bus for Linux / Unix OS
 Smart Common Input Method
 Universal Input Method
The UTF-8 text encoding
Contact details

Using Linux in your native language

Though I personally hate seeing improvised Norwegian translations of common English computer terms, some people will find it more desirable to use Linux software in their native language. Although localization settings for software like Gnome or KDE can be altered simply by using a control panel, users of other software may need to change some settings called environment variables. LC_CTYPE, LANG and LANGUAGE are examples of variables that may affect how programs deal with languages. To find out which values to set the variables to, run the command locale -a, which lists the locales (language/country combinations) available on your system. Try Google if you're unsure of which one you need.
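The output of locale -a can be long, so it may help to narrow it down to one language. The following is a minimal sketch; filter_locales is a made-up helper name, and "nb" (Norwegian Bokmål) is only an example language code.

```shell
#!/bin/sh
# filter_locales is a hypothetical helper, not a standard tool.
# It reads locale names on standard input and prints those whose
# language part matches the given code (for example "nb").
filter_locales() {
    # Matches "nb", "nb_NO", "nb_NO.utf8", "nb_NO@euro", and so on,
    # but not a different language code that merely starts with "nb".
    grep -E "^$1([_.@]|\$)"
}

# Typical use: list every installed locale for one language.
# The guard keeps the script quiet on systems without the locale command.
if command -v locale >/dev/null 2>&1; then
    locale -a | filter_locales nb || echo "no nb locales installed"
fi
```

Any name printed by the filter should be usable as a value for the variables mentioned above.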

One way of making the settings permanent is to put them in a script that runs every time the system starts up. On Gentoo Linux, the correct file is apparently called /etc/env.d/02locale. For a lot of systems, though, a quick and almost certainly incorrect way is to enter the necessary commands into the startup script that runs near the very end of the boot procedure. In some distributions, this file is called rc.local. Assuming that our language of choice is the beautiful "Example Language", the code might be as follows:

export LC_CTYPE=example_LANGUAGE
export LANG=example_LANGUAGE
export LANGUAGE=example_LANGUAGE


Et voilà. After you reboot your machine, a lot of applications should start appearing in the language of your choice.

Another neat feature is the ability to run just one program in another language without changing your settings. Simply specify the variables before your program name on a command line:

LANG=example_LANGUAGE LC_CTYPE=example_LANGUAGE LANGUAGE=example_LANGUAGE my_program
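To see that such an assignment really is limited to the one command, here is a small sketch. The child shell stands in for my_program, and the C locale is used only because every system has it.

```shell
#!/bin/sh
# A variable assignment placed in front of a command is visible to that
# command only. The child shell below stands in for "my_program".
LANG=C sh -c 'echo "child sees LANG=$LANG"'

# Back in the calling shell, LANG still has whatever value it had before.
echo "parent still has LANG=${LANG:-<unset>}"
```

If LC_ALL happens to be set in your environment, it takes precedence over both LANG and LC_CTYPE, so it may need to be overridden on the same command line as well.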

The C library

Support for these settings relies on the C library having support for them compiled in. If it isn't there and you're desperate to have localization support outside of Gnome and KDE, you may want to recompile your C library with foreign language support. This will take quite a while, so sit down with your favorite manga (or manhwa) and a good cup of coffee.

(On Gentoo Linux, if support for locales has been compiled in, the USE flag nls should be found in your /etc/make.conf file. If it isn't there, add nls to the USE flag section and emerge glibc again.)
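Checking for the flag by eye is easy to get wrong when the USE line spans several lines. The sketch below is one way to automate it; has_use_flag is a made-up helper, and it assumes the file is plain shell syntax with a USE="..." assignment. Because it sources the file, only point it at files you trust.

```shell
#!/bin/sh
# has_use_flag is a hypothetical helper, not part of Portage.
# Usage: has_use_flag FLAG /absolute/path/to/make.conf
has_use_flag() {
    # Source the file in a subshell so its variables do not leak out.
    use_flags=$(unset USE; . "$2" >/dev/null 2>&1; printf '%s' "${USE-}")
    # Turn newlines and tabs into spaces, then match the flag as a whole
    # word. An explicitly disabled flag such as "-nls" does not count.
    case " $(printf '%s' "$use_flags" | tr '\n\t' '  ') " in
        *" $1 "*) return 0 ;;
    esac
    return 1
}

if [ -r /etc/make.conf ]; then
    if has_use_flag nls /etc/make.conf; then
        echo "nls is enabled"
    else
        echo "nls is missing from USE"
    fi
fi
```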

Input methods

Although most text input is carried out simply by using a certain keyboard layout (like an American one or a Norwegian one), some languages require the user to use a special input method for entering text. These often work by transliterating the text that the user is entering from the Latin alphabet to some other script, like Hiragana or Traditional Chinese.

There are currently three important implementations of input methods on Unix-like systems: the traditional X Input Method interface (XIM), and two types of input modules particular to the GTK2 and Qt libraries.

This howto deals with three software packages that support all three of these interfaces, as well as a wide range of languages: IBus, SCIM and UIM.

IBus – Intelligent Input Bus for Linux / Unix OS

IBus is an input method framework that aims to support the same broad range of input methods and languages as SCIM, mentioned below, but with a different internal mechanism based on dbus and Python.

The first necessary step is installing IBus itself, and the GTK2 immodule and Qt immodule interfaces which allow it to interface with GTK2- and Qt-based applications.

Input methods must also be installed separately. Here are some notable input methods:

ibus-pinyin: Chinese input via a smart pinyin method. ibus-pinyin supports smart pinyin input for both Simplified and Traditional Chinese.
ibus-anthy: Japanese input.
ibus-hangul: Korean input. ibus-hangul supports romanized Korean as well as standard Korean keyboard layouts. A special mode for quick input of hanja is also supported.
ibus-m17n: Various input methods using libm17n, a multilingual text processing library.

If you have GTK2, you might need to run the following command after the installation:

update-gtk-immodules

Or alternatively:

gtk-query-immodules-2.0 > /etc/gtk-2.0/gtk.immodules
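After regenerating the file, it can be worth checking that the module actually ended up in it. This is only a sketch: immodule_listed is a made-up helper, and it assumes that modules appear in the file as paths ending in im-NAME.so (for example im-ibus.so), which may not hold on every system.

```shell
#!/bin/sh
# immodule_listed is a hypothetical helper. It assumes that each module
# appears in the cache as a path ending in im-NAME.so; check your own
# file if the naming differs.
# Usage: immodule_listed NAME CACHE_FILE
immodule_listed() {
    grep -q "im-$1\.so" "$2"
}

cache=/etc/gtk-2.0/gtk.immodules
if [ -r "$cache" ]; then
    if immodule_listed ibus "$cache"; then
        echo "GTK2 knows about the ibus module"
    else
        echo "ibus is not listed in $cache"
    fi
fi
```

The same check should work for the other frameworks in this howto by replacing ibus with scim or uim.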

On the newest versions of Ubuntu Linux, a lot of the IBus configuration is done automatically, so I haven't investigated thoroughly how to cause IBus to start automatically, work with XIM, or become the default input method in Qt or GTK2. However, placing the following line in your ~/.xinitrc file ought to cause IBus to start whenever you start X:

/usr/bin/ibus-daemon --xim &

Adding the following lines to ~/.xinitrc ought to make IBus the default input method:

export GTK_IM_MODULE=ibus
export QT_IM_MODULE=ibus
export XMODIFIERS="@im=ibus"


In most GTK2 and some Qt programs you'll be able to select IBus by right-clicking on a text input field without using any special setup, if it isn't already the default:

Selecting an input method in GTK. Selecting an input method in Qt.

To activate IBus, make sure the IBus daemon (ibus-daemon) is running, and press control + space. You should now see a menu in the bottom right corner of your screen where you can choose between several input methods. There should also be a menu option to hide the IBus panel, which some users prefer not to have on their screen. Alt + Left shift can be used to switch between input methods.

The IBus web site can be found at http://code.google.com/p/ibus/.

Smart Common Input Method

Smart Common Input Method is a multilingual input method framework that can be used both through the old X Input Method interface and as a module for GTK2 or Qt.

The first step is installing SCIM and any input methods you might need. Make sure to also install the GTK2 immodule and Qt immodule interfaces.

Some notable input methods:

scim-pinyin: Smart pinyin input. scim-pinyin is mainly for Simplified Chinese, but is somewhat useful for Traditional Chinese.
scim-anthy: Japanese input.
scim-hangul: Korean input. Users wanting to input romanized Korean will want to install scim-m17n.
scim-m17n: Various input methods using libm17n, a multilingual text processing library.

If you have GTK2, you might need to run the following command after the installation:

update-gtk-immodules

Or alternatively:

gtk-query-immodules-2.0 > /etc/gtk-2.0/gtk.immodules

SCIM needs a backend application to make it work with applications that don't use GTK2 or Qt. If you need this functionality, it's a good idea to start it whenever you start X, so put the following command line somewhere before the last command in your ~/.xinitrc:

scim -d

To make SCIM your default input method, go back to your ~/.xinitrc and add the following lines:

export XMODIFIERS=@im=SCIM
export GTK_IM_MODULE="scim"
export QT_IM_MODULE=scim


In most GTK2 and some Qt programs you'll be able to select SCIM by right-clicking on a text input field without using any special setup. IBus is used here for illustration purposes:

Selecting an input method in GTK. Selecting an input method in Qt.

To activate SCIM, press control + space. Provided that it's running properly, you should now see a menu in the bottom right corner of your screen where you can choose between several input methods. There should also be a menu option to hide the SCIM panel, which some users prefer not to have on their screen.

SCIM's web site can be found here: http://www.scim-im.org/. You can find out more at Yukiko Bando's mini guide.

Universal Input Method

Universal Input Method is a highly extendable input method framework that focuses on Asian input methods for GTK2 and Qt. It can be used as a module for GTK2 or Qt, through X Input Method (XIM), and as a module for SCIM.

Start by installing uim itself and any UIM-based input methods you need. Make sure to also install the GTK2 immodule and Qt immodule interfaces.

If you have GTK+ 2, you might need to run the following command after the installation:

update-gtk-immodules

Or alternatively:

gtk-query-immodules-2.0 > /etc/gtk-2.0/gtk.immodules

UIM also needs a helper application to work with applications that don't use GTK2 or Qt. If you need this functionality, it's a good idea to start it whenever you start X, so put the following command line somewhere before the last command in your ~/.xinitrc:

uim-xim &

To make UIM your default input method, go back to your ~/.xinitrc and add the following lines:

export XMODIFIERS=@im=uim
export GTK_IM_MODULE="uim"
export QT_IM_MODULE=uim
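All three frameworks in this howto are selected through the same three variables, so the lines can be generated rather than retyped. The following sketch uses a made-up helper, im_env_lines, and reproduces the values given in this document (including the upper-case SCIM in XMODIFIERS).

```shell
#!/bin/sh
# im_env_lines is a hypothetical helper that prints the three export
# lines used in this howto for a given framework: ibus, scim or uim.
im_env_lines() {
    case $1 in
        ibus) xim=ibus ;;
        scim) xim=SCIM ;;   # this howto uses upper case for SCIM here
        uim)  xim=uim ;;
        *) echo "unknown framework: $1" >&2; return 1 ;;
    esac
    printf 'export XMODIFIERS="@im=%s"\n' "$xim"
    printf 'export GTK_IM_MODULE=%s\n' "$1"
    printf 'export QT_IM_MODULE=%s\n' "$1"
}

# Print the lines for UIM; paste them into ~/.xinitrc before its last command.
im_env_lines uim
```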


To switch to another input method (within UIM), use the application uim-im-switcher. In GTK2 and some Qt programs you'll be able to select input methods by right-clicking on a text input field without using any special setup. IBus is used here for illustration purposes:

Selecting an input method in GTK. Selecting an input method in Qt.

UIM is activated by pressing shift + space.

For more information, have a look at UIM's home page and wiki.

The UTF-8 text encoding

Many systems use character encodings (schemes that represent characters as binary data) that support only one language or a handful of them, meaning that a string of text written on a Korean system, for instance, can't be read on a Japanese system. However, it's possible to make Linux use the UTF-8 encoding (UTF stands for Unicode Transformation Format) for any language. UTF-8 supports a lot of languages without any of them interfering with one another, though it's not compatible with the older character encodings: only the ASCII range, which covers the basic Latin alphabet, is encoded the same way as in most of them.

The system locale settings, which cover the system language among other things, can be made to use UTF-8 with relative ease. Unfortunately, many distros lack UTF-8 locales for most languages. Run the command locale -a and see if you can find a locale for your language with a name ending in .utf8. If not, log in as root (or use su) and create a UTF-8 version of an already existing locale:
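A quick way to see how UTF-8 treats different scripts is to count bytes. In this sketch, byte_count is a made-up helper, and the sample characters are arbitrary examples.

```shell
#!/bin/sh
# byte_count is a hypothetical helper: it prints how many bytes arrive
# on standard input, with any padding from wc removed.
byte_count() {
    wc -c | tr -d ' '
}

# The bytes are written as octal escapes so the result does not depend
# on the encoding of this script or of the terminal.
printf 'A'            | byte_count   # ASCII letter A
printf '\303\245'     | byte_count   # Norwegian å (U+00E5) in UTF-8
printf '\355\225\234' | byte_count   # Hangul syllable 한 (U+D55C) in UTF-8
```

The three lines print 1, 2 and 3. The letter A takes the same single byte it has in ASCII, while å takes two bytes in UTF-8 but only one in an older encoding such as ISO-8859-1, which is why text in the two encodings can't be mixed freely.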

localedef -f UTF-8 -i example_LANGUAGE example_LANGUAGE.utf8
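The check and the localedef command can be combined so that the command is only suggested when it is needed. This is a sketch under assumptions: utf8_locale_hint is a made-up helper, nb_NO is just an example locale name, and the helper only prints the command rather than running it.

```shell
#!/bin/sh
# utf8_locale_hint is a hypothetical helper. It reads a list of locale
# names on standard input (the format printed by "locale -a") and either
# confirms that NAME has a UTF-8 variant or prints the localedef command
# from this howto that would create one.
utf8_locale_hint() {
    # Accept both spellings, "NAME.utf8" and "NAME.UTF-8", in any case.
    if grep -qix "$1\.utf-\{0,1\}8"; then
        echo "$1 already has a UTF-8 variant"
    else
        echo "localedef -f UTF-8 -i $1 $1.utf8"
    fi
}

# The guard keeps the script quiet on systems without the locale command.
if command -v locale >/dev/null 2>&1; then
    locale -a | utf8_locale_hint nb_NO
fi
```

The printed command still has to be run as root.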

Once you know you have a locale that allows you to use UTF-8, you need to tell the system to use it. This might involve adding ".utf8" to the name of the locale you're using, for instance.

export LC_CTYPE="example_LANGUAGE.utf8"
export LANG="example_LANGUAGE.utf8"
export LANGUAGE="example_LANGUAGE.utf8"
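To check whether the current environment ends up selecting a UTF-8 locale, the variables can be inspected in their order of precedence: LC_ALL first, then LC_CTYPE, then LANG. The helper names below are made up for this sketch, and it only looks at the variable values, not at whether the locale is actually installed.

```shell
#!/bin/sh
# effective_ctype and uses_utf8 are hypothetical helpers.
# effective_ctype prints the locale name that decides the character
# encoding: the first non-empty value of LC_ALL, LC_CTYPE and LANG,
# falling back to "C" when none of them is set.
effective_ctype() {
    printf '%s\n' "${LC_ALL:-${LC_CTYPE:-${LANG:-C}}}"
}

# uses_utf8 succeeds when that name carries a UTF-8 suffix such as
# ".utf8" or ".UTF-8" (optionally followed by an @modifier).
uses_utf8() {
    case $(effective_ctype) in
        *.[Uu][Tt][Ff]8*|*.[Uu][Tt][Ff]-8*) return 0 ;;
        *) return 1 ;;
    esac
}

if uses_utf8; then
    echo "UTF-8 locale selected: $(effective_ctype)"
else
    echo "not a UTF-8 locale: $(effective_ctype)"
fi
```

On systems that have the locale command, running locale charmap is a more direct check of the same thing.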


Note that input methods that need to be started with special locale settings, such as the Korean input method ami, still need to be started with their old locale settings.

UTF-8 is already used by most GTK2 and Qt applications, and Java also uses its own, Unicode-based encoding. However, to use UTF-8 in console applications you need a UTF-8 enabled terminal emulator or console. Konsole and gnome-terminal are excellent choices for terminal emulators, but if you don't have Qt or GTK2, then mlterm is for you. To enable UTF-8 on the actual console, use unicode_start from kbd or console-tools.

Some programs, like screen or irssi, need special options in order to work with UTF-8. screen, for instance, can be started with the -U option to run in UTF-8 mode.

Contact details

If you have further questions about what languages and scripts are supported by IBus, SCIM and uim, or about how to enable other input methods such as kinput2, please e-mail me and I'll see if I can help.

Thanks to Martin Swift, Botond Botyanszki, Matt Doughty, Scott Robbins, Tokunaga Hiroyuki, James Su and the Scandinavian Gentoo forums for contributing to the information contained within this document.