GNU Gettext – Yet Another Tutorial

Developing a simple C program is easy, but developing it in an internationalized way (yeah, all that l10n, m17n and i18n stuff) is not so easy unless you understand Autotools. Understanding Autotools takes some time (at least it took me some time). In this post, I’m trying to explain how I learnt it. It may not be the right way, but at least I will be able to recollect in the future what I did today.

Let’s create a simple C project. Without any doubt, it should be called ‘helloworld’. Let’s create the directory tree first.

$ mkdir -p helloworld/{src,man}

Switch to ‘helloworld/src’ and create two files, ‘helloworld.h’ and ‘helloworld.c’.

/* helloworld/src/helloworld.h */
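/* Identifiers containing a double underscore are reserved in C,
   so the include guard uses a plain name. */
#ifndef HELLOWORLD_H
#define HELLOWORLD_H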

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <libintl.h>
#include <locale.h>

#define _(STRING) gettext(STRING)

#endif

We include libintl.h to get the ‘bindtextdomain(3)’, ‘textdomain(3)’ and ‘gettext(3)’ functions, and locale.h to get the ‘setlocale(3)’ function. Let’s see why we need these functions.

setlocale()

Every program starts in the default locale, called ‘C’. We use the ‘setlocale()’ function to switch to a different locale. It takes two parameters, ‘category’ and ‘locale’: ‘category’ indicates which part of the locale we want to change (LC_ALL means all of them), and ‘locale’ is the new value. If ‘locale’ is "", setlocale() takes the value from the corresponding environment variables (see the man page for more details).
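To make the two uses of setlocale() concrete, here is a small sketch. The helper names ‘current_locale’ and ‘adopt_environment_locale’ are mine, invented for illustration; only setlocale() itself comes from the C library.

```c
#include <locale.h>
#include <stddef.h>

/* Return the locale currently in effect for one category.
   Passing NULL as the locale makes setlocale() a pure query. */
const char *current_locale(int category)
{
    return setlocale(category, NULL);
}

/* Switch every category to what the environment asks for
   (LC_ALL first, then the per-category variable, then LANG).
   Returns NULL when the requested locale is not available;
   in that case the previous locale stays in effect. */
const char *adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "");
}
```

Before any call that changes the locale, ‘current_locale(LC_ALL)’ reports "C". Checking the return value of ‘adopt_environment_locale()’ for NULL is a cheap way to notice that the requested locale is not installed on the machine.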

bindtextdomain()

The gettext framework works as follows:

  • Extract all the English output strings from the sources and generate a .pot file
  • Translate the English strings of the .pot file into each language, creating one .po file per language
  • Generate a binary .gmo file from the .po file of each language
  • Make the executable read the corresponding translation from the .gmo file, according to the locale settings, each time it wants to print a message

To do the last step, we have to specify where the .gmo files can be found. For that purpose we use ‘bindtextdomain()’, which takes two arguments, ‘domainname’ and ‘dirname’. ‘domainname’ is the name under which all our .gmo files are grouped; most of the time, we use the name of our project. ‘dirname’ is the common directory under which the catalogs of different projects are placed, usually ‘/usr/share/locale’.
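Here is a rough sketch of what that binding means in practice. ‘make install’ installs each .gmo file as ‘dirname/<locale>/LC_MESSAGES/domainname.mo’, and that is the layout the lookup expects. The helper names ‘register_catalog_dir’ and ‘catalog_path’ are made up for this post, and the path builder only illustrates the layout; the real lookup inside libintl also tries fallbacks (for example ‘ta’ when ‘ta_IN’ is not found).

```c
#include <libintl.h>
#include <stdio.h>

/* Tell libintl where the catalogs of one domain live and return the
   directory it recorded (NULL if memory ran out). */
const char *register_catalog_dir(const char *domain, const char *dir)
{
    return bindtextdomain(domain, dir);
}

/* Build the path of an installed message catalog:
       <dir>/<locale>/LC_MESSAGES/<domain>.mo
   Returns 0 on success, -1 when the buffer is too small. */
int catalog_path(char *buf, size_t size, const char *dir,
                 const char *locale, const char *domain)
{
    int n = snprintf(buf, size, "%s/%s/LC_MESSAGES/%s.mo",
                     dir, locale, domain);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With ‘/usr/share/locale’, ‘ta’ and ‘helloworld’, the second helper produces ‘/usr/share/locale/ta/LC_MESSAGES/helloworld.mo’, which is a handy path to check with ‘ls’ when a translation refuses to show up.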

textdomain()

We have to set the text domain so that the executable looks up its messages in the right .gmo files. ‘textdomain()’ takes only one argument, ‘domainname’, which is the name of our project.
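A tiny sketch of the effect, assuming glibc, where the domain is "messages" until a program selects another one. ‘select_domain’ is a hypothetical helper name, not part of gettext.

```c
#include <libintl.h>
#include <stddef.h>

/* Make `domain` the default for later gettext() calls and return the
   name libintl reports afterwards. textdomain(NULL) only queries. */
const char *select_domain(const char *domain)
{
    if (textdomain(domain) == NULL)
        return NULL;              /* out of memory */
    return textdomain(NULL);
}
```

After ‘select_domain("helloworld")’, every plain ‘gettext()’ call looks for its string in the ‘helloworld’ catalog.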

gettext()

Finally, we have to wrap every output string in ‘gettext()’ so that the matching translated string is fetched from the .gmo file. We defined a macro ‘_()’ as an alias for ‘gettext()’ because we are too lazy (aren’t we!?) to type ‘gettext()’ every time.
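One detail worth seeing in code: when no translation is found, ‘gettext()’ hands back the very pointer it was given, so the program falls back to the English text. The sketch below relies on that; ‘say_hello’ and ‘is_translated’ are names I made up for illustration.

```c
#include <libintl.h>
#include <stdio.h>

#define _(STRING) gettext(STRING)

/* Print the greeting in the current language. fputs() is used so that a
   stray '%' in a translation is never treated as a format directive. */
int say_hello(FILE *out)
{
    return fputs(_("hello world\n"), out);
}

/* Report whether a message has a translation right now: gettext()
   returns its argument unchanged when it finds none. */
int is_translated(const char *msgid)
{
    return gettext(msgid) != msgid;
}
```

In the default ‘C’ locale no catalog is consulted at all, so ‘is_translated()’ returns 0 for any string there and ‘say_hello()’ prints the English greeting.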

So, here is the helloworld.c

/* helloworld/src/helloworld.c */
#include "helloworld.h"   /* a project header, so quotes rather than <> */

int main(int argc, char *argv[])
{

  setlocale(LC_ALL, "");

#ifdef ENABLE_NLS
  bindtextdomain(PACKAGE, LOCALEDIR);
  textdomain(PACKAGE);
#endif

  /* Pass the translation as data, not as the format string */
  printf("%s", _("hello world\n"));

  return(0);
}

Now we have to give the ‘PACKAGE’ and ‘LOCALEDIR’ macros real values. This is where Autotools come in: the build system defines ‘PACKAGE’ at compile time (from the name given to AC_INIT), and Automake gives us a way to define ‘LOCALEDIR’ at compile time as well. Let’s set up Autotools by creating the following files:

# helloworld/src/Makefile.am
bin_PROGRAMS = helloworld
helloworld_SOURCES = helloworld.c helloworld.h
DEFS += -DLOCALEDIR=\"$(localedir)\"

# helloworld/man/helloworld.1
helloworld :) !!! check after sometime
to see the real man page

# helloworld/man/Makefile.am
dist_man_MANS = helloworld.1

# helloworld/Makefile.am
SUBDIRS = src man

We need to run ‘autoscan’ to generate a ‘configure.scan’ file. Rename ‘configure.scan’ to ‘configure.ac’ and edit it according to the project’s needs. I can’t explain all the Autoconf macros within this blog post; see the end of the post for further reading.

$ cd helloworld
$ autoscan

Here is the customized ‘configure.ac’ file,

# helloworld/configure.ac
#                                               -*- Autoconf -*-
# Process this file with autoconf to produce a configure script.

AC_INIT([helloworld], [0.1], [mokka at comedysite dot com])
AC_CONFIG_SRCDIR([src/helloworld.c])

# Automake init
AM_INIT_AUTOMAKE([foreign -Wall])

# Checks for programs.
AC_PROG_CC
AM_PROG_CC_C_O

# Gettext init
AM_GNU_GETTEXT_VERSION([0.18])
AM_GNU_GETTEXT([external])

# Checks for libraries.

# Checks for header files.
AC_CHECK_HEADERS([libintl.h locale.h stdlib.h])

# Checks for typedefs, structures, and compiler characteristics.

# Checks for library functions.
AC_CHECK_FUNCS([setlocale])

AC_CONFIG_FILES([Makefile
                 man/Makefile
                 src/Makefile])
AC_OUTPUT

Now we have to run ‘gettextize’ in the ‘helloworld’ directory to add the gettext settings to ‘configure.ac’ and ‘Makefile.am’.

$ cd helloworld
$ gettextize

If things go well, you will see a ‘helloworld/po’ directory and modifications to ‘configure.ac’ and ‘Makefile.am’. Now we can run ‘autoreconf’ to finish the Autotools setup.

$ cd helloworld
$ autoreconf --force --install --verbose

Now switch to ‘helloworld/po’ and rename ‘Makevars.template’ to ‘Makevars’. Inside ‘Makevars’ you may have to set some variables, at least ‘MSGID_BUGS_ADDRESS’. Here is a way to add your email address to that variable:

$ cd helloworld/po
$ mv Makevars.template Makevars
$ sed -i '/^MSGID/s/$/mokka at comedytime dot com/g' Makevars

Now we need to add the source file names to ‘POTFILES.in’. Here is one way:

$ cd helloworld/po
$ find ../src -name '*.c' -o -name '*.h' | sed 's/\.\.\///g' >> POTFILES.in

Time to compile,

$ cd helloworld
$ ./configure
$ make

You can see the PACKAGE and LOCALEDIR macro definitions on the compiler command line when make compiles helloworld.c. As a programmer, your job is almost done.
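If you want to poke at the source outside of the Autotools build, one option is to give both macros fallback values. This is only a sketch: the fallback strings are my assumptions (the project name, and the locale directory that goes with the default ‘/usr/local’ prefix), and ‘describe_build’ is a made-up helper.

```c
#include <stdio.h>

/* Fallbacks for builds that do not pass -DPACKAGE=... and
   -DLOCALEDIR=... on the command line. Both values are assumptions. */
#ifndef PACKAGE
#define PACKAGE "helloworld"
#endif
#ifndef LOCALEDIR
#define LOCALEDIR "/usr/local/share/locale"
#endif

/* Write the build-time settings into buf.
   Returns 0 on success, -1 when the buffer is too small. */
int describe_build(char *buf, size_t size)
{
    int n = snprintf(buf, size, "domain=%s localedir=%s",
                     PACKAGE, LOCALEDIR);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

Printing that string at startup is a quick way to confirm which values the build actually baked into the binary.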

Now switch roles and become a translator. Go to the ‘helloworld/po’ directory and generate a .po file for your language using ‘msginit’. You have to provide the locale with the -l option, so you need to know your language code and country code to construct the locale string. ‘msginit’ will ask for your email address, to record you as the translator. Here I’m translating into Tamil (ta_IN.utf8).

$ cd helloworld/po
$ msginit -i helloworld.pot -o ta.po -l ta_IN.utf8
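The locale string passed to -l has the shape language_COUNTRY.codeset. As an illustration of that structure only (this is not how msginit itself parses it), here is a small splitter; ‘struct locale_parts’ and ‘split_locale’ are hypothetical names, and modifiers such as ‘@euro’ are not handled.

```c
#include <string.h>

/* Parts of a locale name of the form language[_COUNTRY][.codeset]. */
struct locale_parts {
    char language[16];
    char country[16];
    char codeset[32];
};

/* Split e.g. "ta_IN.utf8" into "ta", "IN" and "utf8". Missing parts
   become empty strings. Returns 0 on success, -1 if a part is too long. */
int split_locale(const char *name, struct locale_parts *out)
{
    size_t lang_len = strcspn(name, "_.");
    const char *rest = name + lang_len;
    const char *country = "", *codeset = "";
    size_t country_len = 0, codeset_len = 0;

    if (*rest == '_') {               /* optional country part */
        country = rest + 1;
        country_len = strcspn(country, ".");
        rest = country + country_len;
    }
    if (*rest == '.') {               /* optional codeset part */
        codeset = rest + 1;
        codeset_len = strlen(codeset);
    }
    if (lang_len >= sizeof out->language ||
        country_len >= sizeof out->country ||
        codeset_len >= sizeof out->codeset)
        return -1;

    memcpy(out->language, name, lang_len);
    out->language[lang_len] = '\0';
    memcpy(out->country, country, country_len);
    out->country[country_len] = '\0';
    memcpy(out->codeset, codeset, codeset_len);
    out->codeset[codeset_len] = '\0';
    return 0;
}
```

For ‘ta_IN.utf8’ the language part is ‘ta’, which is also the name used for the ta.po file.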

I edited the ta.po file with gedit+ibus and translated the string “hello world\n” to “வனக்கம்\n”. Now I have to add my language to the LINGUAS file, which lists the language codes that have a corresponding translated .po file inside the ‘helloworld/po’ directory.

# helloworld/po/LINGUAS
ta

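Now it’s time to generate the binary .gmo file. Before that, because we updated the LINGUAS file, we have to re-run ‘autoreconf’ and then ‘configure’: ‘make distclean’ removes the generated Makefiles, and ‘configure’ is what recreates helloworld/po/Makefile with the new language list.

$ cd helloworld
$ make distclean
$ autoreconf --force --install --verbose
$ ./configure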
$ cd po
$ make update-gmo
rm -f ta.gmo && /usr/bin/gmsgfmt -c --statistics --verbose -o ta.gmo ta.po
ta.po: 1 translated message.
$

If your translation doesn’t have any errors, you will see ‘1 translated message’. A few more steps remain to achieve our goal: creating a distribution tarball and installing our program to see the result.

$ cd helloworld
$ ./configure
$ make dist-bzip2
$ mkdir -p /tmp/builddir
$ mv helloworld-0.1.tar.bz2 /tmp/builddir
$ cd /tmp/builddir
$ tar xvjf helloworld-0.1.tar.bz2
$ cd helloworld-0.1
$ ./configure --prefix="/tmp/destdir"
$ make install
$ LANG="ta_IN.utf8" /tmp/destdir/bin/helloworld
வனக்கம்
$ /tmp/destdir/bin/helloworld
hello world
$

That’s it. My ‘helloworld’ program can say “வனக்கம்” now. You can make it speak your favourite language too!!

References

There is another beautiful tutorial for gettext available at oriya.sarovar.org.