Nicolle Cysneiros

12 Aug 2020 8 min read

Why Internationalization and Localization Matter

According to the always trustworthy Wikipedia, there are approximately 360 million native English speakers in the world. We, as developers, are so used to writing code and documentation in English that we may not realize that this number represents only 4.67% of the world population. It is very useful to have a common language for communication between developers, but this doesn’t mean that users shouldn’t feel a little more comfortable when using your product.

This post starts by discussing the definitions of internationalization and localization and why they matter for your application. Then we will go over some of the internationalization tools available to developers working on Python and Django projects. Finally, we will present how we had to adapt our development flow to incorporate the internationalization step.

Localization vs Internationalization

Localization (l10n) [1] is the process of adapting an application, product or even a document to be more user-friendly to customers from different countries and cultures.

On the other hand, internationalization (i18n) is the process of enabling localization of the product in the application: implementing the software in a way that it knows when and how to show different content depending on the customer’s locale.

As the Django documentation perfectly summarizes: localization is done by translators and internationalization is done by developers [2].

However, this simplified definition of internationalization and localization may give the wrong impression that this is just about translation. This process entails several other adaptations needed to make users from different cultures feel more comfortable using your product, such as:

  • Date and currency formatting
  • Currency conversion
  • Units of measurement conversion
  • Unicode characters and bidirectional text (see example below)
  • Time zones, calendar and special holidays

Wikipedia homepage in English

Wikipedia homepage in Arabic

With these adaptations, we can provide a better experience for the customer when using the application.
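
To make the bidirectional text item in the list above a little more concrete, here is a minimal sketch using Python's standard unicodedata module. The helper name looks_right_to_left is hypothetical, and checking only the first strongly directional character is a simplification, not the full Unicode Bidirectional Algorithm.

```python
import unicodedata


def looks_right_to_left(text: str) -> bool:
    """Guess the base direction from the first strongly directional character."""
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in ("R", "AL"):  # right-to-left letters, e.g. Hebrew or Arabic
            return True
        if bidi_class == "L":  # left-to-right letters, e.g. Latin
            return False
    return False  # no strong character found: assume left-to-right


print(looks_right_to_left("Welcome to Wikipedia"))
print(looks_right_to_left("مرحبا بكم في ويكيبيديا"))
```

A layout layer could use a check like this to decide whether to mirror the page, which is the difference visible between the two Wikipedia homepages.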

How can we do that in Python?

GNU gettext

There are some tools that can help localize your Python application, starting with the GNU gettext package, which is part of the Translation Project [3]. This package offers:

  • A runtime library that supports the retrieval of translated messages.
  • A set of conventions about how programs should be written to support message catalogs.
  • A library supporting the parsing and creation of files containing translated messages.

The following code snippet is just a simple Hello World, an app.py file, where we use the gettext Python module to create a translation object (gettext.translation) for our app domain, specifying a locale directory and the language that we want to translate our strings to. Then, we assign the gettext function to an underscore (a common practice to reduce the overhead of typing gettext for each translatable string) and, finally, we flag the string “Hello, world!” to be translated.

import gettext

# fallback=True returns the original strings when no compiled catalog is found under ./locale
t = gettext.translation("app", localedir="locale", languages=["en_US"], fallback=True)
_ = t.gettext  # short alias used to flag translatable strings

greeting = _("Hello, world!")
print(greeting)

After flagging the translatable strings in the code, we can collect them using the GNU xgettext CLI tool. This tool generates a PO file containing all the strings that we have flagged.

xgettext -d app app.py
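
To give an intuition for what the extraction step does, here is a rough sketch that finds the same kind of flagged strings with Python's ast module. This is not how xgettext is implemented; the function name find_marked_strings and the inline SOURCE sample are made up for illustration, and only direct calls of the form _("literal") are recognized.

```python
import ast

# A tiny stand-in for app.py (hypothetical sample source).
SOURCE = '''
import gettext

_ = gettext.gettext

greeting = _("Hello, world!")
print(greeting)
'''


def find_marked_strings(source: str, marker: str = "_"):
    """Return (line_number, text) for every marker("literal") call in source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == marker
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append((node.lineno, node.args[0].value))
    return sorted(found)


for lineno, text in find_marked_strings(SOURCE):
    print(f"line {lineno}: {text!r}")
```

Each pair found this way corresponds to what ends up in a PO entry: a source reference and the untranslated string.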

The PO file (which stands for Portable Object file) contains a list of entries and here is the basic structure of an entry:

#  translator-comments
#. extracted-comments
#: reference…
#, flag…
#| msgid previous-untranslated-string
msgid untranslated-string
msgstr translated-string

We can add comments for translators, some references and flags for the string. Then we have the entry ID (msgid), which is the untranslated string flagged in the code and the entry string (msgstr) representing the translated version of the string.

When we run xgettext in the command line passing the app.py as input file, this is the PO file that is generated:

"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2019-05-03 13:23-0300\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: app.py:7
msgid "Hello, world!"
msgstr ""

At the top of the file, we have some metadata about the file, the project and the translation process. Then, we have the untranslated string "Hello, world!" as the entry ID and an empty string for the entry string. If no translated string is provided for a certain entry, the entry ID will be used in the translation.
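
The fallback rule described above can be sketched with a deliberately small PO reader. This is an illustration only: parse_po and translate are hypothetical helpers, the "Goodbye!" entry is invented to show an untranslated string, and plural forms and message contexts are ignored. The real lookup is done by the gettext library on compiled files.

```python
import ast

# Sample entries; the second one is made up and left untranslated on purpose.
PO_TEXT = '''\
#: app.py:7
msgid "Hello, world!"
msgstr "Olá, mundo!"

#: app.py:9
msgid "Goodbye!"
msgstr ""
'''


def parse_po(text: str) -> dict:
    """Collect msgid -> msgstr pairs, joining quoted continuation lines."""
    entries = []   # [msgid, msgstr] pairs in file order
    field = None   # index of the field that continuation lines extend
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("msgid "):
            entries.append([ast.literal_eval(line[len("msgid "):]), ""])
            field = 0
        elif line.startswith("msgstr ") and entries:
            entries[-1][1] = ast.literal_eval(line[len("msgstr "):])
            field = 1
        elif line.startswith('"') and field is not None:
            entries[-1][field] += ast.literal_eval(line)
        else:
            field = None  # a comment or blank line ends the current field
    return {msgid: msgstr for msgid, msgstr in entries}


def translate(catalog: dict, message: str) -> str:
    """Return the translation, or the message itself when it is missing or empty."""
    return catalog.get(message) or message


catalog = parse_po(PO_TEXT)
print(translate(catalog, "Hello, world!"))
print(translate(catalog, "Goodbye!"))
```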

Once the PO file is generated, we can start translating our terms into different languages. Note that the gettext library looks for compiled catalogs in a specific folder structure (<localedir>/<language_code>/LC_MESSAGES/<domain>.mo), so it is convenient to keep each PO file at the matching path (<localedir>/<language_code>/LC_MESSAGES/<domain>.po). There must be one PO file for each language that you want to support.

|-- app.py
|-- locale
    |-- en_US
    |   |-- LC_MESSAGES
    |       |-- app.po
    |-- pt_BR
        |-- LC_MESSAGES
            |-- app.po
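
A quick way to check that a folder layout like this one will be picked up is gettext.find, which returns the path of the compiled catalog it would load, or None. The sketch below builds a throwaway directory with an empty placeholder file, assuming (as in CPython) that find only looks for the file and does not read it.

```python
import gettext
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    localedir = Path(tmp) / "locale"
    mo_path = localedir / "pt_BR" / "LC_MESSAGES" / "app.mo"
    mo_path.parent.mkdir(parents=True)
    mo_path.touch()  # empty placeholder, enough for a path lookup

    found = gettext.find("app", localedir=str(localedir), languages=["pt_BR"])
    missing = gettext.find("app", localedir=str(localedir), languages=["fr_FR"])

print(Path(found).relative_to(localedir).as_posix())
print(missing)
```

If the second call returns None for a language you expected to work, the catalog is either not compiled yet or sits in the wrong folder.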

Here is an example of the same PO file translated to Portuguese:

"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2019-05-03 13:23-0300\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"Language: \n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: app.py:7
msgid "Hello, world!"
msgstr "Olá, mundo!"

In order to use the translated strings in the code, we need to compile each PO file into an MO file using the msgfmt command, for example from inside the folder that contains the PO file:

msgfmt -o app.mo app.po
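
For the curious, the MO file is a small binary index of the same msgid/msgstr pairs. The sketch below writes a minimal little-endian MO file by hand and loads it back with the gettext module. It is a simplified illustration of the file layout, not a replacement for msgfmt: the build_mo helper is hypothetical, and it skips the optional hash table, plural forms and message contexts.

```python
import gettext
import struct
import tempfile
from pathlib import Path


def build_mo(messages: dict) -> bytes:
    """Serialize {msgid: msgstr} into a minimal little-endian GNU MO file."""
    # The empty msgid carries the header; its charset tells readers how to decode.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n", **messages}
    keys = sorted(entries)
    ids = [key.encode("utf-8") for key in keys]
    strs = [entries[key].encode("utf-8") for key in keys]

    count = len(keys)
    ids_table = 7 * 4                    # the header is seven 32-bit integers
    strs_table = ids_table + count * 8   # each table row is (length, offset)
    offset = strs_table + count * 8      # string data starts after both tables

    table = b""
    for blob in ids + strs:              # originals first, then translations
        table += struct.pack("<II", len(blob), offset)
        offset += len(blob) + 1          # +1 for the NUL terminator

    header = struct.pack("<7I", 0x950412DE, 0, count, ids_table, strs_table, 0, 0)
    data = b"".join(blob + b"\0" for blob in ids + strs)
    return header + table + data


with tempfile.TemporaryDirectory() as tmp:
    mo_file = Path(tmp) / "pt_BR" / "LC_MESSAGES" / "app.mo"
    mo_file.parent.mkdir(parents=True)
    mo_file.write_bytes(build_mo({"Hello, world!": "Olá, mundo!"}))

    t = gettext.translation("app", localedir=tmp, languages=["pt_BR"])
    translated = t.gettext("Hello, world!")

print(translated)
```

In a real project you would still run msgfmt, which also validates the PO file while compiling it.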

With the MO file in place, it is possible to change the language of the program to Portuguese through the languages argument of the translation function. If we run the following code, the flagged string will get translated to “Olá, mundo!”:

import gettext

# Loads locale/pt_BR/LC_MESSAGES/app.mo; fallback=True keeps the original strings if it is missing
t = gettext.translation("app", localedir="locale", languages=["pt_BR"], fallback=True)
_ = t.gettext  # short alias used to flag translatable strings

greeting = _("Hello, world!")
print(greeting)

Locale Module

The locale module gives access to the POSIX locale database and is especially useful for formatting dates, numbers and currencies. The following example shows how to use it:

import datetime
import locale

# Locale names depend on the operating system; some systems need 'en_US.UTF-8'
locale.setlocale(locale.LC_ALL, locale='en_US')
local_conv = locale.localeconv()
now = datetime.datetime.now()
some_price = 1234567.89
# locale.format was removed in Python 3.12; format_string is its replacement
formatted_price = locale.format_string('%1.2f', some_price, grouping=True)
currency_symbol = local_conv['currency_symbol']

print(now.strftime('%x'))
print(f'{currency_symbol}{formatted_price}')

In this example, we import the module, change all locale settings to US English and retrieve the locale conventions. Using the locale.format_string function, we can format the number without worrying about decimal and thousands separator symbols. The %x directive formats the date with day, month and year in the correct order for the locale. From the locale conventions, we are able to get the correct currency symbol.

This is the output of that Python code. We can see that the date follows the Month/Day/Year format, the decimal separator is a dot while the thousands separator is a comma, and there is a dollar sign to represent US currency.

$ python format_example.py
05/03/2019
$1,234,567.89
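
One practical caveat: the locale names that setlocale accepts depend on what is installed on the machine, and an unknown name raises locale.Error. A small sketch of a defensive helper follows; the name set_first_available and the candidate list are assumptions for illustration.

```python
import locale


def set_first_available(candidates, category=locale.LC_ALL):
    """Activate the first locale name that the operating system accepts."""
    for name in candidates:
        try:
            return locale.setlocale(category, name)
        except locale.Error:
            continue  # not installed on this machine, try the next spelling
    raise locale.Error(f"none of these locales are available: {candidates}")


# "C" is always available, so this call cannot fail.
active = set_first_available(["pt_BR.UTF-8", "pt_BR", "C"])
print(active)
```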

Now, using the same code and just changing the locale to Brazilian Portuguese, we get a different output based on Brazilian formatting conventions: the date follows the Day/Month/Year format, a comma is the decimal separator, dots are the thousands separators and the R$ symbol represents the Brazilian currency, the real.

import datetime
import locale

# Locale names depend on the operating system; some systems need 'pt_BR.UTF-8'
locale.setlocale(locale.LC_ALL, locale='pt_BR')
local_conv = locale.localeconv()
now = datetime.datetime.now()
some_price = 1234567.89
# locale.format was removed in Python 3.12; format_string is its replacement
formatted_price = locale.format_string('%1.2f', some_price, grouping=True)
currency_symbol = local_conv['currency_symbol']

print(now.strftime('%x'))
print(f'{currency_symbol}{formatted_price}')

$ python format_example.py
03/05/2019
R$1.234.567,89
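
To see what the locale database contributes, the same two price outputs can be reproduced from a hand-written table of conventions. The CONVENTIONS dictionary below is an assumption that hardcodes only the three values visible in the outputs above (the key names mirror those returned by locale.localeconv), and format_price is a hypothetical helper.

```python
# Hand-written conventions, copied from the two outputs shown above.
CONVENTIONS = {
    "en_US": {"currency_symbol": "$", "thousands_sep": ",", "decimal_point": "."},
    "pt_BR": {"currency_symbol": "R$", "thousands_sep": ".", "decimal_point": ","},
}


def format_price(amount: float, locale_name: str) -> str:
    """Format amount with the separators and symbol of the given locale."""
    conv = CONVENTIONS[locale_name]
    # Start from the fixed "1,234,567.89" shape, then swap in local symbols.
    whole, fraction = f"{amount:,.2f}".split(".")
    whole = whole.replace(",", conv["thousands_sep"])
    return f'{conv["currency_symbol"]}{whole}{conv["decimal_point"]}{fraction}'


print(format_price(1234567.89, "en_US"))
print(format_price(1234567.89, "pt_BR"))
```

Maintaining such a table by hand does not scale past a couple of locales, which is the reason to rely on the locale module instead.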

Does it get easier with Django?

Translations and Formatting

Internationalization is enabled by default when you create your Django project. The translation module wraps the GNU gettext library and provides a gettext function whose translations are set up from the language in the Accept-Language header, which the browser sends with each request (this detection relies on Django's LocaleMiddleware being enabled). So all the Python code that we saw before is encapsulated in the django.utils.translation module, and we can jump ahead and just use the gettext function in our view:

from django.http import HttpResponse
from django.utils.translation import gettext as _

def my_view(request):
    greetings = _('Hello, World!')
    return HttpResponse(greetings)
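
To illustrate the kind of negotiation that happens behind this view, here is a simplified, framework-free sketch of reading an Accept-Language header. It is not Django's implementation; parse_accept_language and pick_language are hypothetical helpers, and the supported set is assumed to hold lowercase language tags.

```python
def parse_accept_language(header: str):
    """Return the language tags of an Accept-Language value, best first."""
    weighted = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, sep, params = piece.partition(";")
        quality = 1.0  # tags without an explicit q value get full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            weighted.append((-quality, position, tag.strip().lower()))
    return [item[2] for item in sorted(weighted)]


def pick_language(header: str, supported: set, default: str = "en") -> str:
    """Pick the first requested language the application supports."""
    for tag in parse_accept_language(header):
        if tag in supported:
            return tag
        base = tag.split("-")[0]  # "pt-pt" can fall back to plain "pt"
        if base in supported:
            return base
    return default


print(pick_language("pt-BR,pt;q=0.9,en;q=0.8", {"en", "pt-br"}))
print(pick_language("fr-CA,fr;q=0.9", {"en", "pt-br"}))
```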

For translations, we can flag translatable strings in both Python and template code (once we load the internationalization tags). The trans template tag translates a single string, while blocktrans tag is able to mark as translatable a block of strings, including variable content.

{% load i18n %}
<p>{% trans "Hello, World!" %}</p>
<p>{% blocktrans %}This string will have {{ value }} inside.{% endblocktrans %}</p>

Besides the standard gettext function, in Django we can have lazy translations: the flagged string will only be translated when the value is used in a string context, such as in template rendering. This is especially useful for translating help_text and verbose_name attributes in Django models.
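
In Django this is what gettext_lazy from django.utils.translation provides. The idea itself can be shown without the framework. The sketch below is a conceptual stand-in, not Django's implementation: LazyText, the CATALOGS dictionary and the active_language variable are all made up for illustration.

```python
class LazyText:
    """Defer a translation lookup until the text is actually rendered."""

    def __init__(self, translate, message):
        self._translate = translate
        self._message = message

    def __str__(self):
        return self._translate(self._message)


CATALOGS = {"pt_BR": {"first name": "primeiro nome"}}  # made-up catalog
active_language = "en_US"


def translate(message):
    """Look the message up in the catalog of the currently active language."""
    return CATALOGS.get(active_language, {}).get(message, message)


label = LazyText(translate, "first name")  # created once, at import time
active_language = "pt_BR"                  # language only known later, per request
print(str(label))
```

An eager call at import time would have frozen the English text, which is why attributes evaluated when a model class is defined need the lazy variant.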

Regarding the GNU command line tools, django-admin provides equivalents for the commands most used in the development process. To collect all strings marked as translatable in the code, you just need to run the django-admin makemessages command for each locale that you want to support in your system (for example, django-admin makemessages --locale=pt_BR). Once you create the locale folder in your project workspace, this command creates the correct folder structure for the PO file of each language.

To compile all PO files, you just need to run django-admin compilemessages. If you want to compile the PO file for a specific locale, you can pass it as an argument: django-admin compilemessages --locale=pt_BR. For a deeper understanding of how translations work in Django, you can check out the documentation [4].

Django also uses the Accept-Language header to determine the user locale and correctly format dates, times and numbers. In the following example we have a simple form with a DateField and a DecimalField. To indicate that we want to receive these inputs in the format expected by the user locale, we just need to pass the parameter localize as True to the form field instantiation.

from django import forms

class DatePriceForm(forms.Form):
    date = forms.DateField(localize=True)
    price = forms.DecimalField(max_digits=10, decimal_places=2, localize=True)
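
Accepting localized input means undoing the formatting before the value is stored. As a rough, framework-free sketch of that step (not Django's implementation; parse_localized_decimal is a hypothetical helper whose defaults follow the Brazilian conventions shown earlier):

```python
from decimal import Decimal, InvalidOperation


def parse_localized_decimal(text, decimal_sep=",", thousands_sep="."):
    """Turn user input such as '1.234,56' into Decimal('1234.56')."""
    cleaned = text.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"not a valid number: {text!r}") from None


print(parse_localized_decimal("1.234,56"))
print(parse_localized_decimal("1,234.56", decimal_sep=".", thousands_sep=","))
```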

How does it change the development flow?

After the application is internationalized, the deploy process needs to be adapted to accommodate the translation step. In our actual project, we started to send any new terms to be translated as soon as the change was deployed to our staging environment. The production deploy would only be approved when all terms had been translated and the PO files compiled.
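
A release gate like this one can be partly automated. The sketch below lists the entries of a PO file that still have an empty translation. It is not the tooling we used: untranslated_entries is a hypothetical helper, the sample catalog is invented, and the regular expression only handles single-line msgid values.

```python
import re

# A msgid with text, followed by an empty msgstr that has no continuation lines.
UNTRANSLATED = re.compile(r'^msgid "(.+)"\nmsgstr ""$(?!\n")', re.MULTILINE)


def untranslated_entries(po_text: str):
    """Return the msgid of every entry whose msgstr is still empty."""
    return UNTRANSLATED.findall(po_text)


SAMPLE_PO = "\n".join([
    'msgid ""',
    'msgstr ""',
    r'"Content-Type: text/plain; charset=UTF-8\n"',
    '',
    'msgid "Hello, world!"',
    'msgstr "Olá, mundo!"',
    '',
    'msgid "Goodbye!"',  # made-up entry that nobody has translated yet
    'msgstr ""',
])

missing = untranslated_entries(SAMPLE_PO)
print(missing)
```

A deploy script could refuse to continue while this list is not empty for any supported locale.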

Another important change in our development flow was the addition of integration tests for different locales during the QA step. The QA team would simulate the locales supported by our application and check that all text was translated and that currencies and units of measurement were converted as expected.

Our main takeaway from this whole process of internationalizing an application is that it should have been done as a design step, way back at the beginning of the project. Stopping everything to implement internationalization is not the best approach. If your project is no longer at an early stage, I recommend following the Boy Scout Rule [5]: start flagging strings that need to be translated whenever you are implementing a new feature or fixing a non-urgent bug. This way you will still deliver new functionality while internationalizing the application.

Notes


[1] l10n and i18n are numeronyms: there are 10 letters between the “l” and the final “n” in “localization”, and 18 letters between the “i” and the final “n” in “internationalization”.

[2] Internationalization and localization / Definitions - Django docs

[3] GCC and the Translation Project

[4] Internationalization and localization - Django docs

[5] https://deviq.com/boy-scout-rule/