
Validate your iOS and Android translations with Locheck

When mobile apps need to ship in multiple languages, the developer often hires a contractor or external service to translate all the strings. The people doing these translations are usually unfamiliar with the technical details of localization, which makes it easy for them to introduce bugs when a string contains a variable. Even if the app only ships in one language, it’s still possible to write bugs by making subtle mistakes. In order to ship a bug-free app, the developer needs some way of ensuring the format strings in every translation are correct even if they don’t speak the language.

At Asana, where we ship in 13 languages, we developed Locheck to automatically verify that every string in our .strings, .stringsdict, and strings.xml files uses consistent arguments and types, and to report errors to our CI pipelines. In this post, I’ll cover some challenges with localization and show how Locheck makes sure we don’t ship with bugs.

How Locheck catches bugs

Locheck compares the language you develop in to the languages you translate into, and makes sure all their format arguments and types match. It can catch cases like these:

  • A string appears in one localization but not another
  • An argument is used in a localization but does not appear in the base localization
  • An argument has different types in different localizations or different plural variants
  • The translation has misspelled a named variable

For .strings and strings.xml files, this is relatively simple given a fancy enough regular expression and knowledge of the syntax. Locheck parses a string like "%s added %d tasks to %3$s" into a list of Swift structs:

[
  FormatArgument(specifier: "s", position: 1),
  FormatArgument(specifier: "d", position: 2),
  FormatArgument(specifier: "s", position: 3)
]

(We are very fortunate that iOS and Android use a close enough format string syntax.)
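
To make the parsing step concrete, here is a minimal Java sketch of the same idea. It is not Locheck's code (Locheck is written in Swift), and the regular expression is a simplified assumption that ignores flags, width, and precision. The class name FormatStringParser is hypothetical. Arguments without an explicit position are numbered in the order they appear, which is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FormatStringParser {
    /** Mirrors the shape of the struct shown above: a specifier and a position. */
    static final class FormatArgument {
        final String specifier;
        final int position;

        FormatArgument(String specifier, int position) {
            this.specifier = specifier;
            this.position = position;
        }

        @Override
        public String toString() {
            return "FormatArgument(specifier: \"" + specifier + "\", position: " + position + ")";
        }
    }

    // Simplified, hypothetical pattern: "%", an optional explicit position such
    // as "3$", an optional length modifier, then one conversion character.
    private static final Pattern SPECIFIER =
        Pattern.compile("%(?:(\\d+)\\$)?((?:ll|l|h)?[@a-zA-Z])");

    static List<FormatArgument> parse(String format) {
        List<FormatArgument> arguments = new ArrayList<>();
        int nextImplicitPosition = 1;
        // Drop escaped percent signs so "%%" is never read as a specifier.
        Matcher matcher = SPECIFIER.matcher(format.replace("%%", ""));
        while (matcher.find()) {
            String explicit = matcher.group(1);
            int position = (explicit != null)
                ? Integer.parseInt(explicit)
                : nextImplicitPosition++;
            arguments.add(new FormatArgument(matcher.group(2), position));
        }
        return arguments;
    }

    public static void main(String[] args) {
        // Prints one FormatArgument per line, at positions 1, 2, and 3.
        for (FormatArgument argument : parse("%s added %d tasks to %3$s")) {
            System.out.println(argument);
        }
    }
}
```

Because the pattern accepts both @ and ordinary letters as conversion characters, the same sketch handles iOS-style and Android-style strings.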

Locheck then generates a list for each string, and then compares the same string keys across translations, logging a warning or error if they differ. Some issues might cause crashes, for example if your German translation uses %s instead of %d.
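
The comparison step can be sketched in a few lines of Java. This is an illustration under assumptions, not Locheck's implementation: each string is represented as a map from argument position to specifier, the class name TranslationComparer is hypothetical, and the message wording and severities are only loosely modeled on the output shown later in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TranslationComparer {
    /** Compares position -> specifier maps for a single string key. */
    static List<String> compare(Map<Integer, String> base, Map<Integer, String> translation) {
        List<String> problems = new ArrayList<>();
        // TreeMap gives a stable, position-ordered report.
        for (Map.Entry<Integer, String> entry : new TreeMap<>(base).entrySet()) {
            int position = entry.getKey();
            String expected = entry.getValue();
            String actual = translation.get(position);
            if (actual == null) {
                problems.add("WARNING: translation does not include argument " + position);
            } else if (!actual.equals(expected)) {
                problems.add("ERROR: specifier for argument " + position
                    + " does not match (should be " + expected + ", is " + actual + ")");
            }
        }
        for (Integer position : new TreeMap<>(translation).keySet()) {
            if (!base.containsKey(position)) {
                problems.add("ERROR: translation uses argument " + position
                    + ", which is not in the base string");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        Map<Integer, String> base = new TreeMap<>();
        base.put(1, "s");
        base.put(2, "d");
        Map<Integer, String> german = new TreeMap<>();
        german.put(1, "s");
        german.put(2, "s"); // the translator wrote %s where the base has %d
        // Prints a single ERROR line for argument 2.
        compare(base, german).forEach(System.out::println);
    }
}
```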

The challenges of .stringsdict

.stringsdict files are much more complicated. Here’s a shorthand version of a plural rule:

"%s added %d task(s) to 's'":
 formatKey: "%s added %#@tasks@ to %3$s"
 tasks:
   specType: plural
   value: d
   one: a task
   other: %d tasks

The %#@tasks@ substring means “recurse into tasks.” You can even nest these rules:

"%s added %d task(s) and %d milestone(s) to 's'":
 formatKey: "%s added %#@tasks@ to %s"
 tasks:
   specType: plural
   value: d
   one: "a task%#@milestone@"
   other: "%d tasks%#@milestone@"
 milestone:
   specType: plural
   value: d
   one: " and a milestone"
   other: " and %d milestones"

(There is a simpler way to define this rule, but sometimes nesting is really necessary.)

These rules form a grammar, defining a set of possible strings. The rules are traversed before the format string is applied. That means in order to really be sure the arguments are correct, every permutation needs to be checked. Here are all the permutations of the .stringsdict entry example above:

%s added a task and a milestone to %s
%s added a task and %d milestones to %s
%s added %d tasks and a milestone to %s
%s added %d tasks and %d milestones to %s
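
The expansion above can be reproduced with a short recursive sketch in Java. It assumes the rules are already loaded into a map from variable name to its variant strings, and it only recognizes the plain %#@name@ form. The class name PluralExpander is hypothetical, and this is not how Locheck itself is written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PluralExpander {
    private static final Pattern VARIABLE = Pattern.compile("%#@(\\w+)@");

    /**
     * Replaces the first %#@name@ with each of its variants and recurses,
     * so nested variables are expanded too. Cyclic rules would recurse
     * forever; this sketch does not guard against them.
     */
    static List<String> expand(String format, Map<String, List<String>> rules) {
        List<String> results = new ArrayList<>();
        Matcher matcher = VARIABLE.matcher(format);
        if (!matcher.find()) {
            results.add(format); // nothing left to expand
            return results;
        }
        List<String> variants = rules.get(matcher.group(1));
        if (variants == null) {
            throw new IllegalArgumentException("No rule named " + matcher.group(1));
        }
        for (String variant : variants) {
            String substituted =
                format.substring(0, matcher.start()) + variant + format.substring(matcher.end());
            results.addAll(expand(substituted, rules));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        rules.put("tasks", Arrays.asList("a task%#@milestone@", "%d tasks%#@milestone@"));
        rules.put("milestone", Arrays.asList(" and a milestone", " and %d milestones"));
        // Prints the same four permutations listed above, in the same order.
        expand("%s added %#@tasks@ to %s", rules).forEach(System.out::println);
    }
}
```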

Look at how the arguments differ between permutations. Without explicit positions, the second permutation would mistakenly put the value for tasks in front of “milestones,” and try to use a number for the string argument at the end. If we add explicit positions, these problems disappear:

%1$s added a task and a milestone to %4$s
%1$s added a task and %3$d milestones to %4$s
%1$s added %2$d tasks and a milestone to %4$s
%1$s added %2$d tasks and %3$d milestones to %4$s

Locheck knows how to expand these rules and can log intelligent errors to help you find problems.

Examples/Demo_Base.stringsdict
 %s added %d task(s) to 's':
   ERROR: Two permutations of '%s added %d task(s) to 's'' contain different format specifiers at position 3. '%s added %d tasks and %d milestones to %3$s' uses 'd', and '%s added %d tasks and %d milestones to %3$s' uses 's'.

Deep-dive into a common problem

Imagine we’re making a task list app with an activity feed.

On Android, there is built-in support for a plurals element in strings.xml for this:

<plurals name="created_tasks_in_project">
 <item quantity="one">%s added a task to '%s'</item>
 <item quantity="other">%s added %d tasks to '%s'</item>
</plurals>

On iOS, we would add an entry to our Localizable.stringsdict file:

<dict>
    <key>%s added %d task(s) to '%s'</key>
    <dict>
        <key>NSStringLocalizedFormatKey</key>
        <string>%s added %#@tasks@ to %s</string>
        <key>tasks</key>
        <dict>
            <key>NSStringFormatSpecTypeKey</key>
            <string>NSStringPluralRuleType</string>
            <key>NSStringFormatValueTypeKey</key>
            <string>d</string>
            <key>one</key>
            <string>a task</string>
            <key>other</key>
            <string>%d tasks</string>
        </dict>
    </dict>
</dict>

Then in our code, we’d access the string:

// Android
resources.getQuantityString(
 R.plurals.created_tasks_in_project,
 numTasks, // quantity: selects the plural variant
 authorName,
 numTasks,
 projectName);
 
// iOS
String.localizedStringWithFormat(
 NSLocalizedString("%s added %d task(s) to '%s'", comment: ""),
 authorName,
 numTasks,
 projectName)

And we’d get back whichever variant matched the value of numTasks we put in.

Or would we? No, we would not!

If we pass a value of 1 for numTasks, the app will show the wrong text or crash outright, because once the system has selected the singular variant, we’re effectively doing this:

// Android
String.format(
 "%s added a task to '%s'",
 authorName,
 numTasks, // Passing a number to a string argument!
 projectName);
 
// iOS
String.localizedStringWithFormat(
 NSLocalizedString("%s added a task to '%s'", comment: ""),
 authorName,
 numTasks, // Passing a number to a string argument!
 projectName)

This kind of mistake is extremely easy to make if you’re not used to thinking about these details, for example if your job is to translate text between different languages rather than write code all day, or if you’re translating to a language like Japanese where the order is often different. Even if you have a developer review every string, it can be very tricky to spot these issues. And as code and teams scale together, tricky-to-spot bugs become guaranteed-to-ship-to-production bugs.

How to fix it

The right thing to do is to add explicit positions to non-consecutive arguments. Instead of writing %s for our third argument, we should write %3$s, which makes it always use the third argument.

<!-- Android -->
<plurals name="created_tasks_in_project">
                   <!-- The change is here ⬇ -->
 <item quantity="one">%s added a task to '%3$s'</item>
 <item quantity="other">%s added %d tasks to '%s'</item>
</plurals>
 
<!-- iOS -->
<dict>
    <key>%s added %d task(s) to '%s'</key>
    <dict>
              <!-- The change is here ⬇ -->
        <string>%s added %#@tasks@ to %3$s</string>
        <key>tasks</key>
        <dict>
            <key>NSStringFormatSpecTypeKey</key>
            <string>NSStringPluralRuleType</string>
            <key>NSStringFormatValueTypeKey</key>
            <string>d</string>
            <key>one</key>
            <string>a task</string>
            <key>other</key>
            <string>%d tasks</string>
        </dict>
    </dict>
</dict>
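
To see what explicit positions change on the Java side, here is a small sketch using plain String.format, assuming Android’s getQuantityString hands the selected variant to String.format. The sample values and the class name ExplicitPositionDemo are hypothetical. With java.util.Formatter, a number passed to %s is printed as text, while a string passed to %d throws.

```java
import java.util.IllegalFormatException;
import java.util.Locale;

final class ExplicitPositionDemo {
    // Hypothetical sample values, passed in the same order as the call site
    // earlier in this post: author, task count, project.
    static String render(String format) {
        return String.format(Locale.ROOT, format, "Alice", 1, "Roadmap");
    }

    static boolean throwsOnFormat(String format) {
        try {
            render(format);
            return false;
        } catch (IllegalFormatException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Implicit positions: the task count lands in the project slot.
        System.out.println(render("%s added a task to '%s'"));
        // prints: Alice added a task to '1'

        // Explicit position: the third argument is always the project.
        System.out.println(render("%s added a task to '%3$s'"));
        // prints: Alice added a task to 'Roadmap'

        // A reordered translation with implicit positions feeds "Alice" to %d.
        System.out.println(throwsOnFormat("%d tasks were added to '%s' by %s"));
        // prints: true

        // The same reordering with explicit positions is safe.
        System.out.println(render("%2$d tasks were added to '%3$s' by %1$s"));
        // prints: 1 tasks were added to 'Roadmap' by Alice
    }
}
```

The third case is the kind of reordering a translator might reasonably do, which is why a per-position type check is useful even when every base string is correct.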

Best practice would be to use explicit positions 100% of the time, but it can be prohibitively time-consuming to retroactively add explicit positions if your source of truth is an online service like Transifex, which is true for us at Asana. And you might still get errors if the people doing the translations aren’t perfect at understanding format strings.

Locheck will catch this type of problem automatically, so it’s safe to use implicit positions. There might still be translation errors where two strings are incorrectly swapped and their format specifiers still match, but at least the app won’t crash.

How to use Locheck

You can install Locheck using Mint or Make:

mint install Asana/locheck --link
# or use make
git clone git@github.com:Asana/locheck.git
cd locheck
make install

Then, you can use the relevant command to check your translations:

# Android: find /values[-*]/ directories containing strings.xml files
locheck discovervalues ./app/src/main/res
# iOS: find .lproj directories containing Localizable.strings
#   and Localizable.stringsdict files
locheck discoverlproj "MyApp/Supporting Files"

Locheck emits Xcode-style errors to stderr, as well as a human-readable summary to stdout after all files are examined. It works well as an Xcode Run Script build phase, continuous integration step, or pre-commit script. Here’s some example output from our demo files:

Examples/Demo_Base.strings
 missing:
   WARNING: 'missing' is missing from Demo_Translation
Examples/Demo_Translation.strings
 bad pos %ld %@:
   WARNING: 'bad pos %ld %@' does not include argument(s) at 1
   WARNING: Some arguments appear more than once in this translation
   ERROR: Specifier for argument 2 does not match (should be @, is ld)
 bad position %d:
   WARNING: 'bad position %d' does not include argument(s) at 1
 mismatch %@ types %d:
   ERROR: Specifier for argument 2 does not match (should be d, is @)
   ERROR: Specifier for argument 1 does not match (should be @, is d)
4 warnings, 3 errors
Errors found

Help us out

While we’ve run Locheck on our own code and a few open source apps, it’s still early. If you do decide to try it out, please leave feedback as a GitHub issue. Enjoy your new localization-bug-free life!

Special thanks to Dominik Gruber and Philip Messlehner
