Creating a Multi-language Application
As an English-speaking developer, I sometimes get presumptuous. It’s easy to forget that there are more people who don’t speak English than people who do. Forget Mandarin, I reckon the fastest-growing language is Emoji.
When I was designing Ongair (a platform that helps businesses use instant messaging to provide faster, more personal customer service), I made the same mistake. With most of our clients coming from South America and Asia, we were soon getting messages like:
Why doesn’t your system support characters like áéíñóúü?
I need to be able to reply with Emojis?
So I started looking into the problem.
UTF-8 is a character encoding capable of encoding all possible characters, or code points, in Unicode. Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.
In short, these two allow you to encode (store a representation of) most written characters. So why isn’t this the default? The short story is that it comes down to storage space and the politics of competing standards (check out other encoding standards like ASCII).
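To make the storage-space point concrete, here is a small Ruby sketch (Ruby being our stack) that shows how many bytes UTF-8 needs for different characters. The helper name `utf8_byte_count` is made up for illustration. The 4-byte case is the one to watch: emoji fall in that range, which is why the MySQL steps in this post use utf8mb4 rather than MySQL’s older 3-byte utf8.

```ruby
# Hypothetical helper: number of bytes UTF-8 needs to store a string.
def utf8_byte_count(text)
  text.encode("UTF-8").bytesize
end

# Sample characters written as escapes so the file encoding does not matter.
samples = {
  "a"         => "plain ASCII letter",
  "\u00F1"    => "n with tilde",
  "\u4E2D"    => "a Chinese character",
  "\u{1F600}" => "grinning face emoji"
}

samples.each do |char, description|
  puts "#{description}: #{utf8_byte_count(char)} byte(s)"
end
```

The byte counts grow from 1 for ASCII up to 4 for the emoji, so any column or connection limited to 3 bytes per character will reject or mangle emoji.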
TL;DR: We’re going to make sure our app supports at least UTF-8.
We’re mostly on a Ruby on Rails stack with MySQL, but I’ll include some pointers for those using other languages.
Step 1: Deciding which fields need to be UTF-8
Obviously numeric columns don’t need a different representation, so we’re going to target VARCHAR and TEXT fields.
You can either modify the entire table or specific columns.
ALTER TABLE `[table]` CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
ALTER TABLE `[table]` MODIFY [column] VARCHAR(250) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
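If you have many tables, typing these statements by hand gets tedious. Below is a minimal Ruby sketch that builds the table-level statement for a list of table names. The method name `utf8mb4_conversion_sql` and the table names are hypothetical; in a Rails migration you could pass each resulting string to `execute`, but try it on a copy of your data first.

```ruby
# Hypothetical helper: builds the ALTER TABLE statement that converts a
# whole table to utf8mb4. Backticks inside the name are doubled so the
# identifier stays quoted correctly.
def utf8mb4_conversion_sql(table, collation: "utf8mb4_bin")
  quoted = "`#{table.to_s.gsub('`', '``')}`"
  "ALTER TABLE #{quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE #{collation}"
end

# Example table names (assumptions, not our actual schema).
%w[messages contacts].each do |table|
  puts utf8mb4_conversion_sql(table)
end
```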
Step 2: Ensuring your connection supports UTF-8
Ensure your database.yml (the Rails database configuration file) contains:
encoding: utf8mb4
collation: utf8mb4_unicode_ci
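A quick way to confirm the setting is in place is to load the file and check it. This is a small sketch using Ruby’s standard YAML library; the `utf8mb4_configured?` helper and the inline sample config are assumptions for illustration, and a real database.yml may contain ERB or aliases that need extra handling.

```ruby
require "yaml"

# Hypothetical helper: true when the given environment's settings
# ask for the utf8mb4 encoding.
def utf8mb4_configured?(config, environment)
  settings = config.fetch(environment, {})
  settings["encoding"] == "utf8mb4"
end

# Sample config (an assumption; in an app you would read config/database.yml).
sample = <<~CONFIG
  production:
    adapter: mysql2
    encoding: utf8mb4
    collation: utf8mb4_unicode_ci
    database: my_database
CONFIG

config = YAML.safe_load(sample)
puts utf8mb4_configured?(config, "production")
```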
Step 2a: What if I’m on PHP?
Make sure your connection string specifies the encoding
DB=mysql://root:root@127.0.0.1:8889/my_database;charset=utf8mb4
Ensure your PHP installation has the multibyte string (mbstring) extension installed. You can check for this by looking at your phpinfo() page.
Step 3: Use responsibly
There are a few things we noticed using the above:
- Sometimes people use emoji in their ‘Names’, ‘Status Messages’ and ‘Group Subjects’, not just in normal text. These may be WhatsApp-specific, but they’re useful to note in this increasingly Emojified world.
- Don’t use language-specific words to represent state. If you have an object that could have a state, e.g. New, Ready, etc., it’s better to use an integer than a word like Nuevo.
I’ll add to this list if someone suggests something interesting in the comments.
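To illustrate the point about state: one option is to store an integer and translate it to a label only at display time. The sketch below is plain Ruby with made-up names (`STATES`, `LABELS`, `state_label`) and sample translations; in a Rails app you would more likely reach for an enum plus the I18n locale files, which is not shown here.

```ruby
# Integer codes are what gets stored; they mean the same thing in every language.
STATES = { 0 => :new, 1 => :ready }.freeze

# Display labels per locale (sample translations for illustration).
LABELS = {
  en: { new: "New",   ready: "Ready" },
  es: { new: "Nuevo", ready: "Listo" }
}.freeze

# Hypothetical helper: looks up the label for a stored state code,
# falling back to English when a locale has no translation.
def state_label(code, locale)
  state = STATES.fetch(code)
  LABELS.fetch(locale, LABELS[:en]).fetch(state, LABELS[:en][state])
end

puts state_label(0, :es)
puts state_label(1, :en)
```

Because the stored value never changes with the user’s language, queries and comparisons keep working no matter which locale the interface is shown in.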
UPDATE (CSS / Presentation):
So we’re now at the point where we need to show multi-language landing pages. One of the things we realized is that different languages may need different font sizes. Turns out CSS has support for locales.
.tagline:lang(zh) { font-size: 90%; }
This makes Mandarin characters 10% smaller. I got this tip from a great resource on localization in Rails.