What is the Common Locale Data Repository?

November 14th, 2011

If you’ve ever been involved in software or website localisation, you have most likely run into questions that you just can’t answer, like:

  • What do Finnish people call their own language? or
  • What’s the correct language code to use for a website translated into Simplified Chinese?

The CLDR has these answers and more.

CLDR is one of the Unicode Consortium’s projects. In a nutshell:

The Unicode Consortium is a non-profit organization devoted to developing, maintaining, and promoting software internationalization standards and data, particularly the Unicode Standard, which specifies the representation of text in all modern software products and standards.

First off, if you’ve ever tried to represent accented or non-Latin characters on a website using anything other than the Unicode character encoding called UTF-8, you’ll know that it’s almost impossible to get right.

Unicode makes light work of ensuring that Chinese, Japanese, Hebrew, Arabic, Hindi and Urdu characters are faithfully represented on a web page or in a software application.

It’s not just languages written in non-Latin scripts, either. Many Latin-script languages have their own accents and special characters, and Unicode solves pretty much every one of the problems associated with them getting mangled.

Building on this foundation, the Unicode Consortium also maintains CLDR, a database of EVERYTHING software developers need to know about a locale (a language as it is used in a specific place, such as Swiss German, or Flemish, the Dutch spoken in Belgium). This data includes:

  • Dates/time formats
  • Number/currency formats
  • Measurement Units
  • Sorting, Searching, Matching
  • Names for Languages, Territories, Scripts, Timezones, Currencies
  • Characters used by a language

Currently, CLDR contains information for over 500 locales, and it’s growing. The Unicode Consortium publishes the data freely in a standardised XML format (LDML, the Locale Data Markup Language) so that software applications and websites can programmatically stay in sync with the latest consensus on correct locale usage.
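
To give a feel for what "programmatically" means here, this is a minimal Python sketch that reads language display names from a small XML fragment. The fragment is hand-written for illustration: it imitates the shape of a CLDR locale file but is not copied from a release, so real files are far larger and their values may differ. The function name display_names is our own invention.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of a CLDR locale file for Finnish.
# Illustrative only: not taken from an actual CLDR release.
SAMPLE_LDML = """
<ldml>
  <identity>
    <language type="fi"/>
  </identity>
  <localeDisplayNames>
    <languages>
      <language type="fi">suomi</language>
      <language type="el">kreikka</language>
      <language type="zh">kiina</language>
    </languages>
  </localeDisplayNames>
</ldml>
"""


def display_names(ldml_text):
    """Return a {language code: display name} dict from an LDML-style string."""
    root = ET.fromstring(ldml_text)
    names = {}
    # Each <language type="xx"> element holds the name of language xx
    # as written in the locale this file describes (Finnish here).
    for node in root.findall("./localeDisplayNames/languages/language"):
        names[node.get("type")] = node.text
    return names


names = display_names(SAMPLE_LDML)
print(names["fi"])
```

In a real project you would more likely rely on a library that already bundles CLDR data than parse the XML yourself, but the sketch shows that the data is plain and easy to consume.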

Importantly, CLDR data is arrived at by consensus and is ‘owned’ by the community. If you think that Ελληνικά, the standard native name for the Greek language, is inaccurate, you can propose an alternative. If enough people agree with you, your change will become the new standard.

Getting details like this right first time matters: accurately localising your application or website is an essential part of reaching a broader audience and of reducing the friction between the original language version and the translated version.

CLDR saves software developers and business owners from having to reinvent the wheel, and from translating the same standard content over and over again.

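Oh, and the Finnish language is called Suomi, and the correct language code for Chinese is zh (or zh-Hans if you need to specify Simplified Chinese). The country code for China, by contrast, is CN.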
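
A locale tag can carry a language, a script and a region, and it helps to see them pulled apart: in zh-Hans-CN, zh is the language (Chinese), Hans is the script (Simplified Han) and CN is the country (China). The rough Python sketch below assumes simple tags of the form language[-Script][-REGION]; real tags can contain more subtags than this handles, and split_tag is a hypothetical helper, not part of any library.

```python
def split_tag(tag):
    """Split a simple locale tag into its language, script and region parts."""
    parts = tag.replace("_", "-").split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            # Four letters: a script code such as Hans or Latn.
            result["script"] = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            # Two letters or three digits: a region code such as CN or 419.
            result["region"] = part.upper()
    return result


print(split_tag("zh-Hans-CN"))
# prints {'language': 'zh', 'script': 'Hans', 'region': 'CN'}
```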

Thanks to CLDR, getting our language codes and language links correct for the 20+ languages we’re adding to Kyero.com has been greatly simplified.

Current & Future Locales for Kyero.com

Language    Native name  Code  Direction  Script
Catalan     Català       ca    LTR        Latin
Czech       Čeština      cs    LTR        Latin
Danish      Dansk        da    LTR        Latin
German      Deutsch      de    LTR        Latin
English     English      en    LTR        Latin
Spanish     Español      es    LTR        Latin
French      Français     fr    LTR        Latin
Croatian    Hrvatski     hr    LTR        Latin
Italian     Italiano     it    LTR        Latin
Hungarian   Magyar       hu    LTR        Latin
Dutch       Nederlands   nl    LTR        Latin
Norwegian   Norsk        no    LTR        Latin
Polish      Polski       pl    LTR        Latin
Portuguese  Português    pt    LTR        Latin
Romanian    Română       ro    LTR        Latin
Slovak      Slovenčina   sk    LTR        Latin
Finnish     Suomi        fi    LTR        Latin
Swedish     Svenska      sv    LTR        Latin
Greek       Ελληνικά     el    LTR        Greek
Bulgarian   български    bg    LTR        Cyrillic
Russian     Русский      ru    LTR        Cyrillic
Hebrew      עברית        he    RTL        Hebrew
Arabic      العربية      ar    RTL        Arabic
Urdu        اردو         ur    RTL        Arabic
Hindi       हिन्दी         hi    LTR        Devanagari
Japanese    日本語        ja    LTR        Japanese (kanji and kana)
Chinese     中文          zh    LTR        Han (Simplified)

(Locales are sorted by native name according to the Unicode Collation Algorithm: Latin-script names first, A-Z, followed by Greek, Cyrillic, Hebrew, Arabic, Devanagari, Japanese and Chinese. Script direction is left-to-right (LTR) or right-to-left (RTL).)
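
A table like this maps fairly directly onto the markup for a language switcher. The Python sketch below is an assumption about how that might look, not a description of Kyero.com's actual code: LOCALES holds a few rows from the table, and both language_link and the /code/ URL pattern are made up for illustration.

```python
from html import escape

# (language code, native name, direction) for a few rows of the table.
LOCALES = [
    ("fi", "Suomi", "ltr"),
    ("el", "Ελληνικά", "ltr"),
    ("he", "עברית", "rtl"),
]


def language_link(code, native_name, direction):
    """Build an HTML link labelled with the language's own name."""
    # hreflang describes the language of the link target; lang and dir
    # describe the link text itself, so RTL names render correctly.
    return '<a href="/{c}/" hreflang="{c}" lang="{c}" dir="{d}">{n}</a>'.format(
        c=escape(code), d=escape(direction), n=escape(native_name)
    )


links = [language_link(*row) for row in LOCALES]
for link in links:
    print(link)
```

Labelling each link in its own language means a visitor can spot their language without having to read the current one.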
