Languages and Language Codes


How to represent languages is a nightmare for every piece of localisation software. Most systems base their language codes loosely on ISO standards, but usually with deviations here and there. That is bad enough inside one system, but the Glossary Converter needs to unify them all, so that cross-format conversions are lossless wherever possible.


ISO codes

The fundamental standard for languages is usually just referred to as "ISO", but actually quite a few related standards are used:

  • ISO 639, language codes; these include (amongst a few more exotic ones we'll ignore):
    • 639-1 for two letters: en, and
    • 639-2 for three letters: eng
  • ISO 3166, country codes; these include (again skipping a few):
    • 3166-1-alpha-2 for two letters: US, and
    • 3166-1-alpha-3 for three letters: USA
  • ISO 15924, writing system codes: Cyrl
  • IETF BCP 47, the syntax to combine the ISO components into one tag: sr-Cyrl-BA
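
To illustrate how the components combine, here is a minimal Python sketch that splits a simple tag such as sr-Cyrl-BA back into its parts. It is not how the Glossary Converter does it; the function name split_tag is made up for this example, and the sketch only recognises language, script and region. Note that BCP 47 regions are two-letter country codes or three-digit numeric codes, so the three-letter country codes never appear inside a tag.

```python
def split_tag(tag):
    """Split a simple BCP 47 tag into language, script and region.

    Hypothetical helper for illustration only. Variants, extended
    language subtags and extensions are ignored.
    """
    parts = tag.split("-")
    result = {"language": parts[0].lower(), "script": None, "region": None}
    for part in parts[1:]:
        if len(part) == 1:
            # A single character (such as "x") starts an extension or
            # private-use section; stop reading standard subtags here.
            break
        if (len(part) == 4 and part.isalpha()
                and result["script"] is None and result["region"] is None):
            # Four letters directly after the language: ISO 15924 script.
            result["script"] = part.title()
        elif result["region"] is None and (
                (len(part) == 2 and part.isalpha())
                or (len(part) == 3 and part.isdigit())):
            # Two letters (ISO 3166-1 alpha-2) or three digits: region.
            result["region"] = part.upper()
    return result


print(split_tag("sr-Cyrl-BA"))
```

For sr-Cyrl-BA this gives the language sr, the script Cyrl and the region BA. The conventional casing (lower-case language, title-case script, upper-case region) is only a convention; tags compare case-insensitively.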


RWS specific tags

BCP 47 also allows proprietary codes to be defined, called private use subtags, and RWS uses a few of those: es-x-int-SDL

As far as the standard is concerned, only es exists; everything from the x onwards is undefined outside RWS.
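
A minimal Python sketch of that split, assuming the tag is well formed. The function name split_private_use is made up for this example and is not part of any RWS or Glossary Converter API.

```python
def split_private_use(tag):
    """Separate the standard part of a tag from its private-use part.

    Hypothetical helper for illustration only. Returns a pair
    (standard_part, private_part); private_part is "" if there is none.
    """
    parts = tag.split("-")
    lowered = [p.lower() for p in parts]
    if "x" in lowered:
        # The first "x" subtag marks the start of the private-use section.
        i = lowered.index("x")
        return "-".join(parts[:i]), "-".join(parts[i + 1:])
    return tag, ""


print(split_private_use("es-x-int-SDL"))  # ('es', 'int-SDL')
```

Software that does not know the RWS subtags can still fall back on the standard part, es, and treat the rest as opaque.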


Trados legacy codes

Last but not least, MultiTerm uses a number of language codes that do not correspond to any of the standards above: SH-B4

Trados were ahead of their time and supported languages before Windows did. Due to the curse of backward compatibility, these out-of-date codes are still in use.
