How to write a GNU libc locale


This is a draft document explaining how to write locale files for GNU libc. It does not go into detail, but refers to the relevant specifications instead. It does, however, mention some of the pitfalls and tries to document current practice.

How to choose the locale file name

Locale names consist of three parts: the language code, the country/region code, and the optional modifier. The format is language_REGION@modifier. The language code is a code from ISO 639. The two-letter code is preferred, but a three-letter code is accepted if no two-letter code is available. The country/region code is a code from ISO 3166. If the language or region in question is missing from the ISO standard, one needs to get the ISO standard updated before the locale will be included in glibc. If one can't convince the ISO 639 maintainers that the language exists (and thus needs a language code), the glibc maintainers will refuse to add the locale. In addition, the glibc maintainers seem to refuse "artificial languages" like Esperanto and Lojban, even if they have an ISO 639 code.

Little is known about the requirements for the naming of modifiers. The following modifiers are currently used: abegede, cyrillic, euro and saaho. This might indicate that lower case letters are preferred in modifier names.

It is recommended to follow RFC 3066 when selecting locale names.
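
The naming format described above can be sketched as a small check. This is only an illustration, not part of glibc: the function name parse_locale_name is hypothetical, the region is assumed to be a two-letter upper case code, and the modifier is assumed to be lower case letters only (a guess based on the modifiers currently in use). It checks the shape of the name, not whether the codes are actually registered in ISO 639 or ISO 3166.

```python
import re

# Assumed shape: 2-3 lower case letters, "_", 2 upper case letters,
# and an optional "@" followed by a lower case modifier.
LOCALE_NAME = re.compile(
    r"(?P<language>[a-z]{2,3})_(?P<region>[A-Z]{2})(?:@(?P<modifier>[a-z]+))?"
)


def parse_locale_name(name):
    """Return (language, region, modifier), or None if the name does not fit."""
    match = LOCALE_NAME.fullmatch(name)
    if match is None:
        return None
    return match.group("language"), match.group("region"), match.group("modifier")


if __name__ == "__main__":
    print(parse_locale_name("de_DE@euro"))  # ('de', 'DE', 'euro')
    print(parse_locale_name("nb_NO"))       # ('nb', 'NO', None)
    print(parse_locale_name("DE_de"))       # None
```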

Category order

To make it easier to compare locales with each other, I recommend using the same order for the categories in all locales. Any order will do, so I picked the order used in most locales, and decided to recommend this order:

  1. LC_IDENTIFICATION
  2. LC_CTYPE
  3. LC_COLLATE
  4. LC_MONETARY
  5. LC_NUMERIC
  6. LC_TIME
  7. LC_MESSAGES
  8. LC_PAPER
  9. LC_NAME
  10. LC_ADDRESS
  11. LC_TELEPHONE
  12. LC_MEASUREMENT
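
The recommended order can be checked mechanically. The sketch below is an assumption-laden illustration: the helper names are hypothetical, and a category is assumed to start on a line that contains nothing but the category name (so "END LC_CTYPE" lines and category references inside LC_IDENTIFICATION are ignored).

```python
RECOMMENDED_ORDER = [
    "LC_IDENTIFICATION", "LC_CTYPE", "LC_COLLATE", "LC_MONETARY",
    "LC_NUMERIC", "LC_TIME", "LC_MESSAGES", "LC_PAPER",
    "LC_NAME", "LC_ADDRESS", "LC_TELEPHONE", "LC_MEASUREMENT",
]


def categories_in_file_order(source):
    """List the categories in the order their opening lines appear."""
    found = []
    for line in source.splitlines():
        name = line.strip()
        if name in RECOMMENDED_ORDER and name not in found:
            found.append(name)
    return found


def follows_recommended_order(source):
    """True if the categories present appear in the recommended order."""
    found = categories_in_file_order(source)
    expected = [name for name in RECOMMENDED_ORDER if name in found]
    return found == expected


if __name__ == "__main__":
    sample = "LC_CTYPE\nEND LC_CTYPE\nLC_IDENTIFICATION\nEND LC_IDENTIFICATION\n"
    print(categories_in_file_order(sample))   # ['LC_CTYPE', 'LC_IDENTIFICATION']
    print(follows_recommended_order(sample))  # False
```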

Reuse when possible

One should avoid cut-n-paste when possible, and instead use the copy statement to include sections from locales with identical content.

LC_IDENTIFICATION

The category entries refer to the standard used when writing the given section. The standard references should be enclosed in quotes, and should not use the <U#> notation. They should normally look something like this:

category  "i18n:1997";LC_IDENTIFICATION
	

LC_MESSAGES

The yesexpr and noexpr entries should have the form ^[yY<extra>] and ^[nN<extra>], without 0 and 1 and without a trailing ".*". This keeps the expressions in the same form as those used in the C/POSIX locale (^[yY] and ^[nN]).
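
A rough way to test an expression against this form is sketched below. The function name has_recommended_form is hypothetical, and the check assumes the expression is given as plain text rather than in any symbolic notation. It accepts "^[", the two base letters, any extra characters other than 0, 1 and "]", and a closing "]", with nothing after it.

```python
import re


def has_recommended_form(expr, base):
    """Check that expr looks like ^[<base><extra>] with no 0, 1 or trailing text."""
    pattern = r"\^\[" + re.escape(base) + r"[^\]01]*\]"
    return re.fullmatch(pattern, expr) is not None


if __name__ == "__main__":
    print(has_recommended_form("^[yYjJ]", "yY"))  # True
    print(has_recommended_form("^[yY].*", "yY"))  # False, trailing ".*"
    print(has_recommended_form("^[yY1]", "yY"))   # False, contains 1
    print(has_recommended_form("^[nN]", "nN"))    # True
```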

Standard documents and specifications

Testing the new locale file

To test a new locale on a test machine, do the following:

For example, to generate a new de_DE@euro locale using the ISO-8859-15 charset and save it as 'de_DE':

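	  # Install the locale source where localedef looks for it
	  cp de_DE@euro /usr/share/i18n/locales/de_DE@euro
	  # Compile it with the ISO-8859-15 charmap; -c writes output even if warnings occur
	  localedef -i de_DE@euro -c -f ISO-8859-15 de_DE
	  # Try the new locale, here by checking the date format
	  LANG=de_DE date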
	

I've made a small tool, check-locale, capable of detecting a few common mistakes in locales.


Petter Reinholdtsen
Last modified: Mon Dec 20 20:17:59 CET 2004