MfGames.Culture API - Language Codes

This is the first part of a short series on the MfGames Culture CIL API. It is currently alpha software, but I'm looking for critiques, opinions, and general feedback. All of my work for this is in the GitHub repository[1] in the `drem-0.0.0` branch. It is licensed under the MIT license.

1: http://github.com/dmoonfire/mfgames-culture-cil/

This page is also a form of documentation by example.

When I started working on the culture logic, I decided to hang the code off as many standards as possible. I was already very familiar with ISO 639[2], a standardized list of languages and the codes that identify them. You can see these codes in various programs and places, such as `en` or `fr` (English and French, respectively).

2: http://en.wikipedia.org/wiki/ISO_639

Links

Introduction

Country Codes

ISO 639

There are a few components to an ISO 639 code: a two-character code (such as `en`), a three-character code (such as `eng`), and the name of the language.

Actually, there are two versions of the three-character code, a bibliographic and a terminologic code. These are known as the `B` and `T` codes respectively. The bibliographic code is based on the English translation of the name while the terminologic code is based on the language's name for itself.

For example, the bibliographic code for Armenian is `arm` while the terminologic is `hye`.

According to Wikipedia, the terminologic code is preferred over the bibliographic one.

Also, `en` and `eng` identify the same language, but if you treat them simply as strings, they are different.

System.Globalization

To my surprise, there is no dedicated object in the C# base library for ISO 639 codes. There are some properties on `CultureInfo` in `System.Globalization`, but nothing that handles the equivalency of `en` and `eng`. I also haven't had a lot of success with creating non-standard languages (such as my `xmi` for Miwāfu) inside the framework.
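As a point of reference, here is a minimal sketch of what the framework does offer. It uses only `CultureInfo.GetCultureInfo`, `TwoLetterISOLanguageName`, and `ThreeLetterISOLanguageName`; the class and method names are made up for illustration, and the exact values returned depend on the culture data available on the platform.

```csharp
using System;
using System.Globalization;

// Illustrative helper; the class and method names are hypothetical.
public static class CultureInfoCodes
{
    // Returns the two- and three-letter ISO 639 codes the framework
    // reports for a culture name such as "en" or "fr".
    public static (string Alpha2, string Alpha3) Describe(string cultureName)
    {
        CultureInfo culture = CultureInfo.GetCultureInfo(cultureName);
        return (culture.TwoLetterISOLanguageName, culture.ThreeLetterISOLanguageName);
    }

    // Plain strings know nothing about languages, so "en" and "eng"
    // compare as different values.
    public static bool SameAsStrings(string left, string right)
    {
        return string.Equals(left, right, StringComparison.Ordinal);
    }
}
```

The framework can report both codes for a culture it already knows, but there is no type that treats the two spellings as one value.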

There are some enum versions of the ISO code, but they don't have the flexibility to add custom languages.

Unable to find something already there, I created my own ISO 639 class for handling these codes. I called it `LanguageCode` because I didn't like how `Iso639` looked. It does ignore the other standards for languages right now, but I was thinking that `LanguageCode` could handle all of those as separate properties.

var english1 = new MfGames.Culture.Codes.LanguageCode("eng");
var english2 = new MfGames.Culture.Codes.LanguageCode("eng", "en");

Assert.AreEqual(english1, english2);

I set it up so that `ToString` returns the preferred three-character code.

var armenian = new LanguageCode("hye", "hy", "arm");

Assert.AreEqual("hye", armenian.IsoAlpha3);
Assert.AreEqual("hye", armenian.IsoAlpha3T);
Assert.AreEqual("arm", armenian.IsoAlpha3B);
Assert.AreEqual("hy", armenian.IsoAlpha2);
Assert.AreEqual("hye", armenian.ToString());

`LanguageCode` is an immutable object that encapsulates all the properties of an ISO 639 code except for its name. It compares on the preferred three-character code for equivalency. I also had it intern the strings to avoid memory pressure with a larger number of codes.
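To make that description concrete, here is a minimal sketch of how such a type could be put together. It is not the library's source: `LanguageCodeSketch` is a hypothetical name, the constructor order mirrors the examples above, and falling back to the terminologic code when no bibliographic code is given is an assumption.

```csharp
using System;

// Hypothetical stand-in for LanguageCode; not the library's actual code.
public sealed class LanguageCodeSketch : IEquatable<LanguageCodeSketch>
{
    public LanguageCodeSketch(
        string isoAlpha3T, string isoAlpha2 = null, string isoAlpha3B = null)
    {
        if (isoAlpha3T == null) throw new ArgumentNullException(nameof(isoAlpha3T));

        // Interning lets every copy of "eng" share one string instance.
        IsoAlpha3T = string.Intern(isoAlpha3T);
        // Assumption: without a separate B code, reuse the T code.
        IsoAlpha3B = string.Intern(isoAlpha3B ?? isoAlpha3T);
        IsoAlpha2 = isoAlpha2 == null ? null : string.Intern(isoAlpha2);
    }

    // Get-only properties keep the object immutable after construction.
    public string IsoAlpha3T { get; }
    public string IsoAlpha3B { get; }
    public string IsoAlpha2 { get; }

    // The terminologic code is treated as the preferred three-character code.
    public string IsoAlpha3 => IsoAlpha3T;

    // Equality looks only at the preferred three-character code.
    public bool Equals(LanguageCodeSketch other) =>
        other != null && string.Equals(IsoAlpha3, other.IsoAlpha3, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as LanguageCodeSketch);

    public override int GetHashCode() => StringComparer.Ordinal.GetHashCode(IsoAlpha3);

    public override string ToString() => IsoAlpha3;
}
```

Because `Equals` and `GetHashCode` both use only the preferred code, two instances built with different optional fields still work as the same dictionary key.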

Memory

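Memory is something I concern myself with. With a single code, you have three string pointers (the terminologic, bibliographic, and two-character codes) plus the strings they point to.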

Using an interned string for the code means that the three pointers will remain, but at least I won't have a huge number of three- and two-character strings in memory.
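The effect of interning can be seen in a small sketch that uses only `string.Intern` and `object.ReferenceEquals`. The class and method names here are hypothetical, and the strings are built at run time so that the compiler's own literal sharing does not hide the difference.

```csharp
using System;

// Illustrative helper; the names are hypothetical.
public static class InternDemo
{
    // Copies the characters into a brand new string instance.
    public static string Build(string text) => new string(text.ToCharArray());

    // Two separately built strings are equal in value but are distinct objects.
    public static bool SharesInstanceWithoutIntern(string text) =>
        object.ReferenceEquals(Build(text), Build(text));

    // After interning, both references point at the same pooled instance.
    public static bool SharesInstanceWithIntern(string text) =>
        object.ReferenceEquals(string.Intern(Build(text)), string.Intern(Build(text)));
}
```

For a non-empty code such as `eng`, the first method reports separate instances and the second reports a shared one, which is the saving described above.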

Singleton

I still wanted to reduce the memory pressure even further. To do this, I created a singleton class, `LanguageCodeManager`, which hands out shared `LanguageCode` instances.

var manager = LanguageCodeManager.Instance;
var english1 = manager.Get("eng");
var english2 = manager.Get("en");
var english3 = manager.GetIsoAlpha3("eng");
var english4 = manager.GetIsoAlpha3T("eng");

This way, you'll only have one instance of “English” regardless of how many references you hold. Of course, if you also decide to manually create an English code, it will still compare as equal to the singleton version even though it is a separate object.
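One way to picture the lookup is a dictionary that maps every known spelling of a code to the same object. The sketch below is an assumption about how such a registry could work, not the library's implementation; `CodeSketch` and `CodeRegistrySketch` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal code type, used only by this sketch.
public sealed class CodeSketch
{
    public CodeSketch(string isoAlpha3, string isoAlpha2 = null)
    {
        IsoAlpha3 = isoAlpha3 ?? throw new ArgumentNullException(nameof(isoAlpha3));
        IsoAlpha2 = isoAlpha2;
    }

    public string IsoAlpha3 { get; }
    public string IsoAlpha2 { get; }
}

// Hypothetical registry: every spelling maps to one shared instance.
public sealed class CodeRegistrySketch
{
    private readonly Dictionary<string, CodeSketch> byAnyCode =
        new Dictionary<string, CodeSketch>(StringComparer.Ordinal);
    private readonly List<CodeSketch> codes = new List<CodeSketch>();

    public int Count => codes.Count;

    public void Add(CodeSketch code)
    {
        if (code == null) throw new ArgumentNullException(nameof(code));

        codes.Add(code);
        byAnyCode[code.IsoAlpha3] = code;
        if (code.IsoAlpha2 != null) byAnyCode[code.IsoAlpha2] = code;
    }

    // Returns the shared instance, or null when the code is unknown.
    public CodeSketch Get(string anyCode)
    {
        CodeSketch found;
        return byAnyCode.TryGetValue(anyCode, out found) ? found : null;
    }
}
```

With this shape, looking up `eng` and `en` returns the very same object, so callers share one instance per language.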

I made `LanguageCodeManager` an injectable singleton to provide for customizations.

LanguageCodeManager.Instance = new LanguageCodeManager();
LanguageCodeManager.Instance.Add(new LanguageCode("xmi")); // Miwāfu
LanguageCodeManager.Instance.Add("xlo"); // Lorban

Assert.AreEqual(2, LanguageCodeManager.Instance.Count);

foreach (LanguageCode lc in LanguageCodeManager.Instance)
{
	Assert.IsNull(lc.IsoAlpha2);
}

This also means that most methods that use language codes actually take a `LanguageCodeManager` as a parameter to facilitate testing and isolation. So far, I have found that this adds a bit of overhead to many functions, but I think it gives the flexibility needed. I'm in the process of converting most of those parameters to argument objects to simplify the process.

A newly constructed `LanguageCodeManager` does not have any of the ISO codes; it is an empty list. To add the ones stored as a manifest resource, call `AddDefaults()`. The manager initially assigned to `LanguageCodeManager.Instance` has these defaults already added.

Why a class?

I decided to make `LanguageCode` a class, despite the overhead of a class, mainly to make it easy to pass `null` in. A struct would also carry at least three string pointers everywhere it is used instead of a single one.
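The size difference can be checked with a small sketch. It assumes a recent .NET runtime where `System.Runtime.CompilerServices.Unsafe.SizeOf<T>` is available, and both types below are hypothetical layouts rather than anything from the library.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical struct layout: three string references travel with every copy.
public struct CodeAsStruct
{
    public string IsoAlpha3T;
    public string IsoAlpha3B;
    public string IsoAlpha2;
}

// Hypothetical class layout: every variable holds one reference to the object.
public sealed class CodeAsClass
{
    public string IsoAlpha3T;
    public string IsoAlpha3B;
    public string IsoAlpha2;
}

public static class FootprintDemo
{
    // Size of one struct value, as stored in a field, local, or array slot.
    public static int StructVariableSize() => Unsafe.SizeOf<CodeAsStruct>();

    // For a reference type this is the size of the reference, not the object.
    public static int ClassVariableSize() => Unsafe.SizeOf<CodeAsClass>();
}
```

Each struct value takes three pointers' worth of space wherever it is stored, while a class variable takes one and the three string fields live once on the heap.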

Why not string?

The main reason I didn't just leave this as a string is type safety. I like passing in a language code when it is supposed to be a language code, without worrying about which of five different strings is supposed to be the three-character code. Or whether it is supposed to be a two-character code. Or something else.

var english = LanguageCodeManager.Instance.Get("eng");
var translation = GetTranslation(english, "bob");

Special

There is one `LanguageCode` that doesn't fit with the ISO standard, “Canonical”. This has a code of `*` for all of the fields and is used to do the final matching or determine the canonical name of something.

var canonical = LanguageCode.Canonical;

Assert.AreEqual("*", canonical.IsoAlpha3);

Names of Languages

One aspect of the language code that is not included in the object is the name of the language. This led into one of the more complicated parts of the library, and one of the ones I'm most unsure about, but that requires me to have country codes and language tags to explain.

Self-review

An interesting aspect of writing up this page is that I found things wrong with my API. For example, I had `LanguageCode.Alpha3` when it really should have been `LanguageCode.IsoAlpha3`. It is a simple change, but writing this was a way of stepping back and looking over it again.

Metadata

Categories:

Programming

Tags:

mfgames-culture

mfgames-culture-cil

https://d.moonfire.us/blog/2015/02/17/mfgames-culture-api-languages/