Posts Tagged ‘ICU’

Using the ICU4C Transliterator

Thursday, August 28th, 2008

Recently, while developing a small application, I needed to strip all accents from strings, regardless of the strings’ encoding. I found multiple hints on the web, but none that were really useful (one suggested a search-and-replace over every possible accented character), until I found an obscure post somewhere that suggested using ICU (International Components for Unicode) for this task.

It seemed that the ICU “Transliterator” could do this automatically, given the proper transliteration rule. However, the use of ICU4C is not particularly well documented on the web (at least, not the use of the Transliterator C++ class). After a few hours of trial and error, I finally got it to work, so here are my findings.
