Greek to english alphabet translation

You will need to build the mapping of Unicode characters into latin characters which they resemble. Since the encoding that turns "the Family" into "tђє Ŧค๓เℓy" is effectively random and not following any algorithm that can be explained by the information of the Unicode codepoints involved, there's no general way to solve this algorithmically. The lookup arrays (as it was in our case) took perhaps 1 man day to produce, to cover all diacritic marks for all Western European languages. This comes from experience of having worked on an application that was required to allow end users to search bibliographic data that included diacritic characters. If your language supports native Unicode characters (as Java does) and optimises static structures correctly, such find and replaces tend to be blindingly quick. It then becomes a small amount of slightly boring work creating the dictionaries, and a trivial task to perform the replacement. Similarly, the Unicode character set contains hundreds of mathematical and pictorial symbols: there is no (easy) way for users to directly enter these, so you can assume they can be ignored.īy taking these logical steps you can reduce the number of possible characters to parse to the point where a dictionary based lookup / replace operation is feasible. Consider the source of the input - if you are coding an application for "the Western world" (to use as good a phrase as any), it would be unlikely that you would ever need to parse Arabic characters. If you must, for whatever reason, convert characters, then the only sensible way to approach this it to firstly reduce the scope of the task at hand. This is before you even go onto consider the Cyrillic languages and other script based texts such as Arabic, which simply cannot be "converted" to English.

As others have pointed out, diacritics are there for a reason: they are essentially unique letters in the alphabet of that language with their own meaning / sound etc.: removing those marks is just the same as replacing random letters in an English word. Attempting to "convert them all" is the wrong approach to the problem.įirstly, you need to understand the limitations of what you are trying to do.