Quote:
|
Originally Posted by sevent
Thanks for the info! I did a check of a name and it converted fine to the native looking characters but the script was labeled:
CJKUnifiedIdeographs
Does that seem right? Is there a way to find out from this info what language you are really talking about (e.g. Chinese simplified)?
|
Some Unicode ranges are shared by several languages. The Latin letters serve English, French and romaji alike, and CJKUnifiedIdeographs covers Chinese characters (simplified and traditional) as well as Japanese kanji. If every character of a string falls within such a shared range, the language cannot be determined directly from the code points. You would have to rely on character groups, character positions and so on; the statistical frequency of certain combinations then gives a probability for each candidate language.
It's possible, but it takes considerable study of the relevant languages to pull it off.
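To make the idea concrete, here is a rough Python sketch of that kind of heuristic. It is only an illustration under some assumptions: the block table covers just four Unicode blocks, the two character sets are tiny hand-picked samples rather than real frequency data, and the names `block_of` and `guess_language` are made up for this example. A serious detector would need proper statistics per language.

```python
from collections import Counter

# A few Unicode blocks (not exhaustive; the CJK extension blocks are omitted).
BLOCKS = [
    ("Hiragana", 0x3040, 0x309F),
    ("Katakana", 0x30A0, 0x30FF),
    ("CJKUnifiedIdeographs", 0x4E00, 0x9FFF),
    ("HangulSyllables", 0xAC00, 0xD7AF),
]

# Tiny hand-picked samples for illustration only (an assumption of this
# sketch): the same four words in their simplified and traditional forms.
SIMPLIFIED_ONLY = set("们这说对")
TRADITIONAL_ONLY = set("們這說對")


def block_of(ch):
    """Return the name of the block containing ch, or "Other"."""
    cp = ord(ch)
    for name, lo, hi in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "Other"


def guess_language(text):
    """Guess a language from block membership plus marker characters."""
    counts = Counter(block_of(ch) for ch in text)
    # Kana and Hangul are not shared with other languages, so they decide.
    if counts["Hiragana"] or counts["Katakana"]:
        return "Japanese"
    if counts["HangulSyllables"]:
        return "Korean"
    if counts["CJKUnifiedIdeographs"]:
        # Han characters alone are ambiguous; look for marker characters.
        simp = sum(ch in SIMPLIFIED_ONLY for ch in text)
        trad = sum(ch in TRADITIONAL_ONLY for ch in text)
        if simp > trad:
            return "Chinese (simplified)"
        if trad > simp:
            return "Chinese (traditional)"
        return "undetermined (Han only)"
    return "undetermined"


print(guess_language("我们"))
print(guess_language("山田"))
```

A short name such as the second one contains only shared Han characters and no markers, so the sketch can only report it as undetermined, which is exactly the problem with single names.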