09-02-2006, 07:16 PM
bramiozo
Administrator
 
Join Date: Sep 2005
Location: Haarlem
Posts: 1,338
Re: Finding out the language of an IDN

Quote:
Originally Posted by sevent
Thanks for the info! I checked a name, and it converted fine to native-looking characters, but the script was labeled:

CJKUnifiedIdeographs

Does that seem right? Is there a way to find out from this info which language you are really dealing with (i.e. Simplified Chinese)?
Some Unicode ranges are shared by several languages (Latin letters, kanji, romaji, etc.); the CJK Unified Ideographs block, for example, covers characters used in Chinese, Japanese, and Korean. If every character of a string falls within such a shared range, it is impossible to determine the language directly. One would have to rely on character groups, character positions, and so on; the statistical frequency of particular combinations would then give the probabilities of the different languages.
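A minimal sketch in Python of that idea, assuming a label that is either plain Unicode or an ACE (`xn--`) label: first look for characters that belong to only one language for practical purposes (kana means Japanese, Hangul means Korean), and only when the label consists of shared ideographs fall back to scoring character bigrams against per-language statistics. The script check uses Unicode character names from the standard `unicodedata` module as a rough stand-in for real script properties. The helper names (`guess_language`, `padded_bigrams`, etc.), the language codes, and the tiny training word lists are all hypothetical; a usable detector would need large corpora per language.

```python
import math
import unicodedata
from collections import Counter

# Coarse script buckets, matched against the start of each character's Unicode name.
SCRIPT_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA", "HANGUL", "LATIN")


def decode_idn_label(label: str) -> str:
    """Turn an ACE label such as 'xn--...' into Unicode (Python's built-in IDNA 2003 codec)."""
    return label.encode("ascii").decode("idna")


def script_of(ch: str) -> str:
    """Return the script bucket of one character, or 'OTHER' (digits, hyphens, ...)."""
    name = unicodedata.name(ch, "")
    for prefix in SCRIPT_PREFIXES:
        if name.startswith(prefix):
            return prefix
    return "OTHER"


def padded_bigrams(text: str) -> list[str]:
    # '^' and '$' mark start and end, so a character's position in the label also counts.
    s = "^" + text + "$"
    return [s[i:i + 2] for i in range(len(s) - 1)]


# HYPOTHETICAL toy training data; real profiles would be built from large corpora.
TOY_SAMPLES = {
    "zh-Hans": ["中国", "中国人", "网络", "国家"],
    "zh-Hant": ["中國", "中國人", "網絡", "國家"],
    "ja": ["日本", "東京", "日本人"],
}
PROFILES = {lang: Counter(bg for s in samples for bg in padded_bigrams(s))
            for lang, samples in TOY_SAMPLES.items()}
# Distinct bigrams across all profiles, +1 to reserve probability mass for unseen ones.
VOCAB_SIZE = len(set().union(*PROFILES.values())) + 1


def log_likelihood(text: str, counts: Counter) -> float:
    total = sum(counts.values())
    # Add-one (Laplace) smoothing: an unseen bigram lowers the score instead of zeroing it.
    return sum(math.log((counts[bg] + 1) / (total + VOCAB_SIZE))
               for bg in padded_bigrams(text))


def guess_language(label: str):
    # The ACE prefix is case-insensitive, so normalize case before decoding.
    if label.lower().startswith("xn--"):
        text = decode_idn_label(label.lower())
    else:
        text = label
    scripts = Counter(script_of(ch) for ch in text)
    # Kana and Hangul settle the question directly.
    if scripts["HIRAGANA"] or scripts["KATAKANA"]:
        return "ja"
    if scripts["HANGUL"]:
        return "ko"
    if scripts["CJK UNIFIED IDEOGRAPH"]:
        # Only shared ideographs: pick the language whose bigram statistics fit best.
        return max(PROFILES, key=lambda lang: log_likelihood(text, PROFILES[lang]))
    return None  # e.g. Latin-only labels: the script alone says nothing about the language
```

With these toy profiles, `guess_language("國家")` comes out as `zh-Hant` and `guess_language("中国网络")` as `zh-Hans`, but only because those exact character pairs appear in the made-up word lists; the point is the shape of the method, not the numbers.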

It's possible, but pulling it off requires considerable effort and a good knowledge of the languages involved.