  #8 (permalink)  
Old 6th May 2006, 06:31 PM
blastfromthepast
Re: Devanagari Encoding Mystery

Quote:
Originally Posted by fd99392
when google maps it for searches it converts it into puny code and sees them only as a punycode, and the puny code for the first one is different from the second one, so different results
Google doesn't deal with Punycode. It searches Unicode text in UTF-8 encoding. Since there appear to be two ways to enter this text, Google could, and should, combine the results.

When Unicode is converted to Punycode to create a domain name, it is supposed to be normalized first so that such problems don't occur.

Quote:
Originally Posted by a2zofb2b
Looks like it could be a major problem.

Here are 2 variations and the unicode sequence for the same.

xn--e2b9bngm.com (साड़ी.com): = स ा ड़ ी

and

xn--12bmg5i.com (साड़ी.com): = स ा ड ़ ी
That is it. Thanks for the explanation.

I tried putting both into IBM's Punycode converter, and I get identical results (xn--e2b9bngm). If registrars implement the Punycode conversion correctly, this shouldn't be a problem. It looks like some registrars got it wrong and weren't running the Unicode through the nameprep routine, and some people are now stuck with domain lookalikes because of errors in their registrar's Punycode conversion.

http://www-950.ibm.com/software/glob...test&x=22&y=17
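
For anyone who wants to check this without the IBM page, here is a rough Python sketch of the same idea. It is not the converter's code, and it is not full nameprep: I am assuming plain NFKC normalization is enough for this particular word, and I skip nameprep's case mapping and prohibited-character checks. The helper names `to_ace_label` and `to_ace_label_unnormalized` are made up for the example. The two spellings are written as explicit code points: one uses the single letter U+095C, the other uses U+0921 followed by the nukta U+093C.

```python
import unicodedata

# The two spellings, written as explicit code points so nothing gets
# silently normalized by an editor or the forum software.
PRECOMPOSED = "\u0938\u093e\u095c\u0940"        # uses the single letter U+095C
DECOMPOSED = "\u0938\u093e\u0921\u093c\u0940"   # uses U+0921 followed by nukta U+093C


def to_ace_label(label: str) -> str:
    """Hypothetical helper: NFKC-normalize, then Punycode-encode.

    Assumption: NFKC stands in for nameprep here. Real nameprep also does
    case mapping and prohibited-character checks, which are skipped.
    """
    normalized = unicodedata.normalize("NFKC", label)
    return "xn--" + normalized.encode("punycode").decode("ascii")


def to_ace_label_unnormalized(label: str) -> str:
    """Hypothetical helper: what skipping the normalization step would give."""
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    for name, text in (("precomposed", PRECOMPOSED), ("decomposed", DECOMPOSED)):
        print(name)
        print("  without normalization:", to_ace_label_unnormalized(text))
        print("  with normalization:   ", to_ace_label(text))
```

With normalization, both spellings come out as xn--e2b9bngm, because U+095C always decomposes to U+0921 U+093C under NFKC. Without it, the U+095C spelling encodes to xn--12bmg5i instead, which is how the lookalike registrations can arise.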

Last edited by blastfromthepast; 6th May 2006 at 07:08 PM.. Reason: Automerged Doublepost