[quote=fd99392]when google maps it for searches it converts it into puny code and sees them only as a punycode, and the puny code for the first one is different from the second one, so different results[/quote]
Google doesn't deal with Punycode. It searches Unicode text in UTF-8 encoding. Since there appear to be two ways to enter this text, Google could, and should, combine the results.
When Unicode is converted to Punycode to create a domain name, it is supposed to be normalized first so that such problems don't occur.
[quote=a2zofb2b]Looks like it could be a major problem.
Here are 2 variations and the unicode sequence for the same.
xn--e2b9bngm.com (साड़ी.com): = स ा ड़ ी
and
xn--12bmg5i.com (साड़ी.com): = स ा ड ़ ी[/quote]
That is it. Thanks for the explanation.
I tried putting both into IBM's Punycode converter, and I get identical results (xn--e2b9bngm). If registrars implement the Punycode conversion mechanism correctly, this shouldn't be a problem. It looks like some registrars had it wrong and weren't running the Unicode through the nameprep routine, so some people are now stuck with lookalike domains because of errors in their registrars' Punycode conversion.
[url]http://www-950.ibm.com/software/globalization/icu/demo/domain?t=test&x=22&y=17[/url]
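For anyone who wants to check this without the IBM page, here is a minimal Python sketch of the idea. It assumes the step the registrars skipped is the NFKC normalization that nameprep performs; it does not do the rest of nameprep (case mapping, prohibited characters), and to_ace is a helper name made up for illustration. The two strings are the code point sequences a2zofb2b listed.

```python
import unicodedata

# The two spellings from the quoted post, written as explicit code points.
PRECOMPOSED = "\u0938\u093e\u095c\u0940"       # SA, AA, DDDHA (U+095C), II
DECOMPOSED = "\u0938\u093e\u0921\u093c\u0940"  # SA, AA, DDA (U+0921) + NUKTA (U+093C), II


def to_ace(label: str, normalize: bool = True) -> str:
    """Return the xn-- form of a single label (hypothetical helper).

    Only the NFKC step of nameprep is applied; this is an assumption made
    for the sketch, not a full IDNA implementation.
    """
    if normalize:
        label = unicodedata.normalize("NFKC", label)
    # Python's built-in "punycode" codec yields the part after "xn--".
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    for name, label in (("precomposed", PRECOMPOSED), ("decomposed", DECOMPOSED)):
        print(name, "raw:", to_ace(label, normalize=False), "normalized:", to_ace(label))
```

With normalization, both spellings should come out as the same xn-- label, because U+095C decomposes to U+0921 U+093C and is excluded from recomposition. Without it you get two different labels, which is the lookalike situation described above.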