IDN Forums - Internationalized Domain Names

domainstosell 14th July 2006 10:54 PM

Verisign Encoding Issue
 
In a couple of other threads: http://www.idnforums.com/forums/4994...i-domains.html

and

http://www.idnforums.com/forums/2657...g-mystery.html

we were discussing the fact that you can end up with an incorrect registration, or "lookalike" domain, in languages where characters can be entered in a different order to produce the same character.

Dynadot contacted Verisign on my behalf to find out about this. The response is below for anyone who is interested. Their response isn't exactly what I had hoped to hear. The main problem now will be as blastfromthepast said in his post:

Quote:

Originally Posted by blastfromthepast
1. Google is not combining results for differently-ordered text in Indic scripts - this applies to any script where you can enter characters in a different order to produce the same character. This should be resolved by Google in the future. Maybe we should let them know.



QUESTION TO VERISIGN
>=====================================
>
>We are trying to convert these four unicode characters to an IDN:
>
>099C 09C1 09DF 09BE
>
>We get:
>
>xn--w5b2bybim
>
>But when we convert xn--w5b2bybim back into unicode we get:
>
>099C 09C1 09AF 09BC 09BE
>
>Why do four characters convert to five characters? The language is Bengali or Hindi. Please see:
>http://www-950.ibm.com/software/glob...u09BE&x=45&y=6

>
>
>RESPONSE FROM VERISIGN
>=====================================
>
>As part of the IDNA conversion process, Nameprep defines the rules for how to handle particular characters. In this case the Bengali letter "yya" is being changed during the nameprep process from 09DF to 09AF 09BC. Effectively, nameprep takes a single character and breaks it into two components. Visually this does not matter, because on decoding they are rendered properly in the browser; it is just the way that the encoding process is done.
>
>
>QUESTION TO VERISIGN
>=====================================
>
>Thank you for your response. It makes sense.
>
>I checked the domain xn--w5b2bybim, but it displayed 5 characters instead of 4.
>http://www.dynadot.com/domain/search...=xn--w5b2bybim

>
>
>RESPONSE FROM VERISIGN (also see attachment)
>=====================================
>
>Thank you for contacting VeriSign Customer Service.
>
>We checked the website and attached a screenshot for your review to ensure that we're seeing the same thing. Unfortunately we do not speak Bangla, and as such we cannot tell visually whether the domain has been changed.
>
>With all of that said, the IDNA standard specifies that this character must be broken up in this manner, and it cannot be changed. A good example of this happens in German, where the "sharp S" (ß) is normalized, or nameprep'ed, to "ss".
>
>The specific component of nameprep that is used is something called "stringprep". If you want the specific details you can review http://www.ietf.org/rfc/rfc3454.txt, where you'll notice that your character is listed.
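
For anyone who wants to reproduce the conversion described in this exchange, here is a minimal sketch. It assumes Python 3 and its built-in "idna" codec, which implements the original IDNA 2003 rules (Nameprep followed by Punycode); it is not the tool that Dynadot or Verisign used, and the helper name codepoints is made up for illustration.

```python
import unicodedata

# The four code points from the question to Verisign.
label = "\u099c\u09c1\u09df\u09be"


def codepoints(text):
    """Return space-separated hex code points, written the way the thread writes them."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# U+09DF decomposes to U+09AF U+09BC and is excluded from recomposition,
# so the normalization step inside Nameprep leaves it as two code points.
normalized = unicodedata.normalize("NFKC", label)

# The "idna" codec applies Nameprep and then Punycode to each label.
ace = label.encode("idna")
decoded = ace.decode("idna")

# The German example from the reply: Nameprep maps sharp S to "ss".
sharp_s = "stra\u00dfe".encode("idna")

print(codepoints(label))       # 099C 09C1 09DF 09BE
print(codepoints(normalized))  # 099C 09C1 09AF 09BC 09BE
print(ace)                     # b'xn--w5b2bybim'
print(codepoints(decoded))     # 099C 09C1 09AF 09BC 09BE
print(sharp_s)                 # b'strasse'
```

The round trip returns five code points because the decoder only ever sees the already-normalized form; nothing in the xn-- string records that the input was originally typed as 09DF.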

blastfromthepast 15th July 2006 01:40 AM

Re: Verisign Encoding Issue
 
1. The Devanagari encoding mystery has been solved: some registrars were not doing nameprep correctly. New registrations are now going through correctly. If you purchased the domain early in the game (like Laura Snow did with Sari), you may have gotten an incorrect domain. New type-ins should resolve to the correct domain if the browser also implements nameprep the right way, which it should. Google searching is another matter. They need to combine results for what we can term identical Indic words with a different character order of entry. They will in the future, just like they combine results for cafe and café if you search without + or "".

2. I'm not sure about your Bengali issue. It may be that, as in example 1, nameprep is not being applied correctly in this case. It may also be an error in nameprep itself; in that case, it is not Verisign's problem. Verisign deals with the .com and .net TLDs. Punycode is a technology standard that was approved for the whole internet. It is necessary to narrow down the example and use the best punycode converters to investigate. If you still find an error that can be duplicated, share it, and let one of the Indian IT specialists on this forum bring it to ICANN's attention, if this is indeed an encoding problem.
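
One way to narrow an example down, as suggested above, is to check whether two ways of typing a word end up as the same label after nameprep. The sketch below assumes Python 3's standard encodings.idna module (IDNA 2003 rules); same_idn_label and to_ace are hypothetical helper names, and the Devanagari sequences are illustrative inputs rather than real registrations.

```python
import encodings.idna


def same_idn_label(a, b):
    """True if two labels become identical after Nameprep (IDNA 2003)."""
    return encodings.idna.nameprep(a) == encodings.idna.nameprep(b)


def to_ace(label):
    """Convert one label to its ASCII (xn--) form with the standard codec."""
    return label.encode("idna").decode("ascii")


# Precomposed letter vs. base letter plus nukta: nameprep unifies these.
print(same_idn_label("\u0958", "\u0915\u093c"))              # True

# Precomposed e-acute vs. "e" plus combining accent: also unified.
print(same_idn_label("caf\u00e9", "cafe\u0301"))             # True

# Nukta and virama typed in either order: unified, because normalization
# sorts combining marks that have different combining classes.
print(same_idn_label("\u0915\u094d\u093c", "\u0915\u093c\u094d"))  # True

# Nukta typed before or after a vowel sign: NOT unified, because the
# vowel sign has combining class 0 and is never reordered.
a, b = "\u0915\u093c\u093f", "\u0915\u093f\u093c"
print(same_idn_label(a, b))                                  # False
print(to_ace(a) != to_ace(b))                                # True
```

When the check returns False, the two spellings are different domains as far as the registry is concerned, whether or not a browser draws them the same way.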

Giant 15th July 2006 02:49 AM

Re: Verisign Encoding Issue
 
The punycode scheme does not encode all characters yet, and maybe it never will, because they continue adding characters to the Unicode table. This is not a "problem"; that's the way we design formulas. More and more characters will be covered, but send ICANN or the punycode team what you discovered to save them time. The best solution is NOT to register such characters; the chance you will lose them is high.
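
To see which layer actually restricts a character, the two steps can be tested separately. This is only a sketch under stated assumptions: it uses Python 3's "punycode" codec and stringprep module, and it relies on the fact that RFC 3454 fixes its tables to Unicode 3.2. The sample character U+09CE (Bengali khanda ta) was chosen for illustration because it was added to Unicode after version 3.2.

```python
import stringprep

newer = "\u09ce"   # added to Unicode after version 3.2
older = "\u09df"   # the "yya" from the first post, present in Unicode 3.2

# Raw Punycode (RFC 3492) is a generic encoding of code points,
# so it round-trips the newer character without complaint.
raw = newer.encode("punycode")
round_trip_ok = raw.decode("punycode") == newer

# Stringprep table A.1 lists code points that are unassigned in Unicode 3.2.
# A strict registration check would reject anything found in this table.
newer_unassigned = stringprep.in_table_a1(newer)
older_unassigned = stringprep.in_table_a1(older)

print(round_trip_ok)      # True
print(newer_unassigned)   # True
print(older_unassigned)   # False
```

On this reading, characters added to Unicode later are held back by the fixed tables in the preparation step rather than by the punycode encoding itself.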


All times are GMT.