14th July 2006, 10:54 PM
domainstosell
Verisign Encoding Issue

On a couple of other posts: http://www.idnforums.com/forums/4994...i-domains.html

and

http://www.idnforums.com/forums/2657...g-mystery.html

we were discussing the fact that you can end up with an incorrect registration, or a "lookalike" domain, in languages where characters can be entered in a different order to produce the same character.

Dynadot contacted Verisign on my behalf to find out about this. The response is below for anyone who is interested. Their response isn't exactly what I had hoped to hear. The main problem now will be as blastfromthepast said in his post:

Quote:
Originally Posted by blastfromthepast
1. Google is not combining results for differently-ordered text in Indic scripts - this applies to any script where you can enter characters in a different order to produce the same character. This should be resolved by Google in the future. Maybe we should let them know.


QUESTION TO VERISIGN
>=====================================
>
>We are trying to convert these four unicode characters to an IDN:
>
>099C 09C1 09DF 09BE
>
>We get:
>
>xn--w5b2bybim
>
>But when we convert xn--w5b2bybim back into unicode we get:
>
>099C 09C1 09AF 09BC 09BE
>
>Why do four characters convert to five characters? The language is Bengali or Hindi. Please see:
>http://www-950.ibm.com/software/glob...u09BE&x=45&y=6

>
>
>RESPONSE FROM VERISIGN
>=====================================
>
>As part of the IDNA conversion process, Nameprep defines the rules for how to handle particular characters. In this case the Bengali letter "yya" is being changed during the nameprep process from 9df to 9af 9bc. Effectively, nameprep takes a single character and breaks it into two components. Visually this does not matter, because on decode they are rendered properly in the browser; it is just the way the encoding process works.
>
>
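The conversion described above can be reproduced with a minimal Python sketch. It assumes that Python's standard `unicodedata` NFKC normalization and built-in `punycode` codec are a fair stand-in for the Nameprep and Punycode steps; Nameprep itself is pinned to Unicode 3.2, which should not matter for these particular code points. The variable and helper names are illustrative only.

```python
import unicodedata

# The four code points from the question; U+09DF (BENGALI LETTER YYA) is third.
original = "\u099c\u09c1\u09df\u09be"

# Nameprep includes NFKC normalization. U+09DF has a canonical decomposition
# (U+09AF U+09BC) and is excluded from recomposition, so it stays split.
normalized = unicodedata.normalize("NFKC", original)


def code_points(text):
    """Format a string as space-separated hex code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


print(code_points(original))    # 099C 09C1 09DF 09BE
print(code_points(normalized))  # 099C 09C1 09AF 09BC 09BE

# Punycode-encode the normalized label and add the ACE prefix.
ace_label = "xn--" + normalized.encode("punycode").decode("ascii")
print(ace_label)                # xn--w5b2bybim

# Decoding yields the normalized five-code-point form, not the original four.
decoded = ace_label[len("xn--"):].encode("ascii").decode("punycode")
print(code_points(decoded))     # 099C 09C1 09AF 09BC 09BE
```

The round trip is lossy in one direction only: the four-code-point and five-code-point spellings both map to the same encoded label, so they cannot be registered as separate domains.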
>QUESTION TO VERISIGN
>=====================================
>
>Thank you for your response. It makes sense.
>
>I checked the domain xn--w5b2bybim, but it displayed 5 characters instead of 4.
>http://www.dynadot.com/domain/search...=xn--w5b2bybim

>
>
>RESPONSE FROM VERISIGN (also see attachment)
>=====================================
>
>Thank you for contacting VeriSign Customer Service.
>
>We went and checked on the website and we attached the screenshot for your review to ensure that we're seeing the same thing. Unfortunately we do not speak Bangla and as such visually we cannot tell if the domain has been changed.
>
>With all of that said, the IDNA standard specifies that this character must be broken up in this manner, and that cannot be changed. A good example of this happens in the German language, where the "sharp S" (ß) is normalized, or nameprep'ed, to "ss".
>
>The specific component of nameprep that is used is something called "string prep", and if you want the specific details you can review: http://www.ietf.org/rfc/rfc3454.txt. You'll notice that your character is listed there.
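For anyone who wants to check the sharp S comparison, here is a short sketch using Python's standard `stringprep` module, which exposes the RFC 3454 tables, and the built-in `idna` codec, which implements the original (2003) IDNA rules. Treat it as an illustration under those assumptions, not a description of what Verisign runs.

```python
import stringprep

# Table B.2 of RFC 3454 is the case-folding map that Nameprep uses.
# It maps the German sharp S (U+00DF) to "ss".
print(stringprep.map_table_b2("\u00df"))    # ss

# The built-in "idna" codec applies Nameprep before encoding a label.
print("stra\u00dfe".encode("idna"))         # b'strasse'

# For comparison, table B.2 leaves the Bengali letter yya (U+09DF) alone;
# in this sketch its split into U+09AF U+09BC comes from the NFKC
# normalization step of Nameprep, not from the mapping table.
print(stringprep.map_table_b2("\u09df") == "\u09df")    # True
```

So the two cases have the same outcome (one character in, two characters out), but they reach it through different Nameprep steps.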