IDN Forums - Internationalized Domain Names  
#1
14th July 2006, 10:54 PM
domainstosell (Senior Member)
Verisign Encoding Issue

On a couple of other posts: http://www.idnforums.com/forums/4994...i-domains.html

and

http://www.idnforums.com/forums/2657...g-mystery.html

we were discussing the fact that you can end up with an incorrect registration, or "lookalike" domain, in scripts where the same visible character can be produced by entering code points in a different order.

Dynadot contacted Verisign on my behalf to find out about this. The response is below for anyone who is interested. It isn't exactly what I had hoped to hear. The main remaining problem is the one blastfromthepast described in his post:

Quote:
Originally Posted by blastfromthepast
1. Google is not combining results for differently-ordered text in Indic scripts - this applies to any script where you can enter characters in a different order to produce the same character. This should be resolved by google in the future. Maybe we should let them know.


QUESTION TO VERISIGN
>=====================================
>
>We are trying to convert these four unicode characters to an IDN:
>
>099C 09C1 09DF 09BE
>
>We get:
>
>xn--w5b2bybim
>
>But when we convert xn--w5b2bybim back into unicode we get:
>
>099C 09C1 09AF 09BC 09BE
>
>Why do four characters convert to five characters? The language is Bengali or Hindi. Please see:
>http://www-950.ibm.com/software/glob...u09BE&x=45&y=6

>
>
>RESPONSE FROM VERISIGN
>=====================================
>
>As part of the IDNA conversion process, Nameprep defines the rules for how to handle particular characters. In this case the Bengali letter "yya" is changed during the Nameprep process from 09DF to 09AF 09BC. In effect, Nameprep takes the single character and breaks it into two components. Visually this does not matter, because on decode they are rendered properly in the browser; it is just the way the encoding process works.
>
>
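The decomposition Verisign describes can be reproduced with a short Python sketch. It assumes that NFKC normalization (one of the steps Nameprep applies) is enough to show the effect, so it is a stand-in for the full Nameprep profile rather than a complete implementation; the helper name code_points is made up for illustration.

```python
import unicodedata

# The four code points from the question above.
original = "\u099c\u09c1\u09df\u09be"

# Nameprep applies NFKC normalization, among other steps.
# NFKC alone is used here as a stand-in for the full profile (assumption).
normalized = unicodedata.normalize("NFKC", original)


def code_points(text):
    """Return the code points of text as uppercase 4-digit hex strings."""
    return [f"{ord(ch):04X}" for ch in text]


# Punycode-encode the normalized label and add the ACE prefix.
ace_label = "xn--" + normalized.encode("punycode").decode("ascii")

# Decode the label quoted in the thread back to Unicode.
decoded = b"w5b2bybim".decode("punycode")

print(code_points(original))    # four code points, including 09DF
print(code_points(normalized))  # five code points: 09DF became 09AF 09BC
print(ace_label)                # xn--w5b2bybim
print(code_points(decoded))     # same five code points as normalized
```

The point of the sketch is that the extra character appears during normalization, before Punycode is involved; decoding the ACE label simply returns the already decomposed form.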
>QUESTION TO VERISIGN
>=====================================
>
>Thank you for your response. It makes sense.
>
>I checked the domain xn--w5b2bybim, but it displayed 5 characters instead of 4.
>http://www.dynadot.com/domain/search...=xn--w5b2bybim

>
>
>RESPONSE FROM VERISIGN (also see attachment)
>=====================================
>
>Thank you for contacting VeriSign Customer Service.
>
>We checked the website and have attached a screenshot for your review to ensure that we're seeing the same thing. Unfortunately we do not speak Bangla, so we cannot tell visually whether the domain has been changed.
>
>With all of that said, the IDNA standard specifies that this character must be broken up in this manner, and that cannot be changed. A good example of this happens in German, where the "sharp S" (ß) is normalized, or nameprep'ed, to "ss".
>
>The specific component of nameprep that is used is called "stringprep". If you want the specific details you can review: http://www.ietf.org/rfc/rfc3454.txt (you'll notice that your character is listed there).
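The sharp S comparison can be tried with Python's built-in "idna" codec, which follows the 2003 IDNA rules and runs Nameprep on each label before Punycode-encoding it. This is only a sketch of the behaviour described in the response; the label "strasse" is an arbitrary example, and registries or newer IDNA revisions may treat these characters differently.

```python
# Python's "idna" codec implements the 2003 IDNA rules (Nameprep + Punycode).
sharp_s_ascii = "stra\u00dfe".encode("idna")   # "straße"
plain_ascii = "strasse".encode("idna")

# The Bengali label from this thread, typed with the single letter yya
# (09DF) and with the two-code-point form (09AF 09BC).
composed_ascii = "\u099c\u09c1\u09df\u09be".encode("idna")
decomposed_ascii = "\u099c\u09c1\u09af\u09bc\u09be".encode("idna")

print(sharp_s_ascii, plain_ascii)        # both spellings give the same label
print(composed_ascii, decomposed_ascii)  # both spellings give the same label
```

Under these rules both ways of typing the name end up as the same registered label, which is why the decoded form shows five code points even though four were entered.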
__________________
GasStations.org
#2
15th July 2006, 01:40 AM
blastfromthepast (Veteran)
Re: Verisign Encoding Issue

1. The Devanagari encoding mystery has been solved: some registrars were not doing nameprep correctly. New registrations are now going through correctly. If you purchased the domain early in the game (like Laura Snow did with Sari), you may have gotten an incorrect domain. New type-ins should resolve to the correct domain if the browser also implements nameprep the right way, which it should. Google searching is another matter. They need to combine results for what we can term identical Indic words with a different character order of entry. They will in the future, just like they combine results for cafe and café if you search without + or quotation marks.

2. I'm not sure about your Bengali issue. It may be that nameprep is not being used correctly in this case, as in example 1. It may also be an error in nameprep itself; in that case, it is not Verisign's problem. Verisign deals with the .com and .net TLDs, whereas Punycode is a technology standard that was approved for the whole internet. The next step is to narrow the example down and test it with the best punycode converters. If you still find an error that can be reproduced, share it, and let one of the Indian IT specialists on this forum bring it to ICANN's attention, if this is indeed an encoding problem.
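As a rough illustration of how differently ordered input could be matched in search, the sketch below normalizes both strings to NFC before comparing them. It only covers sequences that Unicode treats as canonically equivalent (here, a nukta and a virama typed in either order after the Devanagari letter KA); other order differences would need extra handling, and nothing here is known to reflect how Google actually works. The function name search_key is hypothetical.

```python
import unicodedata


def search_key(text):
    """Return an NFC-normalized key so canonically equivalent strings match."""
    return unicodedata.normalize("NFC", text)


# Devanagari KA followed by nukta (U+093C) and virama (U+094D),
# entered in two different orders.
nukta_first = "\u0915\u093c\u094d"
virama_first = "\u0915\u094d\u093c"

print(nukta_first == virama_first)                          # False
print(search_key(nukta_first) == search_key(virama_first))  # True
```

Comparing the normalized keys instead of the raw strings is the same idea Nameprep applies to domain labels, just moved to the search side.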

Last edited by blastfromthepast; 15th July 2006 at 01:47 AM..
#3
15th July 2006, 02:49 AM
Giant (Senior Member)
Re: Verisign Encoding Issue

The punycode scheme still does not encode all characters, and maybe it never will, because they continue adding characters to the Unicode table. This is not a "problem"; that's the way we design formulas. More and more characters will be covered, but send ICANN or the punycode team what you discovered to save them time. The best solution is NOT to register such characters; the chance that you will lose them is high.
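One way to probe where the coverage limit sits is the sketch below. As specified, Punycode is a generic encoding for Unicode code points, while the 2003 Nameprep rules are pinned to Unicode 3.2 and treat later additions as unassigned. The example uses Python's stringprep module (table A.1 of RFC 3454) and picks the Indian rupee sign, added in Unicode 6.0, purely as an illustration; the helper names are made up.

```python
import stringprep


def is_unassigned_in_unicode_3_2(ch):
    """True if ch is an unassigned code point per RFC 3454 table A.1."""
    return stringprep.in_table_a1(ch)


def punycode_round_trip(text):
    """Encode text with Punycode and decode it again."""
    return text.encode("punycode").decode("punycode")


yya = "\u09df"    # BENGALI LETTER YYA, already assigned in Unicode 3.2
rupee = "\u20b9"  # INDIAN RUPEE SIGN, added later (Unicode 6.0)

for ch in (yya, rupee):
    print(
        f"U+{ord(ch):04X}",
        "unassigned in 3.2:", is_unassigned_in_unicode_3_2(ch),
        "punycode round trip ok:", punycode_round_trip(ch) == ch,
    )
```

Both characters survive a Punycode round trip; only the Nameprep table distinguishes them, which suggests that newly added characters are a question of the Nameprep version rather than of Punycode itself.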
__________________
Dot Com is King. IDN.com will soon be king.
All times are GMT.
