Good Test for Viability of ASCII in Latin Languages


Rubber Duck
24th February 2008, 03:36 PM
Simple. Just look to see how the dictionary is divided up.

If your dictionary is divided into 26 letters, then the accents are probably not that essential.

If you get hold of a dictionary and the words are catalogued not only under the 26 ASCII letters but also under what we might think of as accented letters, then there is a real problem. In Czech, this categorisation ignores the accents that mark long vowels, but many of the others are counted as entirely separate letters. One letter is written CH. We might treat that as two letters, but to Czechs it is still only one, and it comes between H and I in the dictionary. OK, that alone would not justify IDN, but it does clearly indicate that they do not recognise the concept of a 26-letter alphabet.

So how many non-26-letter alphabets can we identify that would justify IDN?

French is out. As far as I know, they categorise the same as English.

Czech is definitely not a 26-letter alphabet. Any more?
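
The dictionary test can be sketched in code. This is a minimal Python sketch, assuming a simplified Czech ordering: CH is one letter filed between H and I; č, ř, š and ž are separate letters; and the long-vowel accents are ignored. Folding ě, ď, ť, ň and ů onto their base letters is a further simplifying assumption. The names `CZECH_ORDER` and `czech_key` and the sample words are illustrative, not from any library.

```python
import unicodedata

# Simplified Czech dictionary order (an assumption for illustration):
# "ch" is a single letter between "h" and "i".
CZECH_ORDER = [
    "a", "b", "c", "č", "d", "e", "f", "g", "h", "ch", "i", "j", "k",
    "l", "m", "n", "o", "p", "q", "r", "ř", "s", "š", "t", "u", "v",
    "w", "x", "y", "z", "ž",
]
RANK = {letter: i for i, letter in enumerate(CZECH_ORDER)}

# Accents that do not create a separate dictionary section are folded away.
FOLD = str.maketrans("áďéěíňóťúůý", "adeeinotuuy")


def czech_key(word):
    """Return a sort key that treats 'ch' as one letter."""
    w = unicodedata.normalize("NFC", word).lower().translate(FOLD)
    key = []
    i = 0
    while i < len(w):
        if w.startswith("ch", i):
            key.append(RANK["ch"])
            i += 2
        else:
            # Unknown characters sort after the whole alphabet.
            key.append(RANK.get(w[i], len(CZECH_ORDER)))
            i += 1
    return key


words = ["ihned", "chata", "hudba", "čaj", "chléb", "cena"]
czech_sorted = sorted(words, key=czech_key)
ascii_style_sorted = sorted(words)  # plain code-point order, for contrast
print(czech_sorted)
print(ascii_style_sorted)
```

With the plain code-point sort, the CH words land under C and "čaj" ends up after everything else, which is exactly the mismatch the dictionary test is meant to reveal.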

mdw
24th February 2008, 03:48 PM
The test might be applicable to Vietnamese too; although not actually a Latin language, its alphabet is derived from French. I wish I had a Vietnamese dictionary handy...

jacksonm
24th February 2008, 04:04 PM
In Finnish, there are 28 letters: A-Z, Ä, and Ö. You never see Ä and Ö spelled as "ae" or "oe" like you sometimes do with German. Both Ä and Ö have their own keys on the keyboard. Using O instead of Ö (or A instead of Ä) is a serious mistake, as another word can actually exist with the other spelling.

In German, there are 30 letters: A-Z, Ä, Ö, Ü, and (ß). You infrequently see Ä, Ö, or Ü typed as "ae", "oe", or "ue" - mainly when somebody doesn't have the right keyboard available. You will never see Ä/Ö/Ü typed as A/O/U because in many cases the diacritics actually distinguish between the singular and plural versions of the word. Example: "Stadt" (city), "Städte" (cities). ß just means "ss", and is not used in all cases of "ss" - it's even quite acceptable not to use it at all, although it is preferred for some words.

Both Finnish and German justify IDN, as does any language which uses umlauts.
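
For anyone wondering what IDN changes in practice, here is a small Python sketch. It assumes Python's built-in "idna" codec (which follows the older IDNA 2003 rules) and uses a made-up name, `städte.example`, under the reserved `.example` TLD. It shows the ASCII-only form that would actually go into DNS, next to the "ae" workaround described above.

```python
# Sketch only: "städte.example" is a hypothetical name for illustration.
name = "städte.example"

# ASCII-compatible encoding used on the wire; each non-ASCII label
# becomes a Punycode label starting with "xn--".
wire_form = name.encode("idna")

# Decoding the wire form gives the original Unicode name back.
round_trip = wire_form.decode("idna")

# The keyboard workaround: spell the umlaut as "ae" and register that.
ascii_fallback = name.replace("ä", "ae")

print(wire_form)
print(round_trip)
print(ascii_fallback)
```

The wire form and the "ae" fallback are two different domain names, so without IDN the owner of one has no claim on the spelling people actually write.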

Absalon
24th February 2008, 04:08 PM
Simple. Just look to see how the dictionary is divided up.


So how many Non 26 Letter Alphabets can we identify that would justify IDN?


Czech is definitely not a 26 Letter Alphabet. Any more?



The Swedish/Finnish alphabet has 29 letters. So does the Danish/Norwegian one (a few letters differ from the Swedish).

The Icelandic has 32! :o

jacksonm
24th February 2008, 04:17 PM
The Swedish/Finnish alphabet has 29 letters.

Å is not a Finnish letter, Nils! :-)

Rubber Duck
24th February 2008, 04:19 PM
Are both dictionaries classified in this way? Do words starting with umlauts actually get listed in separate parts of the dictionary? If so, these languages really cannot function without IDN. It would be like us doing away with C and replacing it with K and S according to the sound. Actually, it would be even worse because they don't even sound the same!

jacksonm
24th February 2008, 04:23 PM
Yes, words starting with umlauts are listed after Z in the dictionary for Finnish. A word like kärsi would come after a word like kukko.

Actually, not true for German. I just looked in my German dictionary and a word which starts with ü actually comes after a word which starts with ud. Alphabetically speaking, the ü is considered as "ue".
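
The two behaviours can be put side by side in a rough Python sketch. It assumes a simplified Finnish alphabet with å, ä and ö filed after z, and a German ordering that expands ü to "ue" as described above. The function names and the extra sample words ("kylä", "Ufer", "Uhr") are my own illustrations, and real dictionaries have more rules than this.

```python
# Simplified Finnish order (assumption): a-z, then å, ä, ö at the end.
FINNISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
FI_RANK = {ch: i for i, ch in enumerate(FINNISH_ALPHABET)}


def finnish_key(word):
    """Sort key where ä and ö are letters in their own right, after z."""
    return [FI_RANK.get(ch, len(FI_RANK)) for ch in word.lower()]


# German ordering as described above: umlauts filed as base letter + "e".
GERMAN_EXPAND = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def german_ue_key(word):
    """Sort key where ü is filed as if it were spelled 'ue'."""
    return word.lower().translate(GERMAN_EXPAND)


finnish_sorted = sorted(["kärsi", "kukko", "kylä"], key=finnish_key)
german_sorted = sorted(["Uhr", "über", "Ufer"], key=german_ue_key)
print(finnish_sorted)
print(german_sorted)
```

In the Finnish ordering "kärsi" lands after "kukko", while the German key files "über" among the other U words rather than in a section of its own.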

Jay
24th February 2008, 04:47 PM
Test might be applicable to Vietnamese too, although not actually latin it is derived from French. I wish I had a Vietnamese dictionary handy...

It follows the French (and therefore English) lettering.



I like the way you are thinking, RD, and I think this is generally a good indicator. The only comment I would make is that languages that follow the English 26-letter alphabet differ in how dependent they are on accents. In Vietnamese, for example, over 95% of words carry accents, in contrast to French and Spanish, which use far fewer. In fact, Vietnamese has a considerably wider range of accents than French or Spanish. Most Vietnamese words, once spelt in ASCII, could stand for several words with totally different meanings, depending on which accents were used and where.

The real test here is: could a native speaker understand what is said if the accents aren't used? In the case of French and Spanish, I would think it wouldn't be too hard, while in the case of Vietnamese, it is very difficult. It might even be argued that the Vietnamese are more dependent on accented vowels than some of the other languages (such as Swedish) are on their extra letters. I'm not sure, but it is possible.
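
The share of accented words mentioned above is easy to estimate for any sample of text. This is a toy Python sketch: `accent_share` is a hypothetical helper, it counts any word containing a non-ASCII character as "accented", and the two sample sentences are my own and far too short to confirm or refute the 95% figure.

```python
def accent_share(text):
    """Return the fraction of whitespace-separated words that are not pure ASCII."""
    words = text.split()
    if not words:
        return 0.0
    accented = sum(1 for w in words if not w.isascii())
    return accented / len(words)


# Illustrative sentences only; a real measurement needs a large corpus.
vietnamese = "Tôi là người Việt Nam"      # "I am Vietnamese"
french = "Je suis étudiant à Paris"        # "I am a student in Paris"

print(accent_share(vietnamese))
print(accent_share(french))
```

Run over a proper word list per language, the same function would give a rough viability score to compare Latin-script languages against each other.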

The other dimension we need to bring in alongside the viability test is the desirability measure, which is no doubt secondary but still needs to be considered. This is how pedantic language groups are about using their own characters even when it might be viable to use ASCII. Are the French more attached to using accents than the Spanish (assuming all else is equal on the viability measure)? This is more of a pride thing, but it could be important in terms of how readily different Latin language groups embrace IDN.

Absalon
24th February 2008, 04:55 PM
Å is not a Finnish letter, Nils! :-)



Well, I know they/you :p (the Finns) have it in their alphabet, but perhaps you just use it for Swedish words and names; that you know better than I. Still, it is a proper letter in Finland too.

jacksonm
24th February 2008, 05:04 PM
Well I know they/you :p (the Finns) have it in their alphabet, but perhaps you just use it for Swedish words and names, that you know better than I. Still it is a proper letter also in Finland.


You're right.

What does Wikipedia say?

The "Swedish O", carried over from the Swedish alphabet and not used in Finnish; retained especially for writing Finland-Swedish proper names.


http://en.wikipedia.org/wiki/Finnish_alphabet


Absalon
24th February 2008, 05:21 PM
Yep.

Rubber Duck
24th February 2008, 05:54 PM
I think we are arguing over something that can be measured in Ångströms. :p

Absalon
24th February 2008, 06:24 PM
I think we are arguing over something that can be measured in Ångströms. :p


I see even ångström.com is taken, is that you RD? :)

Rubber Duck
24th February 2008, 06:27 PM
I see even ångström.com is taken, is that you RD? :)

No such luck here I am afraid. I never made the grade with Swedish! :)

jacksonm
24th February 2008, 06:28 PM
I think we are arguing over something that can be measured in Ångströms. :p


LOL! No argument really, the quote from Wikipedia explains it all.

Å is not used in Finnish words. However, Swedish is the second official language of Finland, although more Finns speak fluent English than Swedish. The Swedish-speaking population (Swedish as the language spoken at home) is about 6% of Finland's. The Swedish-speaking minority is generally concentrated in areas along the south and west coasts (read: they own the better property in Finland and are generally richer). In these places, all street signs, etc. are in both Swedish and Finnish, with the Swedish text large and on top and the Finnish text small and on the bottom. In addition, in Helsinki all signs are in both languages and the text is usually the same size.

Up here where I live, there isn't a trace of Swedish language to be found.

Finally, Å is on the Finnish keyboard. And you can guess that Å is not one of the "shiny" keys on my laptop's keyboard :-)

davnin
24th February 2008, 10:21 PM
Simple. Just look to see how the dictionary is divided up.

If you dictionary is categorised into 26 letters, then the accents are probably not that essential.


The Italian dictionary has 26 letters, like the English one.
The only IDN characters in Italian are à è ì ò ù (and the less used é), and they aren't considered letters of the dictionary but only variants of a, e, i, o, u.
That said, words change their meaning with the accent, so the accents are essential (but only on the last character of the word).

For example:
bacio = kiss
baciò = he/she kissed

eredità = heritage
eredita = he/she inherits

università = university
universita = no meaning

casino = whorehouse (but also big problem)
casinò = casino
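
The pairs above can be detected mechanically. This is a minimal Python sketch, assuming that "spelt in ASCII" means dropping the accent marks; `strip_accents` and `ascii_collisions` are hypothetical helper names, and the word list is just the examples from this post.

```python
import unicodedata
from collections import defaultdict


def strip_accents(word):
    """Decompose accented letters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def ascii_collisions(words):
    """Group words by accent-stripped form; keep groups with more than one word."""
    groups = defaultdict(set)
    for w in words:
        groups[strip_accents(w)].add(w)
    return {k: sorted(v) for k, v in groups.items() if len(v) > 1}


italian = ["bacio", "baciò", "eredità", "eredita",
           "università", "casino", "casinò"]
collisions = ascii_collisions(italian)
for ascii_form, variants in collisions.items():
    print(ascii_form, "<-", variants)
```

"università" does not show up in the result because its ASCII form has no competing word, which matches the "no meaning" case above: only the colliding pairs are a real problem for ASCII-only names.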