IBM, C-DAC Partner to Promote Hindi


Rubber Duck
20th August 2007, 10:39 PM
The Centre for Development of Advanced Computing (C-DAC) and IBM India Research Laboratory have announced the development of "Shrutlekhan-Rajbhasha", a speaker-independent, continuous speech recognition system for Hindi that uses IBM Desktop Hindi Speech Recognition technology.

"Shrutlekhan-Rajbhasha" is part of C-DAC's ongoing MANTRA-Rajbhasha project for ensuring a high level of accuracy in the translation of English to Hindi, in areas including administration, finance, agriculture, and small scale industry.
The MANTRA-Rajbhasha project is sponsored by the Department of Official Language, Ministry of Home Affairs, Government of India.

IBM's Desktop Hindi Speech Recognition technology understands and transcribes human speech with little use of keyboards, thereby helping people unfamiliar with computers or the Hindi language. Also, because various keyboard layouts are in use for Indian languages, speech recognition eliminates the need to learn the key mappings of different keyboards.

IBM says the technology has been tried and tested for variations across a large number of speakers from different regions of the country. Meanwhile, C-DAC says it has collected a large amount of data from users across different states to capture variations in dialect. The system's Hindi spellchecker was built with all this data taken into account, to make it more accurate.

"Shrutlekhan-Rajbhasha" has also been integrated with other user-friendly features like the ability to convert Unicode text into ISFOC fonts, etc.

Dr Daniel Dias, director of IBM India Research Laboratory, said, "Shrutlekhan-Rajbhasha has been developed by C-DAC for promoting the use of Hindi for official and other purposes, and is an example of how IBM speech recognition technology might be used to make people's lives easier."

Shri S Ramakrishnan, director general of C-DAC, said, "The coming together of IBM and C-DAC is a step forward towards enabling speech technologies to benefit a sizeable portion of the Indian population."

http://www.techtree.com/India/News/Speech_Recognition_Uses_IBM_Technology/551-82806-580.html


IBM’s India Research Laboratory (IRL) has developed speech recognition software for Hindi, one of the key languages in India.

The software has both commercial applications and social applications such as bridging the digital divide, Daniel Dias, director of the lab, said in a telephone interview on Thursday.

The Indian government and other local agencies have been promoting the use of local languages in computing, but the development of an input device for Indian languages has proven to be quite difficult. The Devnagri script used in Hindi has over 40 basic characters, and some 12 modifiers to the characters that are represented above or below the basic characters.
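To make the "basic characters plus modifiers" point concrete, here is a minimal Python sketch of how Unicode encodes Devanagari text. It uses only the standard unicodedata module. The helper name split_marks is made up for this illustration, and the sketch shows Unicode's encoding model only, not how IBM's software or any particular keyboard handles input. In Unicode the modifiers (vowel signs and similar marks) are separate code points that follow the basic character they attach to.

```python
import unicodedata

def split_marks(word):
    """Split a string into basic characters and combining marks (modifiers).

    Hypothetical helper for illustration only.
    """
    bases, marks = [], []
    for ch in word:
        # Unicode general categories beginning with "M" are combining marks
        if unicodedata.category(ch).startswith("M"):
            marks.append(ch)
        else:
            bases.append(ch)
    return bases, marks

# The word "Hindi" written in Devanagari, given as escapes to avoid
# encoding problems when copying the snippet around.
word = "\u0939\u093f\u0902\u0926\u0940"

bases, marks = split_marks(word)

# Print each code point with its official Unicode name
for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

print("basic characters:", len(bases), "modifiers:", len(marks))
```

A five-code-point word here contains only two basic characters, which gives some idea of why a keyboard needs key combinations to enter a single written syllable.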

There are a number of keyboards available for the Devnagri script, but to input one character of the script, the user has to punch a combination of keys, said Ashish Verma, a senior researcher at IRL, and the lead on this project.

Hewlett-Packard’s lab in India has designed a touch-pad, called the "gesture keyboard", that uses a combination of tapping and gestures to handle Hindi. The touch-pad has the basic characters and numbers of the Devnagri script on it. A character with the required modifier is entered by making a specific gesture while tapping the basic character with a pen-based input device.

IBM, by contrast, has opted for a technology based on speech recognition, both because it is simpler and because it can be used by the large number of Indians who are semi-literate or not familiar with a computer keyboard. The dictionary developed by IRL for the speech recognition system has over 75,000 Hindi words, with a provision to add new words, Verma said.

One of the challenges in developing a speech recognition system for Hindi was that words in the language are often pronounced quite differently in various parts of the country. "We had to come up with multiple pronunciations for a given word in Hindi, and include them in the dictionary, and get them recognized (by the system) in the testing phase," Verma said.
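A rough sketch of what "multiple pronunciations for a given word" could look like as a data structure, in Python. Everything here is an assumption for illustration: the Lexicon class, its method names, and the phoneme symbols are invented, and the sample entry is not taken from IBM's dictionary. The idea is simply that each word maps to a list of phoneme sequences, and new words or variants can be added later.

```python
from collections import defaultdict

class Lexicon:
    """Toy pronunciation dictionary: one word, many pronunciations.

    Hypothetical structure, not IBM's actual format.
    """

    def __init__(self):
        self._prons = defaultdict(list)  # word -> list of phoneme tuples
        self._index = defaultdict(set)   # phoneme tuple -> set of words

    def add(self, word, phonemes):
        """Register a pronunciation; duplicates are ignored."""
        pron = tuple(phonemes)
        if pron not in self._prons[word]:
            self._prons[word].append(pron)
            self._index[pron].add(word)

    def pronunciations(self, word):
        """All known pronunciations for a word (empty list if unknown)."""
        return list(self._prons.get(word, []))

    def words_for(self, phonemes):
        """All words that can be pronounced as the given phoneme sequence."""
        return set(self._index.get(tuple(phonemes), set()))

lex = Lexicon()
word = "\u0935\u0939"  # a Hindi pronoun, used here only as a sample entry

# Two illustrative variants with made-up phoneme symbols
lex.add(word, ["v", "a", "h"])
lex.add(word, ["v", "o"])

print(lex.pronunciations(word))
print(lex.words_for(["v", "o"]) == {word})
```

With this layout, whichever variant a speaker uses resolves to the same written word, which is the effect Verma describes.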

The core technology, developed by IRL, can be used in PC applications such as data entry, letter-writing, and sending emails, as well as to speech-enable ATMs (automated teller machines), kiosks and other devices, Dias said. The software can also be used for issuing commands to the computer, and for IVR (interactive voice response) applications in telephony, he added. Because the software supports Unicode, it can be integrated with a number of word processing and email applications, including those from Microsoft, Verma said.

The software achieves word recognition rates in the range of 90-95 percent with speaker adaptation and 80-90 percent without it. The accuracy is close to 100 percent for "command and control" applications, such as operating ATMs and kiosks, since the vocabulary in these applications is limited, IBM said.
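The article does not say how IBM measured its word recognition rate. One common convention in speech recognition is word accuracy, defined as 1 minus the word error rate, where errors are the substitutions, deletions and insertions needed to turn the recognized text into the reference text. The Python sketch below assumes that convention; the function names and the sample sentences are made up for illustration.

```python
def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference words seen so far
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference word missing (deletion)
                            curr[j - 1] + 1,      # extra hypothesis word (insertion)
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)

def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (word errors / reference length)."""
    errors, n = word_errors(reference, hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    return 1.0 - errors / n

# Placeholder ten-word utterance with one word misrecognized
reference = "a b c d e f g h i j"
hypothesis = "a b c d e f g h i x"
print(f"{word_accuracy(reference, hypothesis):.0%}")
```

Under this definition, one wrong word in ten gives 90 percent. Note that the figure can drop below zero if the recognizer inserts many extra words, which is one reason published rates should state the metric used.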

The technology is being integrated by the Centre for Development of Advanced Computing (C-DAC), an Indian government-run research organization, into its programs for facilitating computing in Indian languages. IBM may also make this technology available to other markets, including for commercial applications. The company has not yet decided on the business model, Dias said.

http://www.infoworld.com/article/07/08/16/IBM-speech-recognition-in-Indian-language_1.html