Computerized polyglots to serve Beijing Olympics

September 17th, 2007

Before you ask, a polyglot is someone who can speak many languages. This does not always imply virtue in a person. Robert Maxwell was a polyglot but not a decent chap.

For the Olympic Games, Beijing will be offering a multi-lingual computerized information service which will be accessed by way of mobile phone and will have speech recognition built in. The answer will be computerized speech from a computer polyglot.

To achieve this level of voice recognition is no mean feat, especially within the limitations of a mobile phone circuit.

Yet Pan Jielin, who works on employing speech recognition technologies in Olympics-related services, told Xinhua that Beijing will be the first city in the world to extensively offer the multi-lingual computerized information service.

(Interesting that he gave a paper in 2000 at the Sixth International Conference on Spoken Language, which was held in Beijing. He co-authored ‘Effective Vector Quantization for a Highly Compact Acoustic Model for LVCSR’, which gives you an idea of the level at which this is working.)

Beijing tourism authorities estimate that Beijing will host at least 550,000 foreign and 2.2 million domestic visitors during the Olympic Games.

Pan Jielin, associate director of Thinkit Speech Laboratory at the Chinese Academy of Sciences (CAS) Institute of Acoustics, said, ‘We now only have Chinese and English services, but will expand to other languages including French, German and Spanish.’

The lab has developed the embedded multi-lingual speech recognition engine, which picks up acoustic features of human speech, converts sound signals to bytes, compares what speakers say with various syllables in different languages, and optimizes the matches through algorithmic processing. In a matter of seconds, speakers could get a response from the system.
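To make that description a little more concrete, here is a toy sketch in Python of the general idea: turn features into byte values, then pick the closest stored syllable across languages. It is not the lab's engine. The template table, the three-number "features" and the function names are all made up for illustration, and a real recognizer works on far richer acoustic models.

```python
import math

# Hypothetical syllable templates: (language, syllable) -> feature vector.
# Real systems use far richer acoustic features; these numbers are invented.
TEMPLATES = {
    ("zh", "ni"): [0.9, 0.1, 0.3],
    ("zh", "hao"): [0.2, 0.8, 0.5],
    ("en", "hi"): [0.7, 0.4, 0.1],
    ("en", "low"): [0.1, 0.3, 0.9],
}


def quantize(features, levels=256):
    """Map each feature in [0, 1] to an integer in 0..levels-1 (one byte by default)."""
    return [min(levels - 1, max(0, int(f * levels))) for f in features]


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_match(features):
    """Return the (language, syllable) whose quantized template is closest."""
    q = quantize(features)
    return min(TEMPLATES, key=lambda key: distance(q, quantize(TEMPLATES[key])))


if __name__ == "__main__":
    # A made-up utterance whose features sit close to the "ni" template.
    print(best_match([0.88, 0.12, 0.31]))
```

Squeezing each feature into a single byte is the sort of economy that matters on a mobile phone, which is presumably why compact acoustic models are of such interest here.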

Pan Jielin said, ‘The core technology of speech recognition applies to any language if we get big enough speech databases.’

Zhao Qingwei, the lab’s chief technology officer, said, ‘We are very competitive in processing the Chinese language because we’re able to get excellent Chinese databases, including those of dialects.’ He said they bought native-speaking English databases from American companies.

Pan Jielin said, ‘We’re quite confident of recognizing more than 90% of speeches of certain topics, such as road and traffic information, Olympic competition results, Olympic venues information, and weather information.’

This is not the place to raise doubts about the viability of a process, but speech recognition from many voices is the last great hurdle for computers. Some systems already in wide use have problems with some accents.

For example, a system very widely used in Australia can only recognize the word ‘one’ if it is pronounced ‘wun’, and users have to adapt their speech to suit the machine. It may be easier with a tonal language such as the Cantonese dialect, but with English, which has literally countless styles of pronunciation for any given set of words, it is a monster hurdle in computing. If it works as suggested, it will be one of the biggest legacies of these Olympics.
Source: BeijingReview
