ISCA Distinguished Lecturer Talks
Deep-Learning for Speech and Language Processing
Deep learning has drastically improved the performance of automated systems in a wide range of areas, including speech and language processing. One of the interesting challenges for speech and language processing is handling the variable-length nature of the data. The length of a waveform, even for the same word sequence, can vary considerably, as can the length of the word sequence used to describe a positive movie review. To address this problem, a range of approaches has been developed to enable sequence-to-sequence modelling using deep learning. These approaches are based on extending traditional frameworks, such as hidden Markov models, as well as more recent developments such as attention mechanisms. This talk will review the underlying theory for these approaches and how they are applied to speech and language processing tasks. In particular, examples from speech recognition and synthesis will be described.
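The attention mechanisms mentioned above handle variable-length input by computing, at each output step, a weighted average over all input positions. A minimal sketch of single-query dot-product attention is given below; the function name and the random toy data are illustrative, not taken from the talk.

```python
import numpy as np

def dot_product_attention(query, keys, values):
    """Single-query dot-product attention: a decoder state (query) is
    scored against every encoder state (keys), and the scores are
    softmax-normalised to weight the encoder outputs (values).
    Works for any number of input frames T."""
    scores = keys @ query                    # one score per input position
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    context = weights @ values               # weighted average of values
    return context, weights

# Toy example: 5 encoder frames of dimension 4, one decoder query.
rng = np.random.default_rng(0)
keys = values = rng.normal(size=(5, 4))
query = rng.normal(size=4)
context, weights = dot_product_attention(query, keys, values)
```

Because the softmax runs over however many input frames there are, the same mechanism maps a long waveform or a short one to a fixed-size context vector, which is what makes it suitable for sequence-to-sequence tasks.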
Low Resource Speech Processing
It is estimated that there are up to 7,000 different languages spoken around the world. Of these, 90% are used by fewer than 100,000 people. It is not economically feasible to develop speech technology for all languages. This has led to an interest in developing approaches that allow the same speech processing system to be applied to many, preferably all, languages. This talk will discuss current approaches to low resource speech processing, with particular reference to work carried out on the BABEL and MATERIAL projects. These projects collected data from a wide range of languages, over 25 in total, with a broad distribution of language types. The application areas addressed by these projects are keyword spotting (KWS) and cross-language information retrieval (CLIR). When considering speech data, both of these tasks start with automatic speech recognition (ASR). Approaches for building ASR systems with limited transcribed data for the target language, as well as approaches for augmenting this data using material from the web, will be described. As these ASR systems will often have high error rates, schemes for minimising the impact of errors on system performance will then be described.
Deep Learning in Non-native Spoken English Learning and Assessment
Over 1.5 billion people worldwide are using and learning English as an additional language. This has created a high and growing demand for certification of learners' proficiency, for example for entry to university or for jobs. Automatic assessment systems can help meet this need by reducing human assessment effort. They can also enable learners to monitor their progress with informal assessment whenever and wherever they choose. Traditionally, automatic speech assessment systems were based on read speech, so what the candidate said was (mostly) known. To properly assess a candidate's spoken communication ability, however, the candidate needs to be assessed on free, spontaneous speech. The text is, of course, unknown in such speech, and we do not speak in fluent sentences: we hesitate, stop, and restart. Added to this, any automatic system has to handle a wide variety of accents and pronunciations across learners' first languages, as well as highly variable audio recording quality. Together this makes non-native spoken English assessment a challenging problem. To help meet the challenge, deep learning has been applied to a number of sub-tasks. This talk will look at some examples of how deep learning is helping to create automatic systems capable of assessing free, spontaneous spoken English.
Mark Gales - Biography