The MIL Speech Seminar series schedule for the Long Vacation 2008 was as follows:
26th June 2008 | Thomas Schaaf (Multimodal Technologies, Inc) | A Comparison of VTLN and Gender-Dependent Models |
After an introduction to Multimodal Technologies, Inc, Pittsburgh,
PA (M*Modal), I will describe the current challenges in dictation-based
health care documentation. This will be followed by an overview of
M*Modal's contribution in this space: a unique blend of speech
recognition and natural language processing technologies for turning
conversational dictations of clinical encounters into structured and
encoded clinical documents. A centralized, hosted architecture
based on a web services infrastructure allows us to collect vast
amounts of audio and proof-read textual data, enabling us to make
use of highly speaker-specific models. Rapid adaptation to new
speakers with minimal or no impact on physicians’ workflow is an
important aspect which affects the acceptability of the solution.
One difference between speakers is the variation in the length of
the vocal tract. It is well established that this can be partially
compensated for with gender-dependent or vocal-tract-normalized
acoustic models. I will present several ways of building gender-
dependent models, either by splitting the database by gender or by
using a gender question in the context cluster tree. This is then
compared with Vocal Tract Length Normalized (VTLN) acoustic models
using data from a radiology reporting domain. Although gender-
dependent models resulted in considerable gains, they did not
outperform VTLN. From a business point of view, scalability is an
important issue, and in addition to better performance, practical
constraints also favor VTLN. For example, it is possible to
estimate the VTLN warp factor with a simple Gaussian Mixture Model during
front-end processing, which allows single-pass decoding while still
adapting quickly if an unexpected speaker change occurs. I will
end the presentation with a selection of research topics that arise
from running an automatic transcription service.
|