Acoustic Models

Front End Decoding


How does an ATK model set differ from a HTK model set?

ATK and HTK models are compatible.  However, ATK and HTK each have certain types of macro which are specific to themselves.  In particular, ATK is able to perform on-the-fly triphone synthesis, so you can add new words to the dictionary without worrying whether the new word requires triphones which are not in the current hmmlist.  It does this by attaching the tree file dumped by HHEd as a macro.  Early versions of ATK used ~b for this macro, but from version 1.5 onwards this was changed to ~q to avoid a clash with HTK's use of ~b for an adaptation matrix.

What is a Background HMM (BGHMM) used for?

A background HMM is used only to improve the precision of confidence scores.  It does not affect recognition accuracy and it can be omitted if confidence scoring is not critical to the application.  It is a standard HMM with one or more emitting states.  When used in recognition, only the state output distributions are used.  At each frame, the recogniser computes the probability of the current frame given each BGHMM state and uses the most likely state to provide the background normalisation score (see section 6.3.2 on confidence scoring in the ATK Manual).

How do I build a Background HMM (BGHMM)?

All that is needed for a background HMM is to create a single fully connected HMM and then train it as though it were an isolated word, with each training utterance representing an instance of that word.  Thus, if you had training data files s1.wav, s2.wav, ... , sN.wav listed in an scp file bghmm.scp, you would do the following:

a) create a fully connected HMM called "bghmm" in a macro file called "BGHMM" with N states.  Each output probability should have a mean of 0 and a variance of 1.  Also, create a global macro file "BGFLOOR" containing the parameter definitions and floor macro used by your main model set.

b) create an MLF file called bghmm.mlf that assigns the single label bghmm to every training file.  HERest needs this to know that every training file is an instance of bghmm.

c) train bghmm first using HInit and HRest, for example:

HInit -C config -S bghmm.scp -i 4 -u mv -M bghmm1 -H BGFLOOR -H BGHMM bghmm

d) then use HERest and HHEd to upmix the HMM to the required number of mixture components.  For example, the following invocations of HHEd and HERest take the model in directory bghmm3, upmix it according to the HHEd script in mixup.hed, store the upmixed model in bghmm4, and then reestimate that model, storing the reestimated result in bghmm5:

HHEd -C config -H BGFLOOR -H bghmm3/BGHMM -M bghmm4 mixup.hed
HERest -C config -S bghmm.scp -u mv -I bghmm.mlf -M bghmm5 -H BGFLOOR -H bghmm4/BGHMM

The number of states required and the number of mixtures per state should be determined by experiment, but a good starting point is a model with around 8 states and at least twice as many mixture components per state as are used in the acoustic models.
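The two small support files referred to in steps b) and d) can be sketched as follows.  This is an illustration under stated assumptions, not the files distributed with ATK.  The MLF uses the standard HTK master label file layout with a "*" pattern, so that every training file receives the single label bghmm.  The mixup.hed script assumes an HMM with 8 emitting states (HTK state indices 2 to 9) being raised to 16 mixture components per state; both numbers are placeholders to be replaced by values found by experiment.

```shell
# Sketch only: the file contents below are assumptions, not copied from ATK.

# Step b) master label file: the "*" pattern is intended to match any
# label file name, so every utterance is labelled as one instance of bghmm.
cat > bghmm.mlf <<'EOF'
#!MLF!#
"*"
bghmm
.
EOF

# Step d) HHEd edit script: MU raises the number of mixture components.
# Assumes 8 emitting states (indices 2-9) and a target of 16 components.
cat > mixup.hed <<'EOF'
MU 16 {bghmm.state[2-9].mix}
EOF
```

If the model has a different number of emitting states, the state range in the MU command needs to change to match.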

How do I generate a Cepstral Mean Normalisation Vector?

Use HCompV during acoustic model training to compute the global mean of the data.

When is the Cepstral Mean Normalisation vector reset?

By default the CMN mean vector is reset every time that HCode's parmbuf is reset.  This means that in AVite, for example, the CMN mean is reset before every new file is processed.  This behaviour can be modified by setting the configuration variable HPARM:CMNRESETONSTOP to false.  The CMN mean vector can also be reset explicitly by sending the message "cmnreset" to the appropriate ACode component.
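As an illustration, the configuration setting could be added along the following lines.  The file name atk.cfg is hypothetical; use whichever configuration file your application loads.  The line follows the usual HTK "MODULE: NAME = VALUE" layout, with F as the boolean false value.

```shell
# Sketch: append the override to a (hypothetical) configuration file.
cat >> atk.cfg <<'EOF'
# Do not reset the CMN mean each time the parameter buffer is reset
HPARM: CMNRESETONSTOP = F
EOF
```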

How are feature vectors constructed?

ATK uses a modified version of HParm to code waveform packets received from ASource into observation packets.  These observation packets simply hold a HTK Observation record.  On creation, an ACode object creates a HParm input channel (by calling CreateExtSrc) and then creates a HTK ParmBuf record which holds a reference to this input channel and all of the HTK feature settings extracted from the config file (stored in IOConfig cf).  The input channel itself consists of the callback routines provided by ACode (especially xGetData).

When ACode is started, it simply collects incoming wave packets and keeps calling HParm's ReadBuffer routine.  This in turn pulls waveform samples into HParm via the callback routine xGetData and returns filled observations.

How does ARec check that incoming observations are compatible with the loaded HMMs?

The simple answer is that since the coder and the recogniser run in separate threads, ARec cannot check this.  However, ATK makes it easy for applications to do the check before they fire up the coder and recogniser threads.  An example of the code to do it can be found in AVite.

// Ask the coder for a dummy observation and check it against the HMM set
AObsData *od = acode->GetSpecimen();
if (!hset->CheckCompatible(&(od->data))){
   HRError(3221,"AVite: HMM set is not compatible with Coder");
   throw ATK_Error(3221);
}

This check should be made after creating the various components (coder, recogniser and HMM set) but before starting them running.  The GetSpecimen method provides a dummy observation packet and the CheckCompatible method checks that the various internal observation features are compatible with the HMM set.