Noise environment

Both DPMC (see dpmcCompensate) and the approximation of the cross-entropy require samples of the corrupted speech.

class noise.corrupted_speech_sampler.CorruptedSpeechSampler(speech, noise, phaseFactor, uniformUnitSampler)

Sampler that produces corrupted speech samples by drawing from the distributions of speech, noise, and phase factor and combining these samples. The speech sampler may return a tuple of a state and a speech vector. In this case a tuple of a state and a corrupted speech vector will be returned.

Parameters:
  • speech – the speech distribution.
  • noise – the noise distribution.
  • phaseFactor – the phase factor distribution.
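
To illustrate the idea of drawing from the three distributions and combining the draws, here is a minimal NumPy sketch. It does not use the toolkit's classes. The function name `sample_corrupted_speech`, the diagonal Gaussian distributions, and the clipping of the phase factor are assumptions made for illustration only; the samples are combined with the mismatch function given under "Mismatch function".

```python
import numpy as np

def sample_corrupted_speech(rng, speech_mean, speech_std,
                            noise_mean, noise_std, phase_std, count):
    """Draw `count` corrupted log-spectral vectors (hypothetical helper)."""
    dim = np.asarray(speech_mean).shape[0]
    # Independent draws of speech x and noise n (assumed diagonal Gaussians).
    x = rng.normal(speech_mean, speech_std, size=(count, dim))
    n = rng.normal(noise_mean, noise_std, size=(count, dim))
    # Crude stand-in for a constrained phase factor sampler: clip to [-1, +1].
    alpha = np.clip(rng.normal(0.0, phase_std, size=(count, dim)), -1.0, 1.0)
    # Mismatch function: exp(y) = exp(x) + exp(n) + 2 alpha exp(x/2 + n/2).
    return np.log(np.exp(x) + np.exp(n) + 2.0 * alpha * np.exp(0.5 * (x + n)))

rng = np.random.default_rng(0)
y = sample_corrupted_speech(rng, np.zeros(3), 1.0, np.full(3, -1.0), 0.5, 0.2, 10)
```

Each row of `y` is one corrupted speech sample. A sampler that also carries a state would return the state alongside each row.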

Mismatch function

The mismatch function for noise-robust speech recognition is a function relating the speech x, noise n, phase factor \alpha, and the noise-corrupted speech y. Additionally, to transform the integral in the likelihood expression, this toolkit introduces a substitute variable u. These are related by

\begin{aligned}
\exp (y) &= \exp (x) + \exp (n) + 2 \alpha \exp \Big( \frac12 x + \frac12 n \Big); \\
u &= n - x.
\end{aligned}

Since these are deterministically related, each variable can be found by setting three (or two) of the other variables. The following functions convert between variable settings. Not all combinations of variable settings are supported; requesting an unsupported combination triggers an assert (False).

noise.noise_environment.speechFrom(noise=None, observation=None, substitute=None, phaseFactor=None)
Returns: the speech vector that follows from setting three of the other variables.

If phaseFactor is not given, it is assumed zero.

noise.noise_environment.noiseFrom(speech=None, observation=None, substitute=None, phaseFactor=None)
Returns: the noise vector that follows from setting three of the other variables.

If phaseFactor is not given, it is assumed zero.

noise.noise_environment.observationFrom(speech=None, noise=None, substitute=None, phaseFactor=None)
Returns: the observation vector that follows from setting three of the other variables.

If phaseFactor is not given, it is assumed zero.
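
As an illustration of the forward direction of the mismatch function, the following sketch computes the observation from speech, noise, and phase factor. It is a standalone NumPy function, not the toolkit's observationFrom; the name `observation_from_speech_noise` is hypothetical. The larger of x and n is factored out before taking the logarithm, which is a numerical-stability choice made here, not something the source specifies.

```python
import numpy as np

def observation_from_speech_noise(speech, noise, phase_factor=0.0):
    """Evaluate y from exp(y) = exp(x) + exp(n) + 2 alpha exp(x/2 + n/2)."""
    x = np.asarray(speech, dtype=float)
    n = np.asarray(noise, dtype=float)
    alpha = np.asarray(phase_factor, dtype=float)
    # Factor out exp(max(x, n)) so every remaining exponent is <= 0.
    m = np.maximum(x, n)
    inner = (np.exp(x - m) + np.exp(n - m)
             + 2.0 * alpha * np.exp(0.5 * (x + n) - m))
    return m + np.log(inner)

# With exp(x) = 4, exp(n) = 1 and alpha = 1: exp(y) = 4 + 1 + 2 * 2 = 9.
y = observation_from_speech_noise(np.log(4.0), 0.0, phase_factor=1.0)
```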

noise.noise_environment.substituteFrom(speech=None, noise=None, observation=None, phaseFactor=None, substituteSign=None)
Returns: the substitute vector that follows from setting two of the other variables. If the phase factor is given, this solves a quadratic equation. The substitute sign may have to be given to constrain the result.

If phaseFactor is not given, it is assumed zero.
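
To show where a quadratic equation can arise, consider recovering u from the speech x and the observation y. Dividing the mismatch function by exp(x) gives exp(y - x) = 1 + exp(u) + 2 \alpha exp(u/2), which is quadratic in t = exp(u/2). The sketch below solves it with NumPy. The function name and the `sign` argument are hypothetical, and how the toolkit's substituteSign selects a root is not specified in this documentation.

```python
import numpy as np

def substitute_from_speech_observation(speech, observation,
                                       phase_factor=0.0, sign=+1):
    """Solve t**2 + 2*alpha*t + (1 - exp(y - x)) = 0 for t = exp(u / 2)."""
    x = np.asarray(speech, dtype=float)
    y = np.asarray(observation, dtype=float)
    alpha = np.asarray(phase_factor, dtype=float)
    ratio = np.exp(y - x)
    discriminant = alpha ** 2 - 1.0 + ratio
    if np.any(discriminant < 0.0):
        raise ValueError("no real solution for these settings")
    # `sign` picks one of the two roots of the quadratic (assumed convention).
    t = -alpha + sign * np.sqrt(discriminant)
    if np.any(t <= 0.0):
        raise ValueError("chosen root is not positive, so u is undefined")
    return 2.0 * np.log(t)

# exp(x) = 4, exp(n) = 1, alpha = 0.5 gives exp(y) = 7 and u = n - x = -log(4).
u = substitute_from_speech_observation(np.log(4.0), np.log(7.0), 0.5)
```

With a zero phase factor the equation reduces to exp(u) = exp(y - x) - 1, so only one root is admissible and the sign plays no role.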

Phase factor

The phase factor aggregates the effect of phase differences between speech and noise in the log-spectral domain (see Deng et al., 2004). Since log-spectral representations of the speech and noise discard phase information, the phase factor is a random variable. It is approximately Gaussian distributed.

class noise.phase_factor_gaussian.PhaseFactorGaussianSampler(phaseFactorGaussian, uniformUnitSampler = UniformUnitSampler())

Gaussian sampler whose results are constrained to [-1, +1].
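
One simple way to constrain Gaussian draws to an interval is rejection sampling, sketched below with NumPy. This is not necessarily how PhaseFactorGaussianSampler works (its use of a uniform unit sampler suggests a different construction); the function name and arguments are hypothetical.

```python
import numpy as np

def sample_constrained_gaussian(rng, mean, std, count, low=-1.0, high=1.0):
    """Draw `count` Gaussian samples, rejecting any outside [low, high]."""
    samples = np.empty(0)
    while samples.size < count:
        draws = rng.normal(mean, std, size=count)
        accepted = draws[(draws >= low) & (draws <= high)]
        samples = np.concatenate([samples, accepted])
    return samples[:count]

rng = np.random.default_rng(0)
alpha = sample_constrained_gaussian(rng, mean=0.0, std=0.4, count=1000)
```

Rejection is efficient only when most of the Gaussian's mass lies inside the interval; for a mean far outside [-1, +1] the loop may take very long.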

noise.phase_factor_gaussian.melFilterVariance(channel)

Return the approximate variance of the phase factor for the channel-th Mel filter.

This approximation is due to Leutnant et al. (2009).

speech.mel_filter.melFilter(channel)
Returns: an iterator over (frequency, weight) pairs, each indicating the contribution of the frequency-th spectral coefficient to the channel-th Mel filter.
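
To give a feel for how filter weights can determine a phase factor variance, here is a sketch under strong simplifying assumptions: the speech and noise magnitudes are constant across the bins of one filter, and the per-bin phase differences are independent and uniform on the circle. The phase factor is then a weighted average of cosines, each with variance 1/2, so its variance is \sum_k w_k^2 / (2 (\sum_k w_k)^2). This is not necessarily the expression that melFilterVariance evaluates; the function name and the example weights are hypothetical.

```python
def phase_factor_variance(filter_weights):
    """Variance of sum_k w_k cos(theta_k) / sum_k w_k for uniform theta_k.

    `filter_weights` is an iterable of (frequency, weight) pairs, in the
    same shape as the pairs described for melFilter above.
    """
    weights = [weight for _frequency, weight in filter_weights]
    total = sum(weights)
    sum_of_squares = sum(weight * weight for weight in weights)
    # Each cos(theta_k) has variance 1/2 when theta_k is uniform.
    return sum_of_squares / (2.0 * total * total)

# A made-up triangular filter over three bins.
variance = phase_factor_variance([(10, 0.5), (11, 1.0), (12, 0.5)])
```

Under these assumptions, wider filters average over more bins and therefore yield a smaller variance.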

Cepstrum

This toolkit uses the log-spectral domain, which is related to the more standard cepstral domain by a linear transformation.

speech.dct.dct(size)
Returns: the DCT (DCT-II) matrix.

speech.diagonalise_cepstral.diagonaliseCepstral(g)

Diagonalise, in the cepstral domain, a covariance in the log-spectral domain. This is useful to imitate diagonalisation in the cepstral domain when working in the log-spectral domain.

This converts the covariance to the cepstral domain with a square DCT matrix, diagonalises the covariance, and converts it back with the transpose of the DCT matrix. Note that this does not reduce the dimensionality of the cepstral covariance, because that would produce a non-full rank covariance in the log-spectral domain.

Parameters: g – covariance in the log-spectral domain.
Returns: covariance in the log-spectral domain that is g diagonalised in the cepstral domain.
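
The procedure described above can be sketched with NumPy as follows. The sketch assumes an orthonormal DCT-II matrix, so that its transpose is its inverse; the scaling used by the toolkit's dct is not stated here. The function names are hypothetical stand-ins, not the toolkit's implementations.

```python
import numpy as np

def dct_matrix(size):
    """Square orthonormal DCT-II matrix (assumed scaling)."""
    k = np.arange(size)[:, None]   # cepstral index (rows)
    n = np.arange(size)[None, :]   # log-spectral index (columns)
    matrix = np.sqrt(2.0 / size) * np.cos(np.pi * (n + 0.5) * k / size)
    matrix[0, :] /= np.sqrt(2.0)   # rescale the first row for orthonormality
    return matrix

def diagonalise_cepstral(g):
    """Diagonalise a log-spectral covariance in the cepstral domain."""
    g = np.asarray(g, dtype=float)
    c = dct_matrix(g.shape[0])
    cepstral = c @ g @ c.T                  # to the cepstral domain
    diagonal = np.diag(np.diag(cepstral))   # drop off-diagonal entries
    return c.T @ diagonal @ c               # back to the log-spectral domain

g = np.array([[2.0, 0.5, 0.1],
              [0.5, 1.5, 0.3],
              [0.1, 0.3, 1.0]])
g_diagonalised = diagonalise_cepstral(g)
```

Because the DCT matrix is square, the result keeps the dimensionality of g. It is generally not diagonal in the log-spectral domain.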
