Next: Model of the Environment
Up: raut04ASJ03
Previous: Abstract
The performance of speech recognizers trained on clean speech
degrades in noisy environments because of the mismatch between training
and testing conditions. Parallel Model Combination (PMC)
[6] has been an effective method for coping with this robustness
issue. PMC estimates the model for a noisy acoustic environment by
combining a clean-speech HMM with a noise HMM, and thus reduces the mismatch
between training and testing conditions. However, accurate estimation
of the model parameters involves numerical integration, which is
computationally very expensive. Data-driven PMC [5], which
is based on generating samples of corrupted speech vectors by
Monte-Carlo simulation, is sufficiently accurate relative to numerical
integration, but is still slow. Other approximations, such as log-normal,
log-add and log-max, are computationally efficient but less accurate
[4]. The PMC log-normal approximation, which is the most commonly used,
assumes that the sum of two log-normally distributed random variables is
itself log-normally distributed [6].
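
As an illustration of the trade-off described above, the following minimal sketch contrasts the log-normal assumption with a Monte-Carlo estimate for a single scalar component. It is not the implementation used in [5] or [6]: the function names, the scalar (one-dimensional) setting, the assumption that speech and noise are independent Gaussians in the log domain, and the sample count are all choices made here for illustration.

```python
import math
import random


def lognormal_sum(mu_x, var_x, mu_n, var_n):
    """Log-normal approximation (illustrative): treat exp(x) + exp(n) as
    log-normal and match its mean and variance in the linear domain."""
    # Linear-domain means and variances of the two log-normal variables.
    m_x = math.exp(mu_x + var_x / 2.0)
    m_n = math.exp(mu_n + var_n / 2.0)
    v_x = m_x ** 2 * (math.exp(var_x) - 1.0)
    v_n = m_n ** 2 * (math.exp(var_n) - 1.0)
    # Moments of the sum, assuming independent speech and noise.
    m_y = m_x + m_n
    v_y = v_x + v_n
    # Map the matched moments back to log-domain Gaussian parameters.
    var_y = math.log(v_y / m_y ** 2 + 1.0)
    mu_y = math.log(m_y) - var_y / 2.0
    return mu_y, var_y


def monte_carlo_sum(mu_x, var_x, mu_n, var_n, n_samples=100_000, seed=0):
    """Data-driven estimate (illustrative): sample speech and noise in the
    log domain, add them in the linear domain, and measure the log-domain
    mean and variance of the corrupted samples."""
    rng = random.Random(seed)
    sd_x, sd_n = math.sqrt(var_x), math.sqrt(var_n)
    ys = [
        math.log(math.exp(rng.gauss(mu_x, sd_x)) + math.exp(rng.gauss(mu_n, sd_n)))
        for _ in range(n_samples)
    ]
    mean = sum(ys) / n_samples
    var = sum((y - mean) ** 2 for y in ys) / n_samples
    return mean, var


if __name__ == "__main__":
    print("log-normal :", lognormal_sum(1.0, 0.01, 0.0, 0.01))
    print("Monte Carlo:", monte_carlo_sum(1.0, 0.01, 0.0, 0.01))
```

The closed-form routine needs only a handful of exponentials and logarithms per component, whereas the sampling routine scales with the number of samples drawn, which is the cost difference the text refers to.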

Vector Taylor Series (VTS) [1,2] is yet another approach, which
combines the models by approximating the non-linear relationship between
speech and noise with a
truncated vector Taylor series. However, other polynomials optimized to
approximate the parameters of the distribution can give better results than the
Taylor series [2].

In this paper, we approximate the non-linear function governing the
relationship between speech and noise by a Lagrange polynomial, and then
estimate the model parameters for noisy speech. The accuracy of the
approximation is compared with that of other methods, and the performance of speech
recognizers using the Lagrange polynomial approximation is also evaluated.
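
To make the idea of a polynomial approximation concrete, the sketch below fits a second-order Lagrange polynomial to a scalar non-linearity and compares it with a second-order Taylor expansion. The function log(1 + exp(d)), with d taken as the log-domain difference between noise and speech, is assumed here as a stand-in; this section does not define the non-linear function. The interpolation nodes, the expansion point, and the evaluation interval are likewise arbitrary illustrative choices, not the settings used in the paper.

```python
import math


def mismatch(d):
    """Assumed non-linearity: log(1 + exp(d)), d = noise - speech (log domain)."""
    return math.log1p(math.exp(d))


def lagrange_interpolate(nodes, values, t):
    """Evaluate at t the Lagrange polynomial passing through (nodes, values)."""
    total = 0.0
    for i, (x_i, y_i) in enumerate(zip(nodes, values)):
        basis = 1.0
        for j, x_j in enumerate(nodes):
            if j != i:
                basis *= (t - x_j) / (x_i - x_j)
        total += y_i * basis
    return total


def taylor2(d0, t):
    """Second-order Taylor expansion of mismatch() around d0."""
    s = 1.0 / (1.0 + math.exp(-d0))  # first derivative of log(1 + exp(d))
    return mismatch(d0) + s * (t - d0) + 0.5 * s * (1.0 - s) * (t - d0) ** 2


# Three nodes give a second-order polynomial, the same order as taylor2().
nodes = [-2.0, 0.0, 2.0]
values = [mismatch(d) for d in nodes]

# Maximum absolute error of each approximation on a grid over [-2, 2].
grid = [-2.0 + 0.1 * k for k in range(41)]
err_lagrange = max(
    abs(lagrange_interpolate(nodes, values, t) - mismatch(t)) for t in grid
)
err_taylor = max(abs(taylor2(0.0, t) - mismatch(t)) for t in grid)

print("max error, Lagrange:", err_lagrange)
print("max error, Taylor  :", err_taylor)
```

On this particular interval the interpolating polynomial spreads its error across the range, while the Taylor expansion is most accurate near the expansion point and drifts towards the ends. Other node placements or intervals may behave differently.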
April 23, 2004