
Introduction

The performance of speech recognizers trained on clean speech degrades when they are used in noisy environments, owing to the mismatch between training and testing conditions. Parallel Model Combination (PMC) [6] has been an effective method for coping with this robustness issue. PMC estimates the model for a noisy acoustic environment by combining a clean-speech HMM with a noise HMM, and thus reduces the mismatch between training and testing conditions. However, accurate estimation of the model parameters requires numerical integration, which is computationally very expensive. Data-driven PMC [5], which generates samples of corrupted speech vectors by Monte Carlo simulation, is sufficiently accurate relative to numerical integration, but is still slow. Other approximations, such as log-normal, log-add and log-max, are computationally efficient but less accurate [4]. The PMC log-normal approximation, which is the most commonly used, assumes that the sum of two log-normally distributed random variables is itself log-normally distributed [6].
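As a rough illustration of the trade-off described above, the following sketch compares the log-normal approximation with a data-driven (Monte Carlo) estimate for a single log-spectral component. It assumes the usual additive model in which the corrupted value is log(exp(x) + exp(n)) for independent Gaussian speech x and noise n; the function names `lognormal_pmc` and `data_driven_pmc`, and the scalar setting, are illustrative assumptions and not the formulation of [5] or [6].

```python
import math
import random


def lognormal_pmc(mu_x, var_x, mu_n, var_n):
    """Log-normal approximation for one log-spectral component (sketch)."""
    def to_linear(mu, var):
        # Mean and variance of exp(v) when v ~ N(mu, var).
        mean = math.exp(mu + var / 2.0)
        return mean, mean * mean * (math.exp(var) - 1.0)

    m_x, v_x = to_linear(mu_x, var_x)
    m_n, v_n = to_linear(mu_n, var_n)
    # Independent speech and noise add in the linear spectral domain.
    m_y, v_y = m_x + m_n, v_x + v_n
    # Assume the sum is log-normal again and map back to the log domain.
    var_y = math.log(v_y / (m_y * m_y) + 1.0)
    mu_y = math.log(m_y) - var_y / 2.0
    return mu_y, var_y


def data_driven_pmc(mu_x, var_x, mu_n, var_n, num_samples=100000, seed=0):
    """Monte Carlo estimate of the corrupted-speech mean and variance (sketch)."""
    rng = random.Random(seed)
    sd_x, sd_n = math.sqrt(var_x), math.sqrt(var_n)
    samples = [
        math.log(math.exp(rng.gauss(mu_x, sd_x)) + math.exp(rng.gauss(mu_n, sd_n)))
        for _ in range(num_samples)
    ]
    mean = sum(samples) / num_samples
    var = sum((s - mean) ** 2 for s in samples) / num_samples
    return mean, var
```

The closed-form version costs a handful of exponentials per component, whereas the sampled version needs many draws per component, which is the speed gap referred to in the text.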

Vector Taylor Series (VTS) [1,2] is another approach to combining the models; it approximates the non-linear relationship between speech and noise with a truncated vector Taylor series. However, other polynomials, optimized to approximate the parameters of the distribution, can give better results than the Taylor series [2].
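A minimal sketch of a first-order truncation may make the idea concrete. It assumes a scalar log-spectral model y = x + log(1 + exp(n - x)) expanded around the speech and noise means; the function name `vts_first_order` and the scalar setting are assumptions for illustration, not the exact formulation of [1,2].

```python
import math


def vts_first_order(mu_x, var_x, mu_n, var_n):
    """First-order Taylor approximation of the noisy-speech mean and variance."""
    # Partial derivative dy/dx at the expansion point; dy/dn equals 1 - a.
    a = 1.0 / (1.0 + math.exp(mu_n - mu_x))
    # Zeroth-order term: the non-linear function evaluated at the means.
    mu_y = mu_x + math.log1p(math.exp(mu_n - mu_x))
    # The linearized model propagates the variances of independent x and n.
    var_y = a * a * var_x + (1.0 - a) ** 2 * var_n
    return mu_y, var_y
```

Because the expansion is linear around the means, the estimate is accurate only when the variances are small relative to the curvature of the function.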

In this paper, we approximate the non-linear function governing the relationship between speech and noise by a Lagrange polynomial, and then estimate the model parameters for noisy speech. The accuracy of the approximation is compared with other methods, and the performance of speech recognizers with Lagrange polynomial approximation is also evaluated.
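To indicate what such an approximation might look like, the sketch below interpolates an assumed form of the non-linear function, g(z) = log(1 + exp(z)) with z = n - x, by a Lagrange polynomial. The choice of function, the five equally spaced nodes on [-4, 4], and the names `mismatch` and `lagrange_interpolant` are assumptions made for illustration only; they are not the configuration used in this paper.

```python
import math


def mismatch(z):
    """Assumed non-linear term g(z) = log(1 + exp(z)), with z = n - x."""
    return math.log1p(math.exp(z))


def lagrange_interpolant(nodes, values):
    """Return the Lagrange polynomial passing through (nodes[i], values[i])."""
    def poly(z):
        total = 0.0
        for i, (z_i, f_i) in enumerate(zip(nodes, values)):
            # Basis polynomial: 1 at node i and 0 at every other node.
            basis = 1.0
            for j, z_j in enumerate(nodes):
                if j != i:
                    basis *= (z - z_j) / (z_i - z_j)
            total += f_i * basis
        return total
    return poly


# Hypothetical nodes; a real system would pick them to cover the
# range of n - x actually observed.
nodes = [-4.0, -2.0, 0.0, 2.0, 4.0]
approx = lagrange_interpolant(nodes, [mismatch(z) for z in nodes])

# Largest absolute deviation from g on a fine grid over the node range.
grid = [-4.0 + 8.0 * k / 200 for k in range(201)]
max_error = max(abs(approx(z) - mismatch(z)) for z in grid)
```

Unlike a Taylor expansion, which is exact only at the expansion point, the interpolant matches the function at every node, so its error is spread over the whole interval.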


April 23, 2004