Parallel Model Combination (PMC) [4,5] has been an effective and extensively studied method for coping with the robustness issue. Many variations of PMC exist that attempt to estimate a noise-adapted model from a clean speech HMM and a noise HMM. However, accurate estimation of the model parameters by PMC involves numerical integration, which is computationally very expensive. Data-driven PMC, which generates samples of corrupted speech vectors by Monte Carlo simulation, comes close to numerical integration in accuracy, but is still slow. Other approximations, such as log-normal, log-add and log-max, are computationally efficient but less accurate. Further, the PMC log-normal approximation, which is the most commonly used, assumes that the sum of two log-normally distributed random variables is itself log-normally distributed, which holds only approximately.
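The trade-off between these PMC variants can be illustrated with a minimal sketch for a single log-spectral dimension. It assumes the unit-gain mismatch function y = log(exp(x) + exp(n)), with independent Gaussian speech x and noise n in the log-spectral domain. The function names, the sample count and the seed are illustrative choices, not taken from the cited work.

```python
import math
import random


def lognormal_pmc(mu_x, var_x, mu_n, var_n):
    """Log-normal approximation: match the first two moments in the linear domain."""
    # Map each log-domain Gaussian to linear-domain mean and variance.
    m_x = math.exp(mu_x + var_x / 2.0)
    v_x = m_x ** 2 * (math.exp(var_x) - 1.0)
    m_n = math.exp(mu_n + var_n / 2.0)
    v_n = m_n ** 2 * (math.exp(var_n) - 1.0)
    # Speech and noise are assumed independent and additive in the linear domain.
    m_y = m_x + m_n
    v_y = v_x + v_n
    # Map back, assuming the sum is again log-normally distributed.
    var_y = math.log(v_y / m_y ** 2 + 1.0)
    mu_y = math.log(m_y) - var_y / 2.0
    return mu_y, var_y


def log_add_pmc(mu_x, mu_n):
    """Log-add approximation: combine the means only; variances are left unchanged."""
    return math.log(math.exp(mu_x) + math.exp(mu_n))


def data_driven_pmc(mu_x, var_x, mu_n, var_n, num_samples=20000, seed=0):
    """Data-driven estimate: sample corrupted log-spectra and take sample moments."""
    rng = random.Random(seed)
    sd_x, sd_n = math.sqrt(var_x), math.sqrt(var_n)
    samples = [
        math.log(math.exp(rng.gauss(mu_x, sd_x)) + math.exp(rng.gauss(mu_n, sd_n)))
        for _ in range(num_samples)
    ]
    mean = sum(samples) / num_samples
    var = sum((s - mean) ** 2 for s in samples) / num_samples
    return mean, var
```

Calling the three functions with the same speech and noise statistics gives a rough feel for how far the closed-form approximations drift from the sampled estimate as the SNR decreases, and for how much more work the sampling requires.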
The Jacobian approach to model adaptation, proposed by Sagayama et al., compensates the model using Jacobian matrices applied to the difference between the assumed and observed noise cepstra. Although efficient and effective, the method requires some training data, and assumes that the cepstral difference and the variance of the mixtures stay within the linearity range.
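The linearity assumption can be made concrete with a small sketch. For simplicity it works on one log-spectral component rather than on cepstral vectors (in the cepstral domain the scalar derivative becomes a matrix), and it assumes the unit-gain relationship y = log(exp(x) + exp(n)). All names are hypothetical.

```python
import math


def noisy_log_spectrum(x, n):
    """Exact noisy log-spectrum under the assumed unit-gain mismatch function."""
    return math.log(math.exp(x) + math.exp(n))


def noise_jacobian(x, n):
    """Derivative dy/dn = exp(n) / (exp(x) + exp(n)), written in a stable form."""
    return 1.0 / (1.0 + math.exp(x - n))


def jacobian_adapt(x, n_ref, n_new):
    """First-order update of y when the noise moves from n_ref to n_new."""
    y_ref = noisy_log_spectrum(x, n_ref)
    return y_ref + noise_jacobian(x, n_ref) * (n_new - n_ref)
```

For a small noise change the linear update tracks `noisy_log_spectrum` closely, while for a large change the error grows noticeably, which is the limitation referred to as the linearity range.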
The use of neural networks (NNs) for combining a clean speech HMM and a noise HMM has also been investigated. The neural networks learn the non-linearity involved in the combination, and thus produce a noise-adapted HMM, taking the clean speech HMM, the noise HMM and the SNR as inputs. The networks must first be trained on a set of input and output HMMs. Each output HMM for training is obtained by a combination of MLLR, MAP and VFS adaptation techniques for a particular combination of inputs, viz. clean speech HMM, noise HMM and SNR. The method has been found to be effective; however, it involves building a large number of sample noisy output HMMs and training the NNs, which is slow, computationally inefficient and tedious.
Vector Taylor Series (VTS) [10,11] is yet another approach, which combines the models by approximating the non-linear relationship between speech and noise with a truncated vector Taylor series. However, other polynomials optimized to approximate the parameters of the distribution can give better results than the Taylor series.
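As a rough illustration of the truncated-series idea, the sketch below applies a first-order expansion around the speech and noise means for a single log-spectral dimension. It assumes independent Gaussian speech and noise and the unit-gain relationship y = log(exp(x) + exp(n)); the function name and the scalar setting are simplifications made here, not the formulation of [10,11].

```python
import math


def vts_first_order(mu_x, var_x, mu_n, var_n):
    """First-order Taylor estimate of the noisy-speech mean and variance."""
    # Partial derivative dy/dx at the expansion point; dy/dn equals 1 - a.
    a = 1.0 / (1.0 + math.exp(mu_n - mu_x))
    # Zeroth-order term: the mismatch function evaluated at the means.
    mu_y = mu_x + math.log1p(math.exp(mu_n - mu_x))
    # A linearised function of independent Gaussians has this variance.
    var_y = a * a * var_x + (1.0 - a) ** 2 * var_n
    return mu_y, var_y
```

Because the expansion is truncated after the linear term, the estimate is most reliable when the variances are small relative to the curvature of the mismatch function.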
In this paper, we approximate the non-linear function governing the relationship between speech and noise by a Lagrange polynomial, and then estimate the model parameters for noisy speech. The accuracy of the approximation is compared with other methods, and the performance of the Lagrange Polynomial Approximation method is also evaluated on a speaker-dependent isolated word recognition task.
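To make the idea concrete, the following is a generic sketch of Lagrange interpolation applied to the speech-noise non-linearity, not the specific formulation developed in this paper. It assumes the relationship y = x + g(n - x) with g(z) = log(1 + exp(z)) in the log-spectral domain. The interpolation nodes, the polynomial degree and all function names are illustrative assumptions.

```python
import math


def softplus(z):
    """The non-linearity g(z) = log(1 + exp(z)) under the assumed mismatch function."""
    return math.log1p(math.exp(z))


def lagrange_interpolate(nodes, values, z):
    """Evaluate at z the Lagrange polynomial through the points (nodes[i], values[i])."""
    total = 0.0
    for i, (z_i, f_i) in enumerate(zip(nodes, values)):
        basis = 1.0
        for j, z_j in enumerate(nodes):
            if j != i:
                basis *= (z - z_j) / (z_i - z_j)
        total += f_i * basis
    return total


def noisy_log_spectrum_lagrange(x, n, nodes):
    """Approximate y = x + g(n - x), replacing g by its interpolating polynomial."""
    values = [softplus(z) for z in nodes]
    return x + lagrange_interpolate(nodes, values, n - x)
```

With nodes such as `[-4.0, -2.0, 0.0, 2.0, 4.0]`, the polynomial reproduces g exactly at the nodes and stays close to it in between. Outside the node range the polynomial can diverge quickly, so the nodes should cover the expected range of n - x.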