
Experimental results

To evaluate the Lagrange polynomial approximation (LPA) based approach, we tested it on an isolated word recognition task. The HMMs were trained on 2620 words from a single speaker, taken from the A-Set of the ATR Speech Database. The test set consisted of 655 words from the same speaker and the same database, not included in the training set.

The baseline system comprised 41 context-independent, continuous-density, single-mixture phone HMMs with 123 states in total. The input features were 26-dimensional vectors composed of 13 MFCCs (including $ C_0$) and their deltas. With Julian 3.4 as the decoder, the baseline word recognition accuracy on clean speech was 93.8%.
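As an illustration of how 13 static coefficients become a 26-dimensional vector, the sketch below appends delta coefficients using the common regression formula. The window half-width of 2 frames and the edge handling (repeating the first and last frame) are assumptions; the text does not state which delta settings were used, and the function name `add_deltas` is hypothetical.

```python
def add_deltas(frames, window=2):
    """Append regression-based delta coefficients to each static frame.

    frames: list of equal-length lists of static coefficients (e.g. 13 MFCCs).
    window: half-width of the regression window (an assumed value).
    """
    if not frames:
        return []
    denom = 2 * sum(k * k for k in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t, frame in enumerate(frames):
        delta = []
        for d in range(len(frame)):
            num = 0.0
            for k in range(1, window + 1):
                # Clamp indices at the utterance edges (assumed edge handling).
                ahead = frames[min(t + k, last)][d]
                behind = frames[max(t - k, 0)][d]
                num += k * (ahead - behind)
            delta.append(num / denom)
        out.append(list(frame) + delta)
    return out


# Usage: 10 frames of 13 coefficients that rise by 1 per frame.
ramp = [[float(t)] * 13 for t in range(10)]
features = add_deltas(ramp)
print(len(features[0]))  # dimension of each output vector
```

In the interior of a linear ramp the delta equals the per-frame slope, which is a quick sanity check for any delta implementation.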

Exhibition noise from the JEITA database was added to the test data at SNRs of 0 dB, 5 dB, 10 dB, 20 dB and 40 dB. When recognized with the clean HMMs, word recognition accuracy dropped to 2.8% at 0 dB SNR.
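The text does not say how the noise was scaled to reach each SNR. A minimal sketch of one common procedure follows, assuming the SNR is defined as the ratio of the mean power of the whole utterance to that of the noise segment; the names `mean_power` and `mix_at_snr` are hypothetical.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale noise to the target SNR (in dB) and add it to speech.

    Returns the noisy signal and the scaled noise that was added.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    p_speech = mean_power(speech)
    p_noise = mean_power(noise)
    if p_speech == 0.0 or p_noise == 0.0:
        raise ValueError("speech and noise must have non-zero power")
    # Choose the gain so that p_speech / (gain^2 * p_noise) = 10^(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    scaled = [gain * n for n in noise]
    noisy = [s + n for s, n in zip(speech, scaled)]
    return noisy, scaled


# Usage with synthetic signals standing in for speech and exhibition noise.
speech = [math.sin(0.1 * i) for i in range(1000)]
noise = [math.cos(0.37 * i) for i in range(1200)]
for snr_db in (0, 5, 10, 20, 40):
    noisy, scaled = mix_at_snr(speech, noise, snr_db)
    achieved = 10.0 * math.log10(mean_power(speech) / mean_power(scaled))
    print(snr_db, round(achieved, 6))
```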

Recognition was then performed with models adapted by Lagrange polynomial approximation. Only the static mean parameters were adapted.

Figure 5 shows the word accuracies obtained with the various models at different SNRs. The matched models were built by training HMMs on training data corrupted by the noise at each given SNR. For the PMC log-normal approximation, both the means and the variances of the static parameters were adapted. As the figure shows, the accuracy obtained with LPA-based model adaptation is close to that of the matched models at high SNR, and is significantly better than that of the PMC log-normal approximation at low SNR.
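For reference, the PMC log-normal approximation used as the comparison method can be sketched for a single log-spectral dimension as below. This follows the standard formulation (map each Gaussian to the linear domain, add, map back) and assumes diagonal covariances, a unit noise gain by default, and parameters already converted from the cepstral to the log-spectral domain. The function names are hypothetical and this is not the code used in the experiments.

```python
import math


def log_to_linear(mu, var):
    """Mean and variance of exp(Z) for Z ~ N(mu, var)."""
    mean = math.exp(mu + 0.5 * var)
    variance = mean * mean * (math.exp(var) - 1.0)
    return mean, variance


def linear_to_log(mean, variance):
    """Inverse of log_to_linear under the log-normal assumption."""
    var = math.log(variance / (mean * mean) + 1.0)
    mu = math.log(mean) - 0.5 * var
    return mu, var


def pmc_lognormal(mu_x, var_x, mu_n, var_n, gain=1.0):
    """Combine speech (x) and noise (n) parameters for one dimension.

    The noisy-speech distribution is assumed log-normal, so its
    linear-domain mean and variance are the sums of the two sources.
    """
    m_x, v_x = log_to_linear(mu_x, var_x)
    m_n, v_n = log_to_linear(mu_n, var_n)
    return linear_to_log(m_x + gain * m_n, v_x + gain * gain * v_n)


# Usage: speech and noise means close together, where the choice of
# approximation matters most.
mu_y, var_y = pmc_lognormal(2.0, 0.5, 1.8, 0.3)
print(mu_y, var_y)
```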

Figure 3 shows that when $ \mu_n \gg \mu_x$ or $ \mu_n \ll \mu_x$, any of the methods can estimate $ \mu_y$ with sufficient accuracy. However, if $ \mu_n \approx \mu_x$, the other methods fail to give an accurate estimate, whereas LPA works very well. Therefore, LPA's advantage will be more pronounced when the HMM parameters of noise and speech fall close to each other during combination. Figure 6 shows histograms of the parameter means of speech and noise in the log-spectral domain. In the case shown in Figure 6a, the speech means and noise means lie closer to each other (shaded area) than in the case of Figure 6b. The improvement of LPA over the other methods will therefore be more noticeable for the HMMs of case (a) than for those of case (b).
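To see why the region $ \mu_n \approx \mu_x$ is the hard one, consider the usual additive mismatch function in the log-spectral domain with variances ignored, $ \mu_y = \log(e^{\mu_x} + e^{\mu_n})$. This form is an assumption here, since the section does not restate the function. The sketch below measures how far the crudest estimate, taking the larger of the two means, is from that value; it does not implement LPA or any of the methods compared in Figure 3, and the function names are hypothetical.

```python
import math


def log_add(mu_x, mu_n):
    """Compute log(exp(mu_x) + exp(mu_n)) in a numerically stable way."""
    hi, lo = max(mu_x, mu_n), min(mu_x, mu_n)
    return hi + math.log1p(math.exp(lo - hi))


def max_approx_error(mu_x, mu_n):
    """Gap between the log-add value and the simple max(mu_x, mu_n) estimate."""
    return log_add(mu_x, mu_n) - max(mu_x, mu_n)


# Sweep the noise mean around a fixed speech mean of 0.
for mu_n in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(mu_n, max_approx_error(0.0, mu_n))
```

The gap equals $ \log(1 + e^{-\vert\mu_x - \mu_n\vert})$: it vanishes when the means are far apart and peaks at $ \log 2$ when they coincide, which matches the observation that the choice of approximation matters most when speech and noise means overlap.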

Figure 5: Recognition results with the clean model, the matched model, and the models adapted by PMC log-normal approximation and Lagrange polynomial approximation (LPA).

Figure 6: Histograms of the means of the speech and noise parameters in the log-spectral domain.
