The baseline system comprised 41 context-independent, continuous-density phone HMMs with 123 single-mixture states. The 26-dimensional feature vector was composed of static and delta MFCCs together with energy and delta-energy coefficients. Julian 3.4 was used as the decoder. The baseline word recognition accuracy for clean speech was 93.8%.
Exhibition noise from the JEITA database was added to the test data at SNRs of 0 dB, 5 dB, 10 dB, 20 dB and 40 dB. When the noisy speech was recognized with the clean HMMs, the word recognition accuracy dropped to 2.8% at 0 dB SNR.
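The noise-addition step can be illustrated with a minimal sketch. It assumes that SNR is defined as the ratio of average speech power to average noise power over the whole utterance, and that the noise recording is repeated or trimmed to the utterance length; the text does not state which convention was actually used. The function name `mix_at_snr` is hypothetical.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech` so that the result has the requested SNR in dB.

    Assumption: SNR = 10*log10(mean speech power / mean noise power),
    measured over the whole utterance.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)

    # Repeat the noise if it is shorter than the speech, then trim to length.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[:len(speech)]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)

    # Gain that scales the noise power to p_speech / 10^(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise
```

Under this definition, a call such as `mix_at_snr(speech, noise, 0)` yields a signal in which the noise carries the same average power as the speech.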
Recognition was then performed with models adapted by Lagrange polynomial approximation. Only the 12 static mean parameters were adapted.
Figure 3 shows the word recognition accuracies obtained with the various models at different SNRs. The matched models were built by training HMMs on training data corrupted by noise at the given SNRs. For the PMC log-normal approximation, both the means and the variances of the 12 static parameters were adapted. As seen in the figure, the performance obtained with model adaptation based on Lagrange polynomial approximation is close to that obtained with the matched models at high SNR, and is significantly better than that of the PMC log-normal approximation at low SNR.
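For reference, the PMC log-normal approximation used as a comparison can be sketched as follows. This is the commonly cited textbook form, written for diagonal covariances in the log-spectral domain; it is not necessarily the exact configuration used in these experiments. The DCT mapping between the cepstral and log-spectral domains is omitted, and the function names and the `gain` parameter are illustrative assumptions.

```python
import numpy as np

def lognormal_to_linear(mu_log, var_log):
    """Map Gaussian log-domain parameters to linear-domain mean and variance."""
    mu_log = np.asarray(mu_log, dtype=np.float64)
    var_log = np.asarray(var_log, dtype=np.float64)
    mu_lin = np.exp(mu_log + 0.5 * var_log)
    var_lin = mu_lin ** 2 * (np.exp(var_log) - 1.0)
    return mu_lin, var_lin

def linear_to_lognormal(mu_lin, var_lin):
    """Inverse mapping, treating the linear-domain variable as log-normal."""
    var_log = np.log(var_lin / mu_lin ** 2 + 1.0)
    mu_log = np.log(mu_lin) - 0.5 * var_log
    return mu_log, var_log

def pmc_lognormal(mu_s, var_s, mu_n, var_n, gain=1.0):
    """Combine clean-speech and noise parameters (log-spectral domain).

    Assumption: speech and noise are independent and additive in the
    linear spectral domain; `gain` scales the speech level.
    """
    s_mu, s_var = lognormal_to_linear(mu_s, var_s)
    n_mu, n_var = lognormal_to_linear(mu_n, var_n)
    y_mu = gain * s_mu + n_mu            # means add in the linear domain
    y_var = gain ** 2 * s_var + n_var    # variances add under independence
    return linear_to_lognormal(y_mu, y_var)
```

The sum of two log-normal variables is not itself log-normal, so the final mapping back to the log domain is the approximation that gives the method its name.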