Next: Polynomial-approximation Up: Model Composition by Lagrange Previous: Introduction

Model of the environment

An acoustic model of the effect of additive noise $ n[m]$ and channel filtering $ h[m]$ on a clean speech signal $ x[m]$ is shown in Figure 1.

The corrupted speech is given by:

$\displaystyle y[m]=x[m]*h[m]+n[m]$     (1)

where $ m$ is the sample index and $ *$ denotes convolution. In the power spectral domain, the filter-bank energies are given by:
$\displaystyle \vert Y(f)\vert^2\approx \vert X(f)\vert^2\vert H(f)\vert^2+\vert N(f)\vert^2$     (2)
$\displaystyle \Rightarrow \ln\vert Y(f)\vert^2\approx \ln\vert X(f)\vert^2+\ln\vert H(f)\vert^2+\ln\Big(1+e^{\ln\vert N(f)\vert^2-\ln\vert X(f)\vert^2-\ln\vert H(f)\vert^2}\Big)$     (3)
$\displaystyle \Rightarrow y=x+h+\ln(1+e^{n-x-h})$     (4)

where $ x$, $ n$, $ h$ and $ y$ represent the log-spectral energies of the clean signal, the additive noise, the convolutive (channel) noise and the corrupted signal, respectively.
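As a quick numerical check of Eq. (4), the following minimal sketch evaluates the log-domain expression and compares it with the power-domain form of Eq. (2). The function name `corrupt_log_spectrum` and the sample log energies are hypothetical values chosen for illustration; they are not taken from the original work.

```python
import numpy as np

def corrupt_log_spectrum(x, n, h):
    """Eq. (4): y = x + h + ln(1 + exp(n - x - h)), applied elementwise.

    x, n, h are log-spectral energies of clean speech, additive noise
    and channel, respectively (hypothetical inputs for illustration).
    """
    x = np.asarray(x, dtype=float)
    n = np.asarray(n, dtype=float)
    h = np.asarray(h, dtype=float)
    # logaddexp(a, b) = ln(e^a + e^b); with a = x + h and b = n this is
    # algebraically identical to Eq. (4) and avoids overflow in exp().
    return np.logaddexp(x + h, n)

# Hypothetical log energies for three filter-bank channels.
x = np.array([2.0, 0.5, -1.0])
n = np.array([0.0, 0.5, 1.0])
h = np.array([-0.3, 0.0, 0.2])

y = corrupt_log_spectrum(x, n, h)

# Same quantity computed from the power-domain relation of Eq. (2).
y_power = np.log(np.exp(x) * np.exp(h) + np.exp(n))

# Same quantity computed literally as written in Eq. (4).
y_literal = x + h + np.log1p(np.exp(n - x - h))
```

The three results agree to floating-point precision. Note that $ y$ is never smaller than either $ x+h$ or $ n$, so the mapping behaves like a smooth maximum of the filtered speech and the noise.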

Thus, the relationship between speech and noise is non-linear, as given in Eq. (4). Experiments show that even if the noise and clean-speech parameters are Gaussian distributed in the log domain, the corrupted-speech parameters are no longer Gaussian distributed. However, if the parameters have low variances and their distributions are modeled by mixtures of several Gaussians, the distribution of the corrupted-speech parameters can still be assumed to be Gaussian without much loss of accuracy, and the same decoder optimized for Gaussian distributions can be used.
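The loss of Gaussianity can be illustrated with a small Monte Carlo sketch. It is not the experiment referred to above: the means, standard deviations, sample size and the use of sample skewness as a rough non-Gaussianity measure are all assumptions made for illustration, and the channel term is set to $ h=0$.

```python
import numpy as np

def skewness(v):
    """Sample skewness: third central moment over variance^(3/2)."""
    c = v - v.mean()
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5

def corrupted_skewness(sigma, num_samples=200_000, seed=0):
    """Skewness of y from Eq. (4) with h = 0 and Gaussian x and n.

    Both x and n are drawn from N(0, sigma^2); these settings are
    assumptions for illustration, not values from the original work.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma, num_samples)  # clean speech, log domain
    n = rng.normal(0.0, sigma, num_samples)  # additive noise, log domain
    y = np.logaddexp(x, n)                   # Eq. (4) with h = 0
    return skewness(y)

skew_wide = corrupted_skewness(sigma=3.0)    # high-variance parameters
skew_narrow = corrupted_skewness(sigma=0.1)  # low-variance parameters
print(f"skewness, sigma=3.0: {skew_wide:.3f}")
print(f"skewness, sigma=0.1: {skew_narrow:.3f}")
```

A Gaussian has zero skewness, so a clearly positive value in the high-variance case indicates that $ y$ is no longer Gaussian, while a value close to zero in the low-variance case is consistent with the Gaussian assumption being a reasonable approximation there.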


September 23, 2004