Next: Polynomial Approximation Up: raut04ASJ03 Previous: Introduction

Model of the Environment


An acoustical model demonstrating the effect of additive noise $n[m]$ and channel filtering $h[m]$ on a clean speech signal $x[m]$ is shown in Figure 1.

Figure 1: Model of the acoustical environment.
\includegraphics[width=.7\linewidth]{eps/envmodel.eps}

The corrupted speech is given by:

$\displaystyle y[m]=x[m]*h[m]+n[m]$     (1)

where $m$ is the sample index. In the power spectral domain:
$\displaystyle \vert Y(f)\vert^2\approx \vert X(f)\vert^2\vert H(f)\vert^2+\vert N(f)\vert^2$     (2)
$\displaystyle \Rightarrow \ln\vert Y(f)\vert^2\approx \ln\vert X(f)\vert^2+\ln\vert H(f)\vert^2+\ln\Big(1+e^{\ln\vert N(f)\vert^2-\ln\vert X(f)\vert^2-\ln\vert H(f)\vert^2}\Big)$     (3)
$\displaystyle \Rightarrow y=x+h+\ln(1+e^{n-x-h})$     (4)

where $x$, $n$, $h$ and $y$ represent the log-spectral energies of the clean signal, the additive noise, the convolutive (channel) noise and the corrupted signal, respectively, at a given frequency $f$.
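Eq. (4) can be checked numerically against Eq. (2). The following is a minimal Python sketch rather than the authors' implementation: the function names `corrupt_log_spectrum` and `corrupt_power_spectrum` and the sample energies are assumptions chosen only for illustration. The correction term $\ln(1+e^{n-x-h})$ is evaluated with `math.log1p`, and the branch on the sign of the exponent is an added numerical safeguard, not part of the model.

```python
import math

def corrupt_log_spectrum(x, n, h):
    """Eq. (4): y = x + h + ln(1 + exp(n - x - h)) for one frequency bin.

    x, n, h are log-spectral energies (natural log) of the clean speech,
    the additive noise and the channel.
    """
    d = n - x - h
    if d > 0:
        # Same value rewritten as n + ln(1 + exp(-d)) to avoid overflow
        # when the noise dominates the filtered speech.
        return n + math.log1p(math.exp(-d))
    return x + h + math.log1p(math.exp(d))

def corrupt_power_spectrum(X2, N2, H2):
    """Eq. (2): |Y|^2 ~ |X|^2 |H|^2 + |N|^2 for one frequency bin."""
    return X2 * H2 + N2

# Illustrative (assumed) power-spectral energies for a single bin.
X2, N2, H2 = 4.0, 0.5, 0.25

y_log = corrupt_log_spectrum(math.log(X2), math.log(N2), math.log(H2))
Y2 = corrupt_power_spectrum(X2, N2, H2)

# Both routes should agree up to floating-point rounding.
print(y_log, math.log(Y2))
```

When the noise is far below the filtered speech ($n \ll x+h$), the correction term vanishes and $y \approx x+h$; when the noise dominates, $y \approx n$.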

Thus, the relationship between speech and noise is non-linear, as given in Eq. (4). Experiments show that even if the noise and clean speech parameters are Gaussian distributed in the log domain, the corrupted speech parameters are no longer Gaussian distributed. However, if the parameters have low variances and their distributions are modeled by mixtures of several Gaussians, the distribution of the corrupted parameters can still be assumed to be Gaussian without much loss of accuracy, which makes it possible to use the same decoder optimized for Gaussian distributions.
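The effect of variance on the shape of the corrupted distribution can be illustrated with a small Monte Carlo simulation. This is a sketch under stated assumptions, not a reproduction of the experiments mentioned above: the means, standard deviations, sample count and seed are arbitrary, the channel term is set to $h=0$, and sample skewness is used as a simple (assumed) indicator of departure from a Gaussian, for which the skewness is zero.

```python
import math
import random
import statistics

def skewness(samples):
    """Sample skewness (third standardized moment); 0 for a Gaussian."""
    mu = statistics.fmean(samples)
    sigma = statistics.pstdev(samples)
    return sum(((s - mu) / sigma) ** 3 for s in samples) / len(samples)

def simulate_corrupted(mean_x, std_x, mean_n, std_n, h=0.0,
                       count=20000, seed=0):
    """Draw Gaussian x and n in the log domain and apply Eq. (4)."""
    rng = random.Random(seed)
    ys = []
    for _ in range(count):
        x = rng.gauss(mean_x, std_x)
        n = rng.gauss(mean_n, std_n)
        ys.append(x + h + math.log1p(math.exp(n - x - h)))
    return ys

# High-variance speech: y behaves roughly like max(x + h, n), which is skewed.
skew_wide = skewness(simulate_corrupted(0.0, 3.0, 0.0, 0.5))

# Low-variance case: Eq. (4) is nearly linear over the range of the samples,
# so the corrupted parameter stays close to Gaussian.
skew_narrow = skewness(simulate_corrupted(0.0, 0.1, 0.0, 0.1))

print(skew_wide, skew_narrow)
```

In this sketch the wide case yields a clearly positive skewness while the narrow case stays near zero, which is consistent with using many low-variance mixture components so that each one remains approximately Gaussian after corruption.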



April 23, 2004