
Model of the environment

An acoustical model demonstrating the effect of additive noise and channel filtering on a clean speech signal is shown in Figure 1.

The corrupted speech is given by:

$\displaystyle y[m]=x[m]*h[m]+n[m]$			(1)

where $m$ is the sample number and $*$ denotes convolution. In the power spectral domain, the filter-bank energies are given as:

$\displaystyle \vert Y(f)\vert^2\approx \vert X(f)\vert^2\vert H(f)\vert^2+\vert N(f)\vert^2$			(2)
$\displaystyle \Rightarrow \ln\vert Y(f)\vert^2\approx \ln\vert X(f)\vert^2+\ln\vert H(f)\vert^2+\ln\Big(1+e^{\ln\vert N(f)\vert^2-\ln\vert X(f)\vert^2-\ln\vert H(f)\vert^2}\Big)$			(3)
$\displaystyle \Rightarrow y=x+h+\ln(1+e^{n-x-h})$			(4)

where $x$, $n$, $h$ and $y$ represent the log-spectral energies of the clean signal, additive noise, convolutive noise and corrupted signal, respectively.
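As a rough numerical check of Eq. (4), the following minimal sketch evaluates the log-domain expression for a single filter-bank channel and compares it with the logarithm of Eq. (2). The function name `corrupted_log_energy` and the energy values are illustrative assumptions, not part of the original work; the expression is rearranged around the larger of the two terms only to avoid overflow in `exp`.

```python
import math

def corrupted_log_energy(x, n, h=0.0):
    """Eq. (4): y = x + h + ln(1 + exp(n - x - h)), all in the log-spectral domain."""
    s = x + h  # log-energy of the channel-filtered speech
    # ln(e^s + e^n) = max + ln(1 + e^(min - max)); same value as Eq. (4),
    # but exp() never sees a large positive argument.
    hi, lo = max(s, n), min(s, n)
    return hi + math.log1p(math.exp(lo - hi))

# Hypothetical linear-domain energies for one filter-bank channel.
X2, H2, N2 = 4.0, 0.5, 1.0

y = corrupted_log_energy(math.log(X2), math.log(N2), math.log(H2))
direct = math.log(X2 * H2 + N2)  # Eq. (2) taken to the log domain

print(y, direct)
```

At high signal-to-noise ratio the correction term vanishes and $y \approx x + h$; at low signal-to-noise ratio $y \approx n$, which is where the non-linearity of Eq. (4) comes from.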

Thus, the relationship between speech and noise is non-linear, as given in Eq. (4). Experiments show that even if the noise and clean-speech parameters have Gaussian distributions in the log domain, the corrupted-speech parameters are no longer Gaussian. However, if the parameters have low variances and a number of Gaussian mixtures are used to model their distributions, the distribution of the corrupted parameters can still be assumed to be Gaussian without much loss of accuracy, and the same decoder optimized for Gaussian distributions can be used.
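The effect of variance on the shape of the corrupted distribution can be illustrated with a small Monte Carlo sketch. It draws Gaussian log-energies for speech and noise, passes them through Eq. (4) with $h=0$, and reports the sample skewness of $y$, which is zero for a Gaussian. The means, standard deviations, sample count and seed are arbitrary assumptions chosen for illustration; this is not the experiment referred to in the text.

```python
import math
import random

def corrupted_log_energy(x, n, h=0.0):
    """Eq. (4) in overflow-safe form: y = ln(exp(x + h) + exp(n))."""
    s = x + h
    hi, lo = max(s, n), min(s, n)
    return hi + math.log1p(math.exp(lo - hi))

def skewness(values):
    """Sample skewness: third central moment divided by variance**1.5."""
    count = len(values)
    mean = sum(values) / count
    m2 = sum((v - mean) ** 2 for v in values) / count
    m3 = sum((v - mean) ** 3 for v in values) / count
    return m3 / m2 ** 1.5

def simulate_skewness(sigma_x, sigma_n, mu_x=0.0, mu_n=0.0, count=20000, seed=0):
    """Skewness of y when x ~ N(mu_x, sigma_x^2), n ~ N(mu_n, sigma_n^2), h = 0."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    ys = [
        corrupted_log_energy(rng.gauss(mu_x, sigma_x), rng.gauss(mu_n, sigma_n))
        for _ in range(count)
    ]
    return skewness(ys)

# Illustrative settings (assumed values): a wide speech distribution
# against narrow noise, then both distributions narrow.
wide = simulate_skewness(sigma_x=3.0, sigma_n=0.5)
narrow = simulate_skewness(sigma_x=0.1, sigma_n=0.1)

print(f"skewness, high variance: {wide:.3f}")
print(f"skewness, low variance:  {narrow:.3f}")
```

Under these assumptions the high-variance case should show clearly positive skewness, because the noise acts as a soft floor on $y$, while the low-variance case should stay close to zero, since Eq. (4) is nearly linear over a narrow range of inputs.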


September 23, 2004