Graduate School of Information Science and Technology

The University of Tokyo, Japan
`{kameoka,nishi,sagayama}@hil.t.u-tokyo.ac.jp`

In this paper, a co-channel multi-pitch detection algorithm is
described.
Such an algorithm is important when prosodic information
needs to be extracted separately for each of several
concurrent utterances.
Although the temporal continuity of speech prosody should ultimately be
considered, as a first step we discuss processing performed independently
on each single frame.
A single harmonic structure is modeled with a tied Gaussian mixture, and
multiple harmonic structures are modeled with a mixture of such tied Gaussian mixtures.
Our algorithm detects both the number of concurrent speakers
and the spectral envelope of each underlying harmonic structure, based on
maximum likelihood estimation
of the model parameters using the EM algorithm and an information criterion.
It operates without a priori information about F0 contours and without
restricting the number of speakers, and it also extracts
accurate F0s as continuous values with simple
procedures in the spectral domain. Experiments showed that our algorithm
outperformed the well-known cepstrum method for speech signals of both a single
speaker and two simultaneous speakers.
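
As a rough illustration of the "mixture of tied Gaussian mixtures" idea, the sketch below evaluates such a density on a frequency grid. It is not the paper's implementation: the log-frequency axis (so that the n-th partial sits at log F0 + log n), the shared variance `sigma`, and all names and numeric values (`harmonic_mixture_density`, `speaker_weights`, `partial_weights`, the two example F0s) are assumptions made only for this example.

```python
import numpy as np

def harmonic_mixture_density(x, log_f0s, speaker_weights, partial_weights, sigma):
    """Evaluate a mixture of tied Gaussian mixtures at log-frequency points x.

    Assumption: each speaker k contributes Gaussians whose means are tied to a
    single parameter log_f0s[k], with the n-th partial at log_f0s[k] + log(n).
    """
    x = np.asarray(x, dtype=float)
    n_partials = partial_weights.shape[1]
    offsets = np.log(np.arange(1, n_partials + 1))  # log n for n = 1..N
    density = np.zeros_like(x)
    for k, mu in enumerate(log_f0s):
        means = mu + offsets  # all partial means move together with mu
        # One Gaussian per partial, evaluated on the grid: shape (N, len(x))
        z = (x[None, :] - means[:, None]) / sigma
        gauss = np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))
        # Partial weights play the role of a (normalized) spectral envelope
        density += speaker_weights[k] * (partial_weights[k] @ gauss)
    return density

# Hypothetical two-speaker example with F0s of 120 Hz and 200 Hz.
log_f0s = np.log([120.0, 200.0])
speaker_weights = np.array([0.6, 0.4])
partial_weights = np.array([
    [0.40, 0.30, 0.15, 0.10, 0.05],
    [0.50, 0.20, 0.15, 0.10, 0.05],
])
grid = np.linspace(np.log(50.0), np.log(2000.0), 20001)
density = harmonic_mixture_density(grid, log_f0s, speaker_weights,
                                   partial_weights, sigma=0.02)
```

Because every partial of one speaker depends on the same `log_f0s[k]`, fitting this model to an observed spectrum would move a whole harmonic series at once, which is what lets each mixture component stand for one speaker.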

- Introduction
- A Maximum Likelihood Formulation
  - Model of Harmonic Structures
  - Model Parameter Estimation using EM Algorithm
  - Another Interpretation as Clustering
- Multi-pitch Detection Algorithm
  - Criterion of Model Selection
  - Detection of the number of speakers
  - Detection of F0s and Spectral Envelopes
- Experiments
- Conclusions
