 
 
 
 
 
   
Prosodic information is known to offer many useful cues for
speech recognition, such as the location of important words and phrases,
topic segment boundaries, the location of disfluencies, and the
identification of languages.
Prosodic information is generally extracted on
the assumption that the fundamental frequency (F0) pattern has already been (roughly) extracted.
Yet F0 patterns cannot always be extracted easily from spontaneous
dialogue speech, in which simultaneous utterances by two or more speakers
often occur.
Thus, to incorporate proper prosodic information
into spontaneous dialogue speech recognition,
the number of simultaneous
speakers and their respective F0 patterns must
be extracted precisely.
However, the multi-pitch detection problem is far from simple and is
difficult to solve analytically.
Numerous multi-pitch detection methods have been reported, not only in
speech signal processing [1,2] but
also in musical signal processing [3,4,5] and auditory scene
analysis [6,7].
Chazan et al. proposed a speech separation method that introduces a
time-warped signal model allowing
continuous pitch variation within a long analysis frame [1].
Wu et al. described a multi-pitch tracking method for noisy environments
that combines filter-bank processing with HMM-based pitch tracking [2].
Although these methods detect F0s accurately, neither
of them includes a specific process for determining the
number of speakers.
Our objective is to develop a multi-pitch detection algorithm that
detects the number of simultaneous speakers, their F0s as accurate
continuous values, and, moreover, their respective spectral
envelopes, using a spectral-domain procedure.
The basic approach is stated in Section 2, and
the detection algorithm is described in Section
3. The results of operation experiments are
reported in Section 4.
 
 
 
 
