Continuous speech recognition and understanding is a difficult task,
and phrase boundary information is indispensable for raising
recognition accuracy. However, a pre-processor that extracts phrase
boundaries from continuous speech has not yet been developed, and
speech recognition is therefore very costly in terms of CPU time and
memory. Estimating boundary positions, either directly from the input
speech or from extracted prosodic parameters, is thus a very
important problem.
Accent Phrase Segmentation by F0 Clustering Using
Superpositional Modeling
System Outline (gif)
-
- Training
-
Hand-labeled F0 accent phrase patterns are parameterized according to
the superpositional model (Fujisaki, SP84-36), and from these
parameterized patterns a set of templates is constructed by LBG
(Linde-Buzo-Gray) clustering.
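
The clustering step can be illustrated with a minimal sketch of LBG
codebook design. It is not the system's actual implementation. It assumes
each hand-labeled accent phrase has already been reduced to a fixed-length
parameter vector; the function names, the additive split offset and the
stopping tolerance are all illustrative choices.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest(vector, codebook):
    """Index of the codeword closest to the given vector."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))


def lbg(vectors, size, epsilon=0.01, tol=1e-6, max_iter=100):
    """Build a codebook of `size` templates (size: a power of two).

    Starts from the global centroid, then repeatedly splits every
    codeword in two and refines the result with nearest-neighbour
    reassignment until the mean distortion stops improving.
    """
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split step: perturb each codeword by +epsilon and -epsilon.
        codebook = [[c + sign * epsilon for c in code]
                    for code in codebook for sign in (1, -1)]
        previous = float("inf")
        for _ in range(max_iter):
            cells = [[] for _ in codebook]
            distortion = 0.0
            for v in vectors:
                i = nearest(v, codebook)
                cells[i].append(v)
                distortion += squared_distance(v, codebook[i])
            distortion /= len(vectors)
            # Update step: an empty cell keeps its old codeword.
            codebook = [centroid(cell) if cell else code
                        for cell, code in zip(cells, codebook)]
            if previous - distortion <= tol * max(distortion, 1e-12):
                break
            previous = distortion
    return codebook


# Hypothetical 2-dimensional parameter vectors forming two groups.
patterns = [[0, 0], [0, 1], [1, 0], [1, 1],
            [10, 10], [10, 11], [11, 10], [11, 11]]
templates = lbg(patterns, 2)
```

Here `templates` holds one representative vector per cluster. A real
system would cluster the model parameters extracted from the labeled
accent phrases rather than toy points.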
- Segmentation
-
Automatic segmentation is performed by One-Stage DP between the reference
templates and the input F0 contour.
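
The segmentation step can be sketched as a simplified One-Stage DP that
aligns an input contour with the best concatenation of templates and
reads the phrase boundaries off the backtrace. The sketch makes several
assumptions that are not taken from the system itself: templates are
plain sequences of F0 values, the local distance is a squared
difference, and a path may stay in a template frame, advance by one, or
skip one frame. All names are hypothetical.

```python
def one_stage_dp(contour, templates):
    """Segment `contour` into a sequence of warped templates.

    Returns (total_cost, segments), where each segment is a tuple
    (start_frame, end_frame, template_index).
    """
    inf = float("inf")
    prev_cost, prev_start = None, None
    # best_end[t]: (cost, template index, start frame) of the cheapest
    # template sequence whose last template ends exactly at frame t.
    best_end = []
    for t, x in enumerate(contour):
        # Cost of everything before frame t if a new template starts here.
        entry = best_end[t - 1][0] if t > 0 else 0.0
        cur_cost, cur_start = [], []
        for k, ref in enumerate(templates):
            cost_k, start_k = [], []
            for j, r in enumerate(ref):
                candidates = []
                if j == 0:
                    # Between-template transition: enter template k at t.
                    candidates.append((entry, t))
                if t > 0:
                    # Within-template transitions: stay, advance, skip.
                    for dj in (0, 1, 2):
                        if j - dj >= 0:
                            candidates.append((prev_cost[k][j - dj],
                                               prev_start[k][j - dj]))
                best, start = min(candidates) if candidates else (inf, t)
                cost_k.append(best + (x - r) ** 2)
                start_k.append(start)
            cur_cost.append(cost_k)
            cur_start.append(start_k)
        best_end.append(min((cur_cost[k][-1], k, cur_start[k][-1])
                            for k in range(len(templates))))
        prev_cost, prev_start = cur_cost, cur_start

    if not best_end or best_end[-1][0] == inf:
        raise ValueError("no template sequence covers the contour")

    # Backtrace: hop from each template's start to the previous end.
    segments = []
    t = len(contour) - 1
    while t >= 0:
        _, k, start = best_end[t]
        segments.append((start, t, k))
        t = start - 1
    segments.reverse()
    return best_end[-1][0], segments


# Hypothetical templates (rising and falling F0, in Hz) and input.
rise = [100, 120, 140]
fall = [140, 120, 100]
total, segments = one_stage_dp([100, 120, 140, 140, 120, 100],
                               [rise, fall])
```

For this toy input, `segments` is `[(0, 2, 0), (3, 5, 1)]`, placing a
boundary at frame 3. A practical version would also have to handle
unvoiced frames, where F0 is undefined.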
Continuous Speech Recognition Using Prosodic Information
Frame Synchronous System (gif)