Continuous speech recognition and understanding is a difficult task, and phrase boundary information is indispensable for raising recognition accuracy. However, a pre-processor that extracts phrase boundaries from continuous speech has not yet been developed, so speech recognition remains very costly in terms of CPU time and memory. Estimating boundary positions, either directly from the input speech or from extracted prosodic parameters, is therefore a very important problem.

Accent Phrase Segmentation by F0 Clustering Using Superpositional Modeling

System Outline (gif)

  • Training
  • Hand-labeled F0 accent phrase patterns are parameterized according to the superpositional model (Fujisaki, SP84-36), and a set of templates is constructed from these parameterized patterns by LBG clustering.

  • Segmentation
  • Automatic segmentation is performed by One-Stage DP matching between the reference templates and the input F0 contour.
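
The segmentation step can be illustrated with a minimal sketch of One-Stage DP matching. It is not the system's actual implementation: it assumes the F0 contour and the templates are plain lists of (log-)F0 values rather than superpositional-model parameters, uses a squared difference as the local distance, and allows a template to be stretched or compressed by staying on, advancing by one, or skipping one template frame. The function name `one_stage_dp` and the example values are hypothetical.

```python
def one_stage_dp(contour, templates):
    """Segment `contour` into a concatenation of templates.

    Returns a list of (start_frame, end_frame, template_index) tuples.
    Assumption: frames are scalar F0 values and the local distance is
    the squared difference; the real system may use other features.
    """
    INF = float("inf")
    if not contour or not templates or any(len(t) == 0 for t in templates):
        raise ValueError("contour and every template must be non-empty")

    # Accumulated cost and segment start for each template frame,
    # as of the previous input frame.
    prev_cost = [[INF] * len(t) for t in templates]
    prev_start = [[0] * len(t) for t in templates]
    best_end = []  # per input frame: (cost, template index, start frame)

    for i, x in enumerate(contour):
        # Cost of starting a new template at frame i: the best cost of
        # any template that ended at frame i - 1 (zero at the very start).
        entry = 0.0 if i == 0 else best_end[i - 1][0]
        cur_cost, cur_start = [], []
        for k, tmpl in enumerate(templates):
            costs, starts = [], []
            for j, ref in enumerate(tmpl):
                # Stay on the same template frame.
                cands = [(prev_cost[k][j], prev_start[k][j])]
                if j == 0:
                    # Enter this template from a template boundary.
                    cands.append((entry, i))
                if j >= 1:
                    # Advance by one template frame.
                    cands.append((prev_cost[k][j - 1], prev_start[k][j - 1]))
                if j >= 2:
                    # Skip one template frame.
                    cands.append((prev_cost[k][j - 2], prev_start[k][j - 2]))
                best, start = min(cands)
                costs.append(best + (x - ref) ** 2)
                starts.append(start)
            cur_cost.append(costs)
            cur_start.append(starts)
        best_end.append(min((cur_cost[k][-1], k, cur_start[k][-1])
                            for k in range(len(templates))))
        prev_cost, prev_start = cur_cost, cur_start

    if best_end[-1][0] == INF:
        raise ValueError("no template sequence can cover the contour")

    # Trace back through the template boundaries, last segment first.
    segments = []
    end = len(contour) - 1
    while end >= 0:
        _, k, start = best_end[end]
        segments.append((start, end, k))
        end = start - 1
    segments.reverse()
    return segments


# Hypothetical example: a rising template and a falling template.
templates = [[1.0, 2.0, 3.0], [5.0, 4.0]]
contour = [1.0, 2.0, 2.0, 3.0, 5.0, 4.0, 1.0, 2.0, 3.0]
print(one_stage_dp(contour, templates))
# prints [(0, 3, 0), (4, 5, 1), (6, 8, 0)]
```

Each returned tuple marks one estimated accent phrase: the boundaries are the frames where the best path leaves one template and enters the next, so segmentation and template selection are obtained in a single pass over the input.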

Continuous Speech Recognition Using Prosodic Information

Frame Synchronous System (gif)


    mit@jaist.ac.jp