Demo page for Interspeech 2014 paper

"Speech prosody generation for text-to-speech synthesis based on generative model of F0 contours"

Contributors: Kento Kadowaki, Tatsuma Ishihara, Nobukatsu Hojo (The University of Tokyo) and Hirokazu Kameoka (The University of Tokyo, NTT Corporation)

  Our Concept
[Figures: original Fujisaki model; Fujisaki-model-based probabilistic model for F0 contours; context clustering]
The Fujisaki model is a well-founded mathematical model that describes the process by which the entire F0 contour of a speech utterance is generated. It is known to approximate actual F0 contours well when its parameters are chosen appropriately. Interestingly, the phrase and accent commands, which we will henceforth refer to as the Fujisaki-model parameters, can be interpreted as quantities related to linguistic information. In Japanese, a phrase command typically occurs at the beginning of each breath group, and an accent command typically occurs over the range of the accent nucleus in each accentual phrase. The Fujisaki model is therefore a promising model for F0 contour synthesis.

We previously formulated a statistical model of speech F0 contours by translating the Fujisaki model into a probabilistic model described as a discrete-time stochastic process [Kameoka2010SAPA09, Yoshizato2012SpeechProdody05]. This formulation has allowed us not only to derive an efficient parameter inference algorithm using powerful statistical methods but also to obtain an automatically trainable version of the Fujisaki model. To associate a sequence of Fujisaki-model parameters with a text input through statistical learning, we propose extending this model to a context-dependent one. We further propose a parameter training algorithm for this model based on decision-tree-based context clustering.

F0 contours generated by conventional methods do not necessarily satisfy the physical constraints of the phonation mechanism. Our method, in contrast, does not generate such F0 contours, since our generative model is not only linguistically appropriate but also consistent with the physical mechanism of phonation control. Hence our method can synthesize F0 contours that are likely to originate from a real voice.
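To make the roles of the phrase and accent commands concrete, here is a minimal sketch of the original, deterministic Fujisaki model (not the probabilistic, context-dependent formulation proposed here). It assumes the standard formulation from the literature: ln F0(t) = ln Fb + Σ_i Ap_i Gp(t - T0_i) + Σ_j Aa_j [Ga(t - T1_j) - Ga(t - T2_j)], where Gp(t) = α² t exp(-αt) and Ga(t) = min[1 - (1 + βt) exp(-βt), γ] for t ≥ 0, and both are zero for t < 0. The constants α = 3.0 s⁻¹, β = 20.0 s⁻¹, γ = 0.9 are typical literature values assumed for illustration; the function names and the example command timings and amplitudes are hypothetical.

```python
import numpy as np

# Hypothetical helper names; alpha=3.0, beta=20.0, gamma=0.9 are typical
# literature values, assumed here for illustration only.


def phrase_response(t, alpha=3.0):
    """Phrase control mechanism: Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    tp = np.maximum(np.asarray(t, dtype=float), 0.0)  # clamping at 0 makes Gp zero before onset
    return alpha ** 2 * tp * np.exp(-alpha * tp)


def accent_response(t, beta=20.0, gamma=0.9):
    """Accent control mechanism: Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    tp = np.maximum(np.asarray(t, dtype=float), 0.0)  # clamping at 0 makes Ga zero before onset
    return np.minimum(1.0 - (1.0 + beta * tp) * np.exp(-beta * tp), gamma)


def fujisaki_log_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """ln F0(t) = ln Fb + phrase components + accent components.

    phrases: list of (T0, Ap) pairs (onset time in s, phrase command amplitude)
    accents: list of (T1, T2, Aa) triples (onset and offset in s, accent command amplitude)
    """
    t = np.asarray(t, dtype=float)
    log_f0 = np.full_like(t, np.log(fb))
    for t0, ap in phrases:
        # Each phrase command is an impulse smoothed by the phrase filter Gp.
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        # Each accent command is a pulse from T1 to T2 smoothed by the accent filter Ga.
        log_f0 += aa * (accent_response(t - t1, beta, gamma)
                        - accent_response(t - t2, beta, gamma))
    return log_f0


# Example: one phrase command and two accent commands (made-up values).
t = np.linspace(0.0, 2.0, 201)  # 10 ms frames over 2 s
f0 = np.exp(fujisaki_log_f0(t, fb=80.0,
                            phrases=[(0.0, 0.5)],
                            accents=[(0.3, 0.7, 0.4), (1.0, 1.3, 0.3)]))
```

Because all components are added in the log-F0 domain, positive command amplitudes can only raise F0 above the baseline Fb, and the smooth responses Gp and Ga keep the contour physically plausible.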
The figures at the top of this section illustrate the idea of the generative model. The experimental results demonstrated the strong potential of this approach to generate natural-sounding F0 contours from text input.
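The context-dependent training step mentioned in the concept paragraph relies on decision-tree-based context clustering. As a rough illustration of that general technique (not the training algorithm proposed in this work, which is built on the probabilistic Fujisaki model), the sketch below clusters scalar parameters, such as accent-command amplitudes, with yes/no context questions. It assumes the maximum-likelihood split criterion commonly used in HMM-based speech synthesis: a single Gaussian per node, a split chosen to maximize the log-likelihood gain, and a stopping threshold. The question names, thresholds, variance floor, and toy data are all hypothetical.

```python
import math

VAR_FLOOR = 1e-6  # assumed variance floor to avoid degenerate log-likelihoods


def gaussian_loglik(values):
    """Log-likelihood of values under one Gaussian with ML mean and (floored) variance."""
    n = len(values)
    mean = sum(values) / n
    sse = sum((v - mean) ** 2 for v in values)
    var = max(sse / n, VAR_FLOOR)
    return -0.5 * n * math.log(2.0 * math.pi * var) - 0.5 * sse / var


def build_tree(samples, questions, min_gain=1.0, min_leaf=2):
    """Greedy top-down context clustering.

    samples: list of (context, value); context maps question names to bool
    questions: yes/no context questions (hypothetical names)
    A node is split by the question with the largest log-likelihood gain,
    unless the gain is below min_gain or a child would get < min_leaf samples.
    """
    values = [v for _, v in samples]
    parent_ll = gaussian_loglik(values)
    best = None
    for q in questions:
        yes = [s for s in samples if s[0].get(q, False)]
        no = [s for s in samples if not s[0].get(q, False)]
        if len(yes) < min_leaf or len(no) < min_leaf:
            continue
        gain = (gaussian_loglik([v for _, v in yes])
                + gaussian_loglik([v for _, v in no]) - parent_ll)
        if best is None or gain > best[0]:
            best = (gain, q, yes, no)
    if best is None or best[0] < min_gain:
        # Leaf: all contexts reaching it share (tie) one parameter, the mean.
        return {"mean": sum(values) / len(values), "count": len(values)}
    _, q, yes, no = best
    return {"question": q,
            "yes": build_tree(yes, questions, min_gain, min_leaf),
            "no": build_tree(no, questions, min_gain, min_leaf)}


def predict(tree, context):
    """Walk the tree with a (possibly unseen) context and return the leaf mean."""
    while "question" in tree:
        tree = tree["yes"] if context.get(tree["question"], False) else tree["no"]
    return tree["mean"]


# Toy data: context -> accent-command amplitude (made-up numbers).
samples = [
    ({"accent_nucleus": True, "phrase_initial": True}, 0.52),
    ({"accent_nucleus": True, "phrase_initial": False}, 0.48),
    ({"accent_nucleus": True, "phrase_initial": True}, 0.50),
    ({"accent_nucleus": False, "phrase_initial": True}, 0.11),
    ({"accent_nucleus": False, "phrase_initial": False}, 0.09),
    ({"accent_nucleus": False, "phrase_initial": False}, 0.10),
]
tree = build_tree(samples, ["accent_nucleus", "phrase_initial"])
print(tree["question"])  # -> accent_nucleus
```

The point of the tree is generalization: a context never seen in training still reaches some leaf, so `predict` can supply a parameter value for any input text.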

Keywords: speech F0 contours, stochastic model, Fujisaki model, hidden Markov model, EM algorithm

  Bibliography
  • [Kameoka2010SAPA09]
    Hirokazu Kameoka, Jonathan Le Roux, Yasunori Ohishi, "A statistical model of speech F0 contours," in Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA 2010), pp. 43-48, Sep. 2010.
  • [Yoshizato2012SpeechProdody05]
    Kota Yoshizato, Hirokazu Kameoka, Daisuke Saito, Shigeki Sagayama, "Hidden Markov convolutive mixture model for pitch contour analysis of speech," in Proc. 13th Annual Conference of the International Speech Communication Association (Interspeech 2012), Mon.O2d.06, Sep. 2012.

Demonstration of synthesized speech in Japanese


Experimental conditions
| Sentence | Conventional method | Proposed method | Japanese utterance (romanized) |
|---|---|---|---|
| sentence01 | [audio] | [audio] | "chiisana unagiya ni nekkino youna monoga minagiru" |
| sentence02 | [audio] | [audio] | "dorobo demo haitta kato isshun boku wa omotta" |
| sentence03 | [audio] | [audio] | "gakusei wa repo-to o okuto chotto atama o sagete deteitta" |
| sentence04 | [audio] | [audio] | "kippu o kauno wa jidou hanbaiki kara dearu" |
| sentence05 | [audio] | [audio] | "tokai dewa deauhito no hotonndo ga mishiranu hito dearu" |
| sentence06 | [audio] | [audio] | "bunmei o sasaeru dodai ga kuzurete shimau" |
| sentence07 | [audio] | [audio] | "hitobito ga jiyuu ni deiri dekiru" |
| sentence08 | [audio] | [audio] | "dann dann jibunn ga osoroshiku natte ieni nigekaetta" |
| sentence09 | [audio] | [audio] | "gozenn chuude owaru rennshuu o netto urakara kaima mita" |
| sentence10 | [audio] | [audio] | "itsumono kyuujitsu no pata-nn o sugoshite higa kureru" |
| sentence11 | [audio] | [audio] | "ongaku no sukina oreno tameni wazawaza rokku myu-jikku o youi shitekite atta" |
| sentence12 | [audio] | [audio] | "fushigina kurai utsukushiku irodorareta michi o iku" |
| sentence13 | [audio] | [audio] | "masshiro na haga shiroi te-pu to tomoni yureta" |
| sentence14 | [audio] | [audio] | "nyuugaku shikenn o ukeru tokiyori hisshino omoi dearu" |
| sentence15 | [audio] | [audio] | "kochiramo kyatto wameite tobiagatta" |
| sentence16 | [audio] | [audio] | "kondo wa fuguno kisetsu ni itte mitai" |
| sentence17 | [audio] | [audio] | "musume no fianse de koitsu dakeniwa doushitemo makerarenai" |
| sentence18 | [audio] | [audio] | "watashi wa sore o ryokann ni motte kaetta" |
| sentence19 | [audio] | [audio] | "sorewa gyakkyou kara nukedashitai toiu setsunai hodono gannbou darou ka" |
| sentence20 | [audio] | [audio] | "mainichi byouinn made kayotta hahano aino fukasa tsuyosa dearu" |