Experiments were carried out to validate our algorithm by comparing its $F_0$ detection accuracy with that of the well-known cepstrum method. A database of speech files and reference $F_0$ contours was constructed from the ATR Speech Database. All signals were digitized at a sampling rate of $12$ kHz and analyzed with a Hamming window, using a frame length of 64 ms and a frame shift of 10 ms. The initial number of tied-GMMs was set to $4$, the frequency range was from $70$ Hz to $140$ Hz, and $\sigma$ was set to $0.45$. Speech files whose names begin with `myi-' and `fym-' are speech signals of a male and a female speaker, respectively. Deviations of more than $5 \%$ from the reference were deemed gross errors. Every accuracy shown in Tables 1, 2 and 3 is the percentage of frames at which $F_0$ is correctly detected.
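The evaluation setup above can be illustrated with a minimal sketch. It converts the stated frame length and shift into samples at $12$ kHz and computes the accuracy as the percentage of frames whose estimate lies within $5 \%$ of the reference. The function names `frame_params` and `f0_accuracy` are hypothetical, and skipping frames with a non-positive reference value (treated here as unvoiced) is an assumption that the text does not state.

```python
def frame_params(sample_rate_hz=12000, frame_ms=64, shift_ms=10):
    """Convert frame length and shift from milliseconds to samples."""
    frame_len = sample_rate_hz * frame_ms // 1000
    frame_shift = sample_rate_hz * shift_ms // 1000
    return frame_len, frame_shift


def f0_accuracy(estimated, reference, tolerance=0.05):
    """Percentage of frames whose F0 estimate is within `tolerance`
    (relative deviation) of the reference F0.

    Assumption: frames with reference <= 0 are unvoiced and are skipped.
    """
    if len(estimated) != len(reference):
        raise ValueError("estimated and reference must have equal length")
    # Keep only frames that have a usable (positive) reference value.
    pairs = [(e, r) for e, r in zip(estimated, reference) if r > 0]
    if not pairs:
        raise ValueError("no voiced reference frames to evaluate")
    # A deviation over the tolerance counts as a gross error.
    correct = sum(1 for e, r in pairs if abs(e - r) <= tolerance * r)
    return 100.0 * correct / len(pairs)


if __name__ == "__main__":
    print(frame_params())  # (768, 120)
    # Made-up values for illustration only; the last frame is an octave error.
    est = [103.0, 110.0, 120.0, 250.0]
    ref = [100.0, 100.0, 120.0, 125.0]
    print(f0_accuracy(est, ref))  # 50.0
```

With these settings each analysis frame spans 768 samples and successive frames are 120 samples apart. The numbers in the usage example are invented to exercise the function and are not results from the experiments.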