Experiments were carried out to validate our algorithm by evaluating the accuracy of detection in comparison with well-known cepstrum. A database of every speech file and reference contour are constructed from the ATR Speech Database. All signals were digitized at kHz sampling rate and analyzed with Hamming window where frame leng-th and shift were 64 ms and 10 ms, respectively. The initial number of the harmonic-GMMs was set to and the frequency range was from Hz to Hz, and was assigned to . Speech files begin with `myi-' and `fym-' stand for speech signals of a male and a female speakers. Deviations over from the references were deemed as gross errors. Every accuracy shown in table 1, 2 and 3 is a percentage of frames at which s are correctly detected.
Speech files | Accuracy(%) | ||
File 1 | File 2 | Speaker 1 | Speaker 2 |
`myisda01' | `myisda03' | 63.7 | 63.1 |
`myisda01' | `myisda04' | 45.7 | 51.6 |
`myisda02' | `myisda03' | 63.3 | 50.1 |
`myisda02' | `myisda04' | 59.4 | 42.1 |
`fymsda01' | `fymsda02' | 57.7 | 54.0 |
`fymsda01' | `fymsda04' | 53.1 | 41.0 |
`fymsda02' | `fymsda03' | 52.9 | 59.6 |
`fymsda02' | `fymsda04' | 64.9 | 64.7 |
`myisda01' | `fymsda03' | 45.7 | 43.0 |
`myisda02' | `fymsda05' | 55.0 | 44.5 |
`myisda03' | `fymsda04' | 41.4 | 59.9 |
`myisda04' | `fymsda02' | 64.9 | 50.6 |
`myisda05' | `fymsda03' | 59.4 | 62.8 |
`myisda04' | `fymsda01' | 62.0 | 71.7 |
Speech files | Accuracy(%) | ||
File 1 | File 2 | Speaker 1 | Speaker 2 |
`myisda01' | `myisda03' | 90.1 | 83.0 |
`myisda01' | `myisda04' | 92.8 | 81.3 |
`myisda02' | `myisda03' | 88.2 | 85.7 |
`myisda02' | `myisda04' | 84.4 | 87.6 |
`fymsda01' | `fymsda02' | 90.7 | 84.3 |
`fymsda01' | `fymsda04' | 85.3 | 82.6 |
`fymsda02' | `fymsda03' | 79.2 | 90.3 |
`fymsda02' | `fymsda04' | 86.2 | 92.6 |
`myisda01' | `fymsda03' | 76.1 | 84.9 |
`myisda02' | `fymsda05' | 74.8 | 92.8 |
`myisda03' | `fymsda04' | 72.6 | 88.4 |
`myisda04' | `fymsda02' | 86.3 | 85.5 |
`myisda05' | `fymsda03' | 78.0 | 86.6 |
`myisda04' | `fymsda01' | 79.0 | 86.6 |