Shigeki Sagayama, PhD, Professor, University of Tokyo


Major Contributions

Selected technical contributions are listed below, each with its first appearance at a Japanese national conference. Best viewed with Japanese font settings. If you find anything that needs correcting, please let me know at

Speech Analysis and Features

Lag Window (1975)
Proposed lag windowing of autocorrelation to reduce the pitch effect in PARCOR speech analysis/synthesis. Jointly patented with Tohkura and Hashimoto. Commonly used in LPC-based speech coding standards.
Bibliography: 特許01358638 ``音声分析装置'' (Japan Patent No. 01358638).
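As an illustration of the idea (not the patented formulation), lag windowing can be sketched as multiplying the autocorrelation sequence by a window that decays with lag, which smooths the power spectrum and suppresses pitch harmonics. The Gaussian window shape, the 60 Hz bandwidth, and the function names below are assumptions made for this sketch.

```python
import math

def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of a real sequence (plain sum of lagged products)."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k)) for k in range(max_lag + 1)]

def gaussian_lag_window(max_lag, bandwidth_hz, sample_rate_hz):
    """Assumed Gaussian lag window: w[k] = exp(-0.5 * (2*pi*B*k/fs)^2)."""
    return [
        math.exp(-0.5 * (2.0 * math.pi * bandwidth_hz * k / sample_rate_hz) ** 2)
        for k in range(max_lag + 1)
    ]

def apply_lag_window(r, w):
    """Element-wise product of autocorrelation and lag window."""
    return [rk * wk for rk, wk in zip(r, w)]

# Toy usage: a 200 Hz sinusoid sampled at 8 kHz, analysis order 10.
fs = 8000
x = [math.sin(2.0 * math.pi * 200.0 * t / fs) for t in range(240)]
r = autocorrelation(x, 10)
w = gaussian_lag_window(10, 60.0, fs)
r_windowed = apply_lag_window(r, w)  # feed this to the PARCOR/LPC recursion
```

The windowed autocorrelation keeps r[0] unchanged and attenuates higher lags, so the subsequent linear-prediction analysis sees a slightly smoothed spectrum.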
Cepstrum Distance for Speech Recognition (1978)
Used LPC-cepstrum distance for DP-based speech recognition. Collaborated with H. Nagashima. Cepstrum is now widely used in speech recognition. (Perhaps the first use of cepstral features in speech recognition, though they had already been applied to speaker recognition.)
Bibliography: 好田正紀, 長島広海, 嵯峨山茂樹, ``音韻単位の標準パターンを用いた単語音声認識装置,'' 電子通信学会全国大会予稿集, S9-4, Vol. 5, pp. 335-336, 1978.
Delta Cepstrum (1979)
Proposed delta cepstrum for capturing the dynamic characteristics of speech, primarily for speaker recognition. This feature was later successfully applied to DP-based speaker-independent speech recognition by S. Furui. Under the name ``delta-cepstrum'', it is extensively used in modern HMM-based speech recognition.
Bibliography: 嵯峨山茂樹, 板倉文忠, ``音声の動的尺度に含まれる個人性情報,'' 日本音響学会昭和54年度春季研究発表会講演論文集, 3-2-7, pp. 589-590 (1979-06). (Shigeki Sagayama and Fumitada Itakura, ``On Individuality in a Dynamic Measure of Speech,'' Proc. ASJ Spring Conf. 1979, 3-2-7, pp. 589-590, June 1979.)
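A minimal sketch of how a delta (dynamic) feature is commonly computed today: the slope of a least-squares line fitted to each cepstral coefficient over a short window of frames. The window half-width of 2 frames and the edge handling (repeating the first and last frame) are assumptions for illustration and are not taken from the 1979 paper.

```python
def delta_features(frames, half_width=2):
    """Regression-slope delta of each coefficient over +/- half_width frames.

    frames: list of per-frame coefficient lists, all of the same length.
    Edge frames are handled by repeating the first/last frame (an assumption).
    """
    num_frames = len(frames)
    dim = len(frames[0])
    denom = sum(k * k for k in range(-half_width, half_width + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            num = 0.0
            for k in range(-half_width, half_width + 1):
                idx = min(max(t + k, 0), num_frames - 1)  # clamp at the edges
                num += k * frames[idx][d]
            row.append(num / denom)
        deltas.append(row)
    return deltas

# Toy usage: a coefficient that rises by 1.0 per frame has delta 1.0
# away from the edges, and a constant coefficient has delta 0.0.
ramp = [[float(t), 5.0] for t in range(10)]
ramp_delta = delta_features(ramp)
```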
LSP Analysis Theory (1979-)
Collaborating with Itakura, produced a number of research results on the theoretical properties of LSP, e.g., the duality theory of LPC and LSP and the theoretical properties of LSP frequency distributions. Ph.D. thesis.

Speech Analysis and Synthesis

Composite Sinusoidal Modeling (CSM) (1979)
Proposed modeling speech as a sum of $n$ sinusoids, equating the autocorrelations of the speech signal and of the model at the lowest $2n-1$ points. The model frequencies were proven equivalent to Line Spectrum Pair frequencies. Applied in Yamaha's best-selling sound IC; the CSM speech synthesis patent was the top earner among all NTT patents for several years.
Bibliography: 嵯峨山茂樹, 板倉文忠, ``複合正弦波モデルによる音声分析,'' 電子情報通信学会情報・システム部門別全国大会予稿集, 63, p. 63, 1979; 嵯峨山茂樹, 板倉文忠, ``複合正弦波による簡易な音声合成法,'' 日本音響学会昭和54年度秋季研究発表会講演論文集, 3-2-3, pp. 557-558, Oct. 1979.
LSP Speech Synthesizer LSI (1980)
Designed an LSP speech synthesizer LSI in collaboration with Fujitsu's telephone switching hardware team. It became the first LSI for LSP speech synthesis and also the first CMOS LSI for speech synthesis.
Bibliography: 嵯峨山茂樹, 管村昇, 板倉文忠, 小池恒彦, ``線スペクトル対パラメータによる音声合成器LSIとその応用,'' 電子通信学会通信部門全国大会予稿集, S5-5, pp. 2-393-394, 1980.
Japanese Text-to-Speech System (1982)
Collaborated with H. Sato, Y. Sagisaka, and K. Kogure to construct the first Japanese text-to-speech system.
Bibliography: 佐藤大和, 匂坂芳典, 小暮潔, 嵯峨山茂樹, ``日本語テキストからの音声合成,'' 電子情報通信学会全国大会予稿集, S6-3, Vol. 5, pp. 399-400, 1982.

Phone Modeling

Tree-based Allophone Clustering (1987)
Proposed a tree-based clustering technique for context-dependent phones. After being presented in English in 1989 (ICASSP89, Glasgow), this idea was adopted and modified by Kai-Fu Lee et al. in 1990 and has been extensively used by IBM and Cambridge University. It is commonly used in modern phoneme-based high-performance speech recognition.
Bibliography: 嵯峨山茂樹, ``音素環境のクラスタリング,'' 音学講論, 1-5-15, pp. 29-30, Oct. 1987. (Shigeki Sagayama, ``Phoneme Environment Clustering,'' Proc. ASJ Fall Conf., 1-5-15, pp. 29-30, Oct. 1987.)
Hidden Markov Network (1991)
Combined the above-stated tree-based clustering with state tying to represent context-dependent phones. The Successive State Splitting (SSS) algorithm was also proposed for automatically obtaining a network structure of allophones. HMnet and the SSS algorithm have been extensively applied to speech recognition and other areas (e.g., protein analysis and automatic grammar acquisition).
Bibliography: 鷹見淳一, 嵯峨山茂樹, ``逐次状態分割法(SSS)による隠れマルコフネットワークの自動生成,'' 日本音響学会平成3年度秋季研究発表会講演論文集, 2-5-13, pp. 73-74, Oct. 1991. (Jun-ichi Takami and Shigeki Sagayama, ``Automatic Generation of the Hidden Markov Network by the Successive State Splitting Algorithm,'' Proc. ASJ Fall Conf., 2-5-13, pp. 73-74, Oct. 1991.)
Context-Dependent HMM-LR Continuous Speech Recognition (1991)
A context-dependent HMM (HMnet) was combined with a generalized LR parser for continuous speech recognition using a given context-free grammar (CFG).
Bibliography: 永井明人, 鷹見淳一, 嵯峨山茂樹, ``環境依存連続 HMMを用いたHMM-LR連続音声認識,'' 日本音響学会平成3年度秋季研究発表会講演論文集, 1-5-20, pp. 39-40, Oct. 1991.
Four-layer Tied-structure HMM (1994)
Tying at four levels: allophones, states, Gaussians, and scalar parameters. Advantageous for training with a small amount of data and for fast likelihood calculation. Presented in English at ICASSP95.
Bibliography: 高橋敏, 嵯峨山茂樹, ``4 階層の共有構造を持つ音素環境依存 HMM の検討,'' 日本音響学会平成6年度秋期研究発表会講演論文集, 3-8-3, pp. 113-114, Oct. 1994. (Satoshi Takahashi, Shigeki Sagayama, ``A Study of Context-Dependent HMMs with Four-Level Tied Structure,'' Proc. ASJ Fall Conf., 3-8-3, pp. 113-114, Oct. 1994.)
Discrete Mixture HMM (1996)
Mixture components are replaced by discrete (scalar quantized) distributions to represent non-Gaussian, complex distributions. Presented in English at ICASSP97.
Bibliography: 高橋敏, 嵯峨山茂樹, ``離散混合出力分布型HMM,'' 日本音響学会平成8年度秋期研究発表会講演論文集, -, pp. -, Sep. 1996. (Satoshi Takahashi, Shigeki Sagayama, ``Discrete Mixture Output Distribution HMM,'' Proc. ASJ Fall Conf., -, pp. -, Sep. 1996.)
Asynchronous-Transition HMM (1999)
Proposed a new HMM structure where state transitions are not synchronous between features.
Bibliography: 松田繁樹, 中井満, 下平博, 嵯峨山茂樹, ``非同期遷移型HMM,'' 平成11年日本音響学会秋季研究発表会講演論文集, 1-1-12, pp. 23-24, Oct. 1999.
Multiple Linear-Regression HMM (2000)
Proposed a new HMM structure where mean vectors are linearly dependent on observable factors, such as pitch frequency and power. English paper at ICASSP2001.
Bibliography: 藤永勝久, 中井満, 下平博, 嵯峨山茂樹, ``重回帰 HMMによる音声のモデル化,'' 平成12年電気関係学会北陸支部大会講演論文集, G-15, p. 458, Sep 2000.

Speaker & Noise Modeling & Adaptation

Vector Field Smoothing (1992)
Proposed a speaker adaptation method that spatially smooths the difference vectors between the original and the trained Gaussian mean vectors in the feature space. This was the first method that enabled adapting a Gaussian mixture HMM to a speaker. Extensively used in Japan.
Bibliography: 大倉計美, 杉山雅英, 嵯峨山茂樹, ``混合連続分布HMM を用いた移動ベクトル場平滑化話者適応方式,'' 日本音響学会平成4年度春季研究発表会講演論文集, 2-Q-17, pp. 191-192, Mar. 1992. (Kazumi Ohkura, Masahide Sugiyama and Shigeki Sagayama, ``Speaker Adaptation Based on Transfer Vector Field Smoothing Model with Continuous Mixture Density HMMs,'' ASJ, 2-Q-17, pp. 191-192, Mar. 1992.)
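The smoothing idea can be sketched as follows: Gaussians that were seen in the adaptation data yield transfer vectors (adapted mean minus original mean), and every Gaussian, seen or unseen, is then moved by a distance-weighted average of those transfer vectors. The Gaussian kernel weighting, the `kernel_width` parameter, and the function name are assumptions for this sketch; the published method may select and weight neighbors differently.

```python
import math

def smooth_transfer_vectors(orig_means, adapted, kernel_width=1.0):
    """Move every mean by a kernel-weighted average of observed transfer vectors.

    orig_means: list of original mean vectors.
    adapted: dict mapping index -> re-trained mean, for Gaussians that
             received adaptation data (the rest are interpolated).
    """
    # Transfer vectors exist only for Gaussians seen in adaptation data.
    transfers = {
        i: [a - o for a, o in zip(mean, orig_means[i])]
        for i, mean in adapted.items()
    }
    result = []
    for mu in orig_means:
        weight_sum = 0.0
        acc = [0.0] * len(mu)
        for i, vec in transfers.items():
            dist2 = sum((a - b) ** 2 for a, b in zip(mu, orig_means[i]))
            w = math.exp(-dist2 / (2.0 * kernel_width ** 2))
            weight_sum += w
            acc = [a + w * v for a, v in zip(acc, vec)]
        result.append([m + a / weight_sum for m, a in zip(mu, acc)])
    return result

# Toy usage: two of four Gaussians were adapted, both shifted by (1.0, -0.5),
# so the smoothed field shifts all four means by the same vector.
means = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
seen = {0: [1.0, -0.5], 2: [1.0, 1.5]}
new_means = smooth_transfer_vectors(means, seen)
```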
Speaker-Tied Mixture (1992)
A Gaussian mixture is derived from speaker-dependent single-Gaussian phone (allophone) models. Later, this model was used for rapid speaker adaptation, where the speaker mixture weights are adapted using an extremely small amount of training data (one word, for example).
Bibliography: 小坂哲夫, 鷹見淳一, 嵯峨山茂樹, ``話者混合SSSによる不特定話者音声認識,'' 日本音響学会平成4年度秋季研究発表会講演論文集, 2-5-9, pp. 135-136, Oct. 1992. (T. Kosaka, J. Takami and S. Sagayama, ``Speaker-Independent Speech Recognition Using Speaker-Mixture SSS algorithm,'' ASJ Fall Conf., 2-5-9, pp. 135-136, Oct. 1992.)
Speaker Tree (1993)
Applied tree-based clustering to speakers to find a speaker tree that spanned from a speaker-independent model to speaker-dependent models along the tree.
Bibliography: 小坂哲夫, 松永昭一, 嵯峨山茂樹, ``木構造クラスタリングを用いた話者適応,'' 日本音響学会平成5年度秋期研究発表会講演論文集, 2-7-14, pp. 97-98, Oct. 1993.
MAP/VFS for Speaker Adaptation (1994)
MAP (maximum a posteriori) training and VFS (vector field smoothing) are combined to accelerate speaker adaptation.
Bibliography: 高橋淳一, 嵯峨山茂樹, ``最大事後確率推定と移動ベクトル場平滑化を組み合わせによる高速話者適応,'' 日本音響学会平成6年度秋期研究発表会講演論文集, 2-8-19, pp. 75-76, Oct. 1994. (Jun-ichi Takahashi, Shigeki Sagayama, ``Vector-Field-Smoothed Bayesian Learning for Fast Speaker Adaptation,'' Proc. ASJ Fall Conf., 2-8-19, pp.75-76, Oct 1994.)
Jacobian Adaptation (1996)
Proposed a fast method for adapting models to environmental noise. When a model trained beforehand for noise A is adapted to a target noise B, and A and B are relatively close, the noise adaptation procedure can be linearized and becomes very fast. This idea was first formulated at ATR in 1992, inspired by PMC by M. Gales. The first experimental results were obtained in 1996 and presented in English at ICASSP97 (Munich). Extensive follow-up studies often appear in recent ICASSPs and ICSLPs.
Bibliography: 山口義和, 高橋淳一, 高橋敏, 嵯峨山茂樹, ``Taylor展開に基づく高速な音響モデル適応法,'' 日本音響学会平成8年度秋季研究発表会講演論文集, -, pp. -, Sep. 1996.
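The linearization can be illustrated in the log-spectral domain under the usual assumption that speech and noise add in the power domain, so a noisy log spectrum is log(exp(S) + exp(N)). Its derivative with respect to the noise log spectrum is exp(N) / (exp(S) + exp(N)), and a first-order Taylor expansion around noise A gives the model for noise B. The sketch below works per frequency bin and omits the cepstral (DCT) transform; the function names and the toy values are assumptions.

```python
import math

def noisy_log_spectrum(speech_log, noise_log):
    """Exact combination, assuming speech and noise add in the power domain."""
    return [math.log(math.exp(s) + math.exp(n)) for s, n in zip(speech_log, noise_log)]

def jacobian_diagonal(speech_log, noise_log):
    """d(noisy)/d(noise) per bin: the noise share of the total power."""
    return [math.exp(n) / (math.exp(s) + math.exp(n)) for s, n in zip(speech_log, noise_log)]

def jacobian_adapt(noisy_a, jacobian, noise_a, noise_b):
    """First-order update: C_B ~= C_A + J * (N_B - N_A)."""
    return [c + j * (nb - na) for c, j, na, nb in zip(noisy_a, jacobian, noise_a, noise_b)]

# Toy usage: adapt a model trained in noise A to a nearby noise B.
speech = [2.0, 1.0, 0.5]
noise_a = [0.0, 0.5, 1.0]
noise_b = [0.1, 0.4, 1.1]

model_a = noisy_log_spectrum(speech, noise_a)   # computed once, offline
jac = jacobian_diagonal(speech, noise_a)        # computed once, offline
model_b_fast = jacobian_adapt(model_a, jac, noise_a, noise_b)
model_b_exact = noisy_log_spectrum(speech, noise_b)
```

Because the Jacobian is computed once at training time, adapting to a new noise costs only a multiply-add per parameter; the approximation error grows with the distance between noise A and noise B.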

Hand-Writing Recognition

HMM-based Online Hand-Written Kanji-Character Recognition with Structured Lexicon (2000)
An online hand-written Kanji character is recognized in the continuous speech recognition framework, where a 6500-Kanji lexicon is hierarchically structured and represented by sequences of substrokes. This was the first application of a continuous speech recognition algorithm to handwriting. English papers at ICDAR2001, IWFHR2002, and ICPR2002.
Bibliography: 秋良直人, 中井満, 下平博, 嵯峨山茂樹, ``ストロークHMMによるオンライン文字認識の特徴量の検討,'' 平成12年電気関係学会北陸支部大会講演論文集, F-92, p. 393, Sep 2000.

Music Information Processing

HMM-based Music Transcription from MIDI Signals (1999)
The sequence of observed note durations (inter-onset time) was transcribed by an HMM-based note recognizer with a grammar probabilistically modeling the sequence of musical notes. Presented in English at IEEE MMSP2002.
Bibliography: 齋藤直樹, 中井満, 下平博, 嵯峨山茂樹, ``隠れマルコフモデルによる音楽演奏情報からの音符列推定,'' 平成11年電気関係学会北陸支部大会講演論文集, Oct. 1999.
HMM-based Harmonization of Given Melodies (1999)
Optimal chord finding was formulated as finding the Viterbi sequence of hidden states that generates the observed melody. With a bigram grammar of chord sequences, the decoding process estimates the most likely chord sequence in the maximum likelihood sense.
Bibliography: 川上隆, 中井満, 下平博, 嵯峨山茂樹, ``隠れマルコフモデルを用いた旋律への和声付け,'' 平成11年電気関係学会北陸支部大会講演論文集, Oct. 1999.
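The formulation can be sketched with a toy model: chords are hidden states, melody notes are observations, a bigram table gives chord-to-chord transition probabilities, and Viterbi decoding returns the most likely chord sequence. The three-chord vocabulary and all probabilities below are made up for illustration and are not taken from the paper.

```python
import math

def viterbi(observations, states, log_init, log_trans, log_emit):
    """Most likely hidden-state sequence for the observations (log domain)."""
    delta = {s: log_init[s] + log_emit[s][observations[0]] for s in states}
    back_pointers = []
    for obs in observations[1:]:
        new_delta, pointers = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: delta[p] + log_trans[p][s])
            new_delta[s] = delta[best_prev] + log_trans[best_prev][s] + log_emit[s][obs]
            pointers[s] = best_prev
        delta = new_delta
        back_pointers.append(pointers)
    path = [max(states, key=lambda s: delta[s])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical toy model: three chords, seven melody notes.
chords = ["C", "F", "G"]
notes = ["C", "D", "E", "F", "G", "A", "B"]
chord_tones = {"C": {"C", "E", "G"}, "F": {"F", "A", "C"}, "G": {"G", "B", "D"}}

log_init = {c: math.log(1.0 / 3.0) for c in chords}
bigram = {  # assumed chord bigram probabilities, each row sums to 1
    "C": {"C": 0.4, "F": 0.3, "G": 0.3},
    "F": {"C": 0.3, "F": 0.3, "G": 0.4},
    "G": {"C": 0.6, "F": 0.1, "G": 0.3},
}
log_trans = {p: {s: math.log(v) for s, v in row.items()} for p, row in bigram.items()}
# Chord tones are emitted with probability 0.25 each, other notes 0.0625 each.
log_emit = {
    c: {n: math.log(0.25 if n in chord_tones[c] else 0.0625) for n in notes}
    for c in chords
}

melody = ["E", "A", "B", "C"]
harmony = viterbi(melody, chords, log_init, log_trans, log_emit)
print(harmony)  # prints ['C', 'F', 'G', 'C']
```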

Projects

Anthropomorphic Dialogue Agent Toolkit (2000-2002)
Involving 17 researchers from 10 organizations, a project to provide an open-source, free-of-charge toolkit is in progress to promote spoken dialog research. Funded by IPA (Information Processing Promotion Agency) for 3 years.
Written Communication for the Blind (2000-2002)
Combining hand-written character recognition, speech synthesis, and new transducers for written input (replacing the stylus), a project is in progress to support written communication (e-mail, etc.) by the blind. Collaborating with 7 organizations. Funded by Ishikawa Prefecture for 3 years.
