Shigeki Sagayama, PhD, Professor, University of Tokyo

Major Research Results

The main research results are listed below. If anything needs correction, please let me know.

Speech Analysis and Features

Lag Window (1975)
Proposed lag windowing of the autocorrelation function to reduce the effect of pitch in PARCOR speech analysis/synthesis. Jointly patented with Tohkura and Hashimoto. The technique was later adopted in LPC-based speech coding standards, including LSP-based ones, and is widely used; it is fair to say that it is used in nearly every mobile phone in the world.
First published: Japan Patent No. 01358638, ``音声分析装置'' (Speech Analysis Apparatus), by Tohkura, Hashimoto, and Sagayama.
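As a rough illustration of lag windowing, the sketch below tapers the autocorrelation of one frame before it would be handed to LPC/PARCOR analysis. Multiplying the autocorrelation by a window smooths the power spectrum, which blurs the pitch harmonics. The Gaussian window shape, the function names, and the 60 Hz / 8 kHz values are assumptions chosen for illustration; the window actually specified in the patent is not stated here.

```python
import math

def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one real-valued analysis frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]

def gaussian_lag_window(max_lag, bandwidth_hz, sample_rate_hz):
    """Gaussian lag window; bandwidth_hz is an assumed tuning parameter."""
    a = 2.0 * math.pi * bandwidth_hz / sample_rate_hz
    return [math.exp(-0.5 * (a * k) ** 2) for k in range(max_lag + 1)]

def apply_lag_window(r, window):
    """Taper the autocorrelation lag by lag (lag 0 is left unchanged)."""
    return [rk * wk for rk, wk in zip(r, window)]

# Toy frame: a 100 Hz sinusoid sampled at 8 kHz (illustrative values only).
frame = [math.sin(2.0 * math.pi * 100.0 * t / 8000.0) for t in range(240)]
r = autocorrelation(frame, 10)
windowed = apply_lag_window(r, gaussian_lag_window(10, 60.0, 8000.0))
```

The windowed values would then replace the raw autocorrelation in whatever recursion derives the PARCOR or LPC coefficients.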
Cepstrum Distance for Speech Recognition (1978)
Used the LPC-cepstrum distance for DP-based speech recognition, in collaboration with H. Nagashima and Kohda. The cepstrum (or mel-cepstrum) is now the most widely used feature in speech recognition. This was perhaps the first use of cepstral features in speech recognition, although their application to speaker recognition already existed; colleagues abroad have said the same. If you have more accurate information, please let me know.
First published: 好田正紀, 長島広海, 嵯峨山茂樹, ``音韻単位の標準パターンを用いた単語音声認識装置,'' 電子通信学会全国大会予稿集, S9-4, Vol. 5, pp. 335-336, 1978.
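A minimal sketch of an LPC-cepstrum distance, assuming the convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and the standard recursion for the cepstrum of 1/A(z). The Euclidean distance, the truncation order, and the function names are assumptions; the exact measure used in the 1978 system is not stated here.

```python
import math

def lpc_to_cepstrum(a, n_ceps):
    """a = [a_1, ..., a_p]; returns cepstral coefficients c_1..c_n_ceps of 1/A(z)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] (gain term) is not used here
    for n in range(1, n_ceps + 1):
        # Sum over k with 1 <= n - k <= p; a_{n-k} is stored at a[n - k - 1].
        acc = sum(k * c[k] * a[n - k - 1] for k in range(max(1, n - p), n))
        c[n] = -(a[n - 1] if n <= p else 0.0) - acc / n
    return c[1:]

def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))

# Two toy first-order models (illustrative coefficients only).
ref = lpc_to_cepstrum([-0.5], 3)
other = lpc_to_cepstrum([-0.4], 3)
d = cepstral_distance(ref, other)
```

In a DP-based recognizer, a frame-level distance like `d` would serve as the local cost accumulated along the time-alignment path.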
Delta Cepstrum (1979)
Proposed the delta cepstrum to capture the dynamic characteristics of speech; this was the first proposal of such dynamic features. The original purpose was the analysis of speaker characteristics (speaker recognition). The feature was later successfully applied to DP-based speaker-independent speech recognition by S. Furui (Sagayama's supervisor at the time), who demonstrated a large improvement. Under the name ``delta-cepstrum'', delta features are now used almost without exception in modern HMM-based speech recognition.
First published: 嵯峨山茂樹, 板倉文忠, ``音声の動的尺度に含まれる個人性情報,'' 日本音響学会昭和54年度春季研究発表会講演論文集, 3-2-7, pp. 589-590 (1979-06). (Shigeki Sagayama and Fumitada Itakura, ``On Individuality in a Dynamic Measure of Speech,'' Proc. ASJ Spring Conf. 1979, 3-2-7, pp. 589-590, June 1979.)
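For readers unfamiliar with delta features, here is a small sketch of the regression-slope formula commonly used for them today. The window half-width, the edge handling (repeating the first and last frames), and the function name are assumptions; the definition in the 1979 paper may differ in detail.

```python
def delta_features(frames, half_width=2):
    """Regression slope of each cepstral coefficient over +/- half_width frames.

    frames: list of cepstral vectors (lists of floats), one per frame.
    """
    denom = 2 * sum(k * k for k in range(1, half_width + 1))
    last = len(frames) - 1
    deltas = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[t])):
            # Edge frames are repeated so the window is always full.
            num = sum(
                k * (frames[min(t + k, last)][d] - frames[max(t - k, 0)][d])
                for k in range(1, half_width + 1)
            )
            row.append(num / denom)
        deltas.append(row)
    return deltas

# Toy input: one coefficient rising by 2.0 per frame.
ramp = [[2.0 * t] for t in range(6)]
ramp_deltas = delta_features(ramp)
```

For a coefficient that changes linearly over time, the interior frames recover the slope, which is the sense in which the delta captures dynamics rather than the static spectrum.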
LSP Analysis Theory (1979-)
Line spectrum analysis was conceived by Itakura and is a highly efficient speech compression scheme now used in almost all mobile phones. In collaboration with Itakura, produced a number of results clarifying the theoretical properties of LSP, e.g., the duality theory of LPC and LSP and the theoretical properties of LSP frequency distributions. Ph.D. thesis.

Speech Analysis and Synthesis

Composite Sinusoidal Modeling (CSM) (1979)
Proposed modeling speech as a sum of $n$ sinusoids by equating the autocorrelations of the speech signal and of the model at the lowest $2n-1$ points. The model frequencies were proven equivalent to Line Spectrum Pair frequencies. Applied in Yamaha's best-selling sound IC, the CSM speech synthesis patent was the top earner among all NTT patents for several years.
First published: 嵯峨山茂樹, 板倉文忠, ``複合正弦波モデルによる音声分析,'' 電子情報通信学会情報・システム部門別全国大会予稿集, 63, p. 63, 1979; 嵯峨山茂樹, 板倉文忠, ``複合正弦波による簡易な音声合成法,'' 日本音響学会昭和54年度秋季研究発表会講演論文集, 3-2-3, pp. 557-558, Oct. 1979.
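The matching condition for composite sinusoidal modeling can be written out as follows. The symbols $v_k$, $m_i$, and $\omega_i$ are notation introduced here for illustration, and reading ``the lowest $2n-1$ points'' as lags $0$ through $2n-1$ is an assumption; it gives $2n$ equations for the $2n$ unknowns.

```latex
% Assumed notation: v_k is the autocorrelation of the speech signal at lag k;
% m_i and \omega_i are the intensity and angular frequency of the i-th sinusoid.
\[
  v_k \;=\; \sum_{i=1}^{n} m_i \cos(k\,\omega_i), \qquad k = 0, 1, \dots, 2n-1 .
\]
```

The right-hand side is the autocorrelation of a sum of $n$ sinusoids with intensities $m_i$, so solving this system for $(m_i, \omega_i)$ fixes the model from the measured autocorrelation alone.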
LSP Speech Synthesizer LSI (1980)
Designed an LSP speech synthesizer LSI in collaboration with Fujitsu's telephone switching hardware team. It became the first LSI for LSP speech synthesis and also the first CMOS LSI for speech synthesis.
First published: 嵯峨山茂樹, 管村昇, 板倉文忠, 小池恒彦, ``線スペクトル対パラメータによる音声合成器LSIとその応用,'' 電子通信学会通信部門全国大会予稿集, S5-5, pp. 2-393-394, 1980.
Japanese Text-to-Speech System (1982)
Collaborated with H. Sato, Y. Sagisaka, and K. Kogure to construct the first Japanese text-to-speech system.
First published: 佐藤大和, 匂坂芳典, 小暮潔, 嵯峨山茂樹, ``日本語テキストからの音声合成,'' 電子情報通信学会全国大会予稿集, S6-3, Vol. 5, pp. 399-400, 1982.

Phone Modeling

Tree-based Allophone Clustering (1987)
Proposed a tree-based clustering technique for context-dependent phones. After being presented in English in 1989 (ICASSP89, Glasgow), the idea was adopted and modified by Kai-Fu Lee et al. in 1990 and used extensively by IBM and Cambridge University. It is now common in high-performance phoneme-based speech recognition.
First published: 嵯峨山茂樹, ``音素環境のクラスタリング,'' 音学講論, 1-5-15, pp. 29-30, Oct. 1987. (Shigeki Sagayama, ``Phoneme Environment Clustering,'' Proc. ASJ Fall Conf., 1-5-15, pp. 29-30, Oct. 1987.)
Hidden Markov Network (1991)
Combined the tree-based clustering described above with state tying to represent context-dependent phones. The Successive State Splitting (SSS) algorithm was also proposed for automatically deriving the network structure of allophones. HMnet and the SSS algorithm have been applied extensively to speech recognition and to other areas (e.g., protein analysis and automatic grammar acquisition).
First published: 鷹見淳一, 嵯峨山茂樹, ``逐次状態分割法(SSS)による隠れマルコフネットワークの自動生成,'' 日本音響学会平成3年度秋季研究発表会講演論文集, 2-5-13, pp. 73-74, Oct. 1991. (Jun-ichi Takami and Shigeki Sagayama, ``Automatic Generation of the Hidden Markov Network by the Successive State Splitting Algorithm,'' Proc. ASJ Fall Conf., 2-5-13, pp. 73-74, Oct. 1991.)
Context-Dependent HMM-LR Continuous Speech Recognition (1991)
The context-dependent HMM (HMnet) was combined with a generalized LR parser for continuous speech recognition under a given context-free grammar (CFG).
First published: 永井明人, 鷹見淳一, 嵯峨山茂樹, ``環境依存連続 HMMを用いたHMM-LR連続音声認識,'' 日本音響学会平成3年度秋季研究発表会講演論文集, 1-5-20, pp. 39-40, Oct. 1991.
Four-layer Tied-structure HMM (1994)
Parameters are tied at four levels: allophones, states, Gaussians, and scalar parameters. This is advantageous for training with a small amount of data and for fast likelihood calculation. Presented in English at ICASSP95.
First published: 高橋敏, 嵯峨山茂樹, ``4 階層の共有構造を持つ音素環境依存 HMM の検討,'' 日本音響学会平成6年度秋期研究発表会講演論文集, 3-8-3, pp. 113-114, Oct. 1994. (Satoshi Takahashi and Shigeki Sagayama, ``A Study of Context-Dependent HMMs with Four-Level Tied Structure,'' Proc. ASJ Fall Conf., 3-8-3, pp. 113-114, Oct. 1994.)
Discrete Mixture HMM (1996)
Mixture components are replaced by discrete (scalar-quantized) distributions to represent complex, non-Gaussian distributions. Presented in English at ICASSP97.
First published: 高橋敏, 嵯峨山茂樹, ``離散混合出力分布型HMM,'' 日本音響学会平成8年度秋期研究発表会講演論文集, -, pp. -, Sep. 1996. (Satoshi Takahashi and Shigeki Sagayama, ``Discrete Mixture Output Distribution HMM,'' Proc. ASJ Fall Conf., -, pp. -, Sep. 1996.)
Asynchronous-Transition HMM (1999)
Proposed a new HMM structure in which state transitions are not synchronous across features.
First published: 松田繁樹, 中井満, 下平博, 嵯峨山茂樹, ``非同期遷移型HMM,'' 平成11年日本音響学会秋季研究発表会講演論文集, 1-1-12, pp. 23-24, Oct. 1999.
Multiple Linear-Regression HMM (2000)
Proposed a new HMM structure in which the mean vectors depend linearly on observable factors such as pitch frequency and power. English paper at ICASSP2001.
First published: 藤永勝久, 中井満, 下平博, 嵯峨山茂樹, ``重回帰 HMMによる音声のモデル化,'' 平成12年電気関係学会北陸支部大会講演論文集, G-15, p. 458, Sep 2000.
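The idea of a mean vector that depends linearly on observable factors can be sketched as below. The parameterization (a base mean plus a regression matrix applied to the factor vector), the function name, and all numbers are assumptions for illustration, not the formulation of the paper.

```python
def regression_mean(base_mean, regression_matrix, factors):
    """Mean vector shifted linearly by observed factors (e.g., pitch and power).

    base_mean: list of floats, one per feature dimension.
    regression_matrix: one row of factor weights per feature dimension.
    factors: observed auxiliary values for the current frame.
    """
    return [
        b + sum(w * f for w, f in zip(row, factors))
        for b, row in zip(base_mean, regression_matrix)
    ]

# Toy values: two feature dimensions, two factors (illustrative only).
mean_now = regression_mean([1.0, 2.0], [[0.5, 0.0], [0.0, -1.0]], [2.0, 3.0])
```

Each Gaussian in a state would evaluate its likelihood around `mean_now` instead of a fixed mean, so the same state can follow changes in the observed factors.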

Speaker & Noise Modeling & Adaptation

Vector Field Smoothing (1992)
Proposed a speaker adaptation method that spatially smooths the difference vectors between the original and retrained Gaussian mean vectors in the feature space. This was the first method that enabled a Gaussian mixture HMM to be adapted to a speaker. Used extensively in Japan.
First published: 大倉計美, 杉山雅英, 嵯峨山茂樹, ``混合連続分布HMM を用いた移動ベクトル場平滑化話者適応方式,'' 日本音響学会平成4年度春季研究発表会講演論文集, 2-Q-17, pp. 191-192, Mar. 1992. (Kazumi Ohkura, Masahide Sugiyama and Shigeki Sagayama, ``Speaker Adaptation Based on Transfer Vector Field Smoothing Model with Continuous Mixture Density HMMs,'' ASJ, 2-Q-17, pp. 191-192, Mar. 1992.)
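A rough sketch of smoothing a field of transfer vectors: each mean is shifted by a distance-weighted average of the transfer vectors observed at retrained means, so means that received no adaptation data still move. The Gaussian-style weighting, the `smoothness` parameter, and the function name are assumptions for illustration, and at least one mean is assumed to have been retrained.

```python
import math

def smooth_transfer_vectors(original_means, retrained_means, smoothness=1.0):
    """Shift every mean by a smoothed transfer vector.

    retrained_means[i] is None when mean i received no adaptation data.
    """
    observed = [i for i, m in enumerate(retrained_means) if m is not None]
    transfers = {
        i: [r - o for r, o in zip(retrained_means[i], original_means[i])]
        for i in observed
    }
    adapted = []
    for mean in original_means:
        # Closer retrained means contribute more to this mean's shift.
        weights = [
            math.exp(-math.dist(mean, original_means[i]) ** 2 / smoothness)
            for i in observed
        ]
        total = sum(weights)
        shift = [
            sum(w * transfers[i][d] for w, i in zip(weights, observed)) / total
            for d in range(len(mean))
        ]
        adapted.append([m + s for m, s in zip(mean, shift)])
    return adapted

# Toy example: the middle mean has no adaptation data.
original = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
retrained = [[0.0, 1.0], None, [10.0, 1.0]]
adapted = smooth_transfer_vectors(original, retrained)
```

In this toy case both observed transfer vectors are the same, so the unobserved middle mean inherits that shift as well.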
Speaker-Tied Mixture (1992)
A Gaussian mixture is derived from speaker-dependent single-Gaussian phone (allophone) models. This model was later used for rapid speaker adaptation, in which the speaker mixture weights are adapted using an extremely small amount of training data (one word, for example).
First published: 小坂哲夫, 鷹見淳一, 嵯峨山茂樹, ``話者混合SSSによる不特定話者音声認識,'' 日本音響学会平成4年度秋季研究発表会講演論文集, 2-5-9, pp. 135-136, Oct. 1992. (T. Kosaka, J. Takami and S. Sagayama, ``Speaker-Independent Speech Recognition Using Speaker-Mixture SSS Algorithm,'' ASJ Fall Conf., 2-5-9, pp. 135-136, Oct. 1992.)
Speaker Tree (1993)
Applied tree-based clustering to speakers to obtain a speaker tree that spans from a speaker-independent model to speaker-dependent models.
First published: 小坂哲夫, 松永昭一, 嵯峨山茂樹, ``木構造クラスタリングを用いた話者適応,'' 日本音響学会平成5年度秋期研究発表会講演論文集, 2-7-14, pp. 97-98, Oct. 1993.
MAP/VFS for Speaker Adaptation (1994)
MAP (maximum a posteriori) training and VFS (vector field smoothing) are combined to accelerate speaker adaptation.
First published: 高橋淳一, 嵯峨山茂樹, ``最大事後確率推定と移動ベクトル場平滑化を組み合わせによる高速話者適応,'' 日本音響学会平成6年度秋期研究発表会講演論文集, 2-8-19, pp. 75-76, Oct. 1994. (Jun-ichi Takahashi and Shigeki Sagayama, ``Vector-Field-Smoothed Bayesian Learning for Fast Speaker Adaptation,'' Proc. ASJ Fall Conf., 2-8-19, pp. 75-76, Oct. 1994.)
Jacobian Adaptation (1996)
Proposed a fast method for adapting models to environmental noise. When a model trained beforehand for noise A is adapted to a target noise B, and A and B are relatively close, the noise adaptation procedure can be linearized and becomes very fast. The idea was first formulated at ATR in 1992, inspired by PMC by M. Gales. The first experimental results were obtained in 1996 and presented in English at ICASSP97 (Munich). Extensive follow-up studies have appeared at recent ICASSP and ICSLP conferences.
First published: 山口義和, 高橋淳一, 高橋敏, 嵯峨山茂樹, ``Taylor展開に基づく高速な音響モデル適応法,'' 日本音響学会平成8年度秋季研究発表会講演論文集, -, pp. -, Sep. 1996.
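The linearization behind Jacobian adaptation can be illustrated in a simplified setting. The sketch assumes that speech and noise add in the power-spectral domain, works per channel on log spectra rather than on cepstra, and uses made-up values; none of these choices is taken from the paper.

```python
import math

def noisy_log_spectrum(speech, noise):
    """Exact log spectrum of speech plus noise, channel by channel."""
    return [math.log(math.exp(s) + math.exp(n)) for s, n in zip(speech, noise)]

def jacobian_adapt(noisy_a, noise_a, noise_b):
    """First-order update of a noisy-speech log spectrum from noise A to noise B."""
    adapted = []
    for y, na, nb in zip(noisy_a, noise_a, noise_b):
        # d y / d n at noise A equals exp(n_A) / (exp(s) + exp(n_A)) = exp(n_A - y).
        jac = math.exp(na - y)
        adapted.append(y + jac * (nb - na))
    return adapted

# Toy log spectra for two channels (illustrative values only).
speech = [2.0, 0.5]
noise_a = [0.0, 0.0]
noise_b = [0.1, -0.1]
model_a = noisy_log_spectrum(speech, noise_a)
approx_b = jacobian_adapt(model_a, noise_a, noise_b)
exact_b = noisy_log_spectrum(speech, noise_b)
```

Because the Jacobian is computed once for noise A, moving to a nearby noise B costs one multiply-add per channel instead of recombining the speech and noise models; the approximation degrades as B moves away from A.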

Online Handwritten Character Recognition

HMM-based Online Handwritten Kanji Character Recognition with Structured Lexicon (2000)
Online handwritten Kanji characters are recognized within the continuous speech recognition framework, where a 6500-Kanji lexicon is hierarchically structured and represented by sequences of substrokes. This was the first application of a continuous speech recognition algorithm to handwriting. English papers at ICDAR2001, IWFHR2002, and ICPR2002.
First published: 秋良直人, 中井満, 下平博, 嵯峨山茂樹, ``ストロークHMMによるオンライン文字認識の特徴量の検討,'' 平成12年電気関係学会北陸支部大会講演論文集, F-92, p. 393, Sep 2000.

Music Information Processing

HMM-based Music Transcription from MIDI Signals (1999)
The sequence of observed note durations (inter-onset times) was transcribed by an HMM-based note recognizer with a grammar that probabilistically models the sequence of musical notes. To be presented in English at MMSP2002.
First published: 齋藤直樹, 中井満, 下平博, 嵯峨山茂樹, ``隠れマルコフモデルによる音楽演奏情報からの音符列推定,'' 平成11年電気関係学会北陸支部大会講演論文集, Oct. 1999.
HMM-based Harmonization of Given Melodies (1999)
Optimal chord finding was formulated as finding the Viterbi sequence of hidden states that generates the observed melody. With a bigram grammar of chord sequences, the decoding process estimates the most likely chord sequence in the maximum likelihood sense.
First published: 川上隆, 中井満, 下平博, 嵯峨山茂樹, ``隠れマルコフモデルを用いた旋律への和声付け,'' 平成11年電気関係学会北陸支部大会講演論文集, Oct. 1999.
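The decoding step can be sketched with a small log-domain Viterbi search in which chords are hidden states, melody notes are observations, and the transition table plays the role of the chord bigram. The two-chord inventory and every probability below are invented for illustration and are not taken from the paper.

```python
import math

def viterbi(observations, states, log_init, log_trans, log_emit):
    """Most likely hidden state sequence for the observed symbols."""
    score = {s: log_init[s] + log_emit[s][observations[0]] for s in states}
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor of state s under the bigram transition scores.
            best = max(states, key=lambda p: prev[p] + log_trans[p][s])
            score[s] = prev[best] + log_trans[best][s] + log_emit[s][obs]
            pointers[s] = best
        back.append(pointers)
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

def logs(table):
    """Convert a table of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Hypothetical toy model: two chords, five melody pitch classes.
chords = ["C", "G"]
log_init = logs({"C": 0.7, "G": 0.3})
log_trans = {"C": logs({"C": 0.6, "G": 0.4}), "G": logs({"C": 0.6, "G": 0.4})}
log_emit = {
    "C": logs({"C": 0.4, "E": 0.3, "G": 0.25, "B": 0.025, "D": 0.025}),
    "G": logs({"C": 0.05, "E": 0.05, "G": 0.3, "B": 0.3, "D": 0.3}),
}
melody = ["C", "E", "D", "B", "C"]
harmony = viterbi(melody, chords, log_init, log_trans, log_emit)
```

`harmony` holds one chord label per melody note; a practical system would use a much larger chord inventory and probabilities estimated from data.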

Projects

Anthropomorphic Dialogue Agent Toolkit (2000-2002)
Involving 17 researchers from 10 organizations, a project is in progress to provide an open-source, free-of-charge toolkit for promoting spoken dialogue research. Funded by IPA (Information Processing Promotion Agency) for 3 years.
Written Communication for the Blind (2000-2002)
Combining handwritten character recognition, speech synthesis, and new transducers for written input (replacing the stylus), a project is in progress to support written communication (e-mail, etc.) for the blind, in collaboration with 7 organizations. Funded by Ishikawa Prefecture for 3 years.
