The main research results are listed below.
If you find anything that needs correction, please let me know.
音声分析と音声特徴量 Speech Analysis and Features
- ラグ窓法 (1975) Lag Window
Proposed lag windowing of the autocorrelation to reduce the pitch
effect in PARCOR speech analysis/synthesis. Jointly patented with
Tohkura and Hashimoto. The method was later incorporated into
LPC-based speech coding standards, including LSP-based ones, and is
now widely used; it is fair to say that it is used in almost every
mobile phone in the world.
First published: Japan Patent No. 01358638, ``音声分析装置''
(Tohkura, Hashimoto, Sagayama).
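The idea of lag windowing can be sketched as follows. This is only an illustrative sketch: the Gaussian window shape, the 60 Hz bandwidth, and the 8 kHz sampling rate are assumptions chosen for the example, not parameters taken from the patent, and the function names are hypothetical.

```python
import math

def autocorrelation(x, max_lag):
    """Return r[0..max_lag], the (unnormalized) autocorrelation of x."""
    n = len(x)
    return [sum(x[t] * x[t + k] for t in range(n - k))
            for k in range(max_lag + 1)]

def gaussian_lag_window(max_lag, bandwidth_hz=60.0, sample_rate_hz=8000.0):
    """w[k] = exp(-0.5 * (2*pi*bandwidth*k/fs)^2); w[0] is always 1.

    The Gaussian shape and the default values are assumptions made
    for this sketch, not parameters taken from the patent.
    """
    a = 2.0 * math.pi * bandwidth_hz / sample_rate_hz
    return [math.exp(-0.5 * (a * k) ** 2) for k in range(max_lag + 1)]

def apply_lag_window(r, w):
    """Taper the autocorrelation lag by lag before LPC/PARCOR analysis."""
    return [rk * wk for rk, wk in zip(r, w)]

# Toy signal: a 200 Hz sinusoid sampled at 8 kHz.
signal = [math.sin(2.0 * math.pi * 200.0 * t / 8000.0) for t in range(240)]
r = autocorrelation(signal, max_lag=10)
r_windowed = apply_lag_window(r, gaussian_lag_window(max_lag=10))
```

Multiplying the autocorrelation by a window in the lag domain corresponds to smoothing the power spectrum, which blurs the fine harmonic structure caused by pitch while leaving the zero-lag energy unchanged.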
- ケプストラム距離 (1978) Cepstrum Distance for Speech Recognition
Used LPC-cepstrum distance for DP-based speech recognition, in
collaboration with H. Nagashima. The cepstrum (or mel-cepstrum) is
now the most widely used feature in speech recognition, and I had
long wondered who first used it for that purpose. Colleagues abroad
also say that it was probably us (Nagashima, Kohda, Sagayama),
although cepstral features had already been applied to speaker
recognition. If you have more accurate information, please let me
know.
First published: 好田 正紀, 長島 広海, 嵯峨山 茂樹,
``音韻単位の標準パターンを用いた単語音声認識装置,''
電子通信学会全国大会予稿集, S9-4, Vol. 5, pp. 335-336, 1978.
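A minimal sketch of an LPC-cepstrum distance used as the local cost in DP matching is given below. The Euclidean distance, the symmetric three-way DP step, and all function names are assumptions made for illustration; the original system's exact distance and path constraints are not stated here.

```python
import math

def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c[1..n_ceps].

    a = [a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k, with the
    all-pole model H(z) = 1 / A(z).
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral vectors (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))

def dp_matching_cost(seq_a, seq_b):
    """Accumulated cost of the best time alignment of two frame sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = cepstral_distance(seq_a[i - 1], seq_b[j - 1])
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

In use, each input frame would be converted with `lpc_to_cepstrum`, and the input word would be assigned to the reference pattern with the smallest `dp_matching_cost`.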
- デルタケプストラム (1979) Delta Cepstrum
Proposed the delta cepstrum for capturing the dynamic
characteristics of speech. Delta features (the delta cepstrum, in
the case of the cepstrum) are now used almost without exception in
speech recognition, and this was the first proposal of such dynamic
features. The original purpose was the analysis of speaker
characteristics, primarily for speaker recognition. The feature was
later successfully applied to DP-based speaker-independent speech
recognition by S. Furui (then Sagayama's supervisor), who
demonstrated a large improvement. Under the name ``delta-cepstrum'',
it is extensively used in modern HMM-based speech recognition.
First published: 嵯峨山 茂樹, 板倉 文忠,
``音声の動的尺度に含まれる個人性情報,''
日本音響学会昭和54年度春季研究発表会講演論文集, 3-2-7,
pp. 589-590 (1979-06). (Shigeki Sagayama and Fumitada Itakura, ``On
Individuality in a Dynamic Measure of Speech,'' Proc. ASJ Spring
Conf. 1979, 3-2-7, pp. 589-590, June 1979.)
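As an illustration, a delta feature can be computed as the regression slope of each cepstral coefficient over a few neighboring frames. The window length, the edge handling (repeating the first and last frames), and the function name below are assumptions of this sketch rather than details from the 1979 paper.

```python
def delta_features(frames, half_window=2):
    """Regression slope of each coefficient over +/- half_window frames.

    frames is a list of equal-length coefficient vectors, one per
    frame; half_window must be at least 1.
    """
    n = len(frames)
    denom = sum(k * k for k in range(-half_window, half_window + 1))
    deltas = []
    for t in range(n):
        dim = len(frames[t])
        acc = [0.0] * dim
        for k in range(-half_window, half_window + 1):
            u = min(max(t + k, 0), n - 1)  # repeat edge frames
            for d in range(dim):
                acc[d] += k * frames[u][d]
        deltas.append([a / denom for a in acc])
    return deltas
```

A coefficient that rises linearly over time yields a constant delta equal to its slope away from the edges, and a constant coefficient yields zero, so the delta vector captures how the spectrum is moving rather than where it is.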
- 線スペクトル分析理論 (1979-) LSP Analysis Theory
Line spectrum analysis, conceived by Itakura, is a highly efficient
speech compression method now used in almost all mobile phones.
In collaboration with Itakura, produced a number of research results
on the theoretical properties of LSP, e.g., the duality theory of
LPC and LSP and the theoretical properties of LSP frequency
distributions. Ph.D. thesis.
音声分析と音声合成 Speech Analysis and Synthesis
- 複合正弦波モデル化 (1979) Composite Sinusoidal Modeling (CSM)
Proposed modeling speech by a sum of $n$ sinusoids and equating the
autocorrelations of the speech signal and the model at the lowest
$2n-1$ points. The model frequencies were proven equivalent to
Line Spectrum Pair frequencies. Applied to Yamaha's best-selling
sound IC, the CSM Speech Synthesis patent was the top earner among
all NTT patents for several years.
First published: 嵯峨山 茂樹, 板倉 文忠,
``複合正弦波モデルによる音声分析,''
電子情報通信学会 情報・システム部門別 全国大会予稿集, 63,
p. 63, 1979; 嵯峨山 茂樹, 板倉 文忠,
``複合正弦波による簡易な音声合成法,''
日本音響学会昭和54年度秋季研究発表会講演論文集, 3-2-3,
pp. 557-558, Oct. 1979.
- LSP音声合成LSI (1980) LSP Speech Synthesizer LSI
Designed an LSP speech synthesizer LSI in collaboration with
Fujitsu's telephone switching hardware team. It became the first LSI
for LSP speech synthesis and also the first CMOS LSI for speech
synthesis.
First published: 嵯峨山 茂樹, 管村 昇, 板倉 文忠, 小池 恒彦,
``線スペクトル対パラメータによる音声合成器LSIとその応用,''
電子通信学会 通信部門 全国大会予稿集, S5-5, pp. 2-393-394, 1980.
- 日本語テキスト音声合成 (1982) Japanese Text-to-Speech System
Collaborated with H. Sato, Y. Sagisaka, and K. Kogure to construct
the first Japanese text-to-speech system.
First published: 佐藤 大和, 匂坂 芳典, 小暮 潔, 嵯峨山 茂樹,
``日本語テキストからの音声合成,'' 電子情報通信学会全国大会予稿集,
S6-3, Vol. 5, pp. 399-400, 1982.
音素モデル Phone Modeling
- 木構造の異音クラスタリング (1987) Tree-based Allophone Clustering
Proposed a tree-based clustering technique for context-dependent
phones. After it was presented in English in 1989 (ICASSP89,
Glasgow), the idea was adopted and modified by Kai-Fu Lee et al. in
1990, and extensively used by IBM and Cambridge University. It is
commonly used in modern phoneme-based high-performance speech
recognition.
First published: 嵯峨山 茂樹, ``音素環境のクラスタリング,'' 音学講論,
1-5-15, pp. 29-30, Oct. 1987. (Sagayama, S. (1987). "Phoneme
Environment Clustering," 1-5-15, Proc. ASJ Fall Conference,
pp. 29-30, Oct. 1987.)
- 隠れマルコフ網 (1991) Hidden Markov Network
Combined the tree-based clustering described above with state tying
to represent context-dependent phones. The Successive State
Splitting (SSS) algorithm was also proposed for automatically
obtaining a network structure of allophones. HMnet and the SSS
algorithm were extensively applied to speech recognition and other
areas (e.g., protein analysis and automatic grammar acquisition).
First published: 鷹見 淳一, 嵯峨山 茂樹,
``逐次状態分割法(SSS)による隠れマルコフネットワークの自動生成,''
日本音響学会平成3年度秋季研究発表会講演論文集, 2-5-13,
pp. 73-74, Oct. 1991. (Jun-ichi Takami and
Shigeki Sagayama, ``Automatic Generation of the Hidden Markov
Network by the Successive State Splitting Algorithm,'' Proc. ASJ
Fall Conf., 2-5-13, pp. 73-74, Oct. 1991.)
- コンテキスト依存HMM-LR 連続音声認識 (1996)
Context-Dependent HMM-LR Continuous Speech Recognition
Context-dependent HMM (HMnet) was combined with a generalized LR
parser for continuous speech recognition using a given context-free
grammar (CFG).
First published: 永井 明人, 鷹見 淳一, 嵯峨山 茂樹,
``環境依存連続HMMを用いたHMM-LR連続音声認識,''
日本音響学会平成3年度秋季研究発表会講演論文集,
1-5-20, pp. 39-40, Oct. 1991.
- 四階層結び構造HMM (1996) Four-layer Tied-structure HMM
Parameters are tied at four levels: allophones, states, Gaussians,
and scalar parameters. This is advantageous for training with a
small amount of data and for fast likelihood calculation. Presented
in English at ICASSP95.
First published: 高橋 敏, 嵯峨山 茂樹,
``4 階層の共有構造を持つ音素環境依存 HMM の検討,''
日本音響学会平成6年度秋期研究発表会講演論文集,
3-8-3, pp. 113-114, Oct. 1994. (Satoshi Takahashi, Shigeki
Sagayama, ``A Study of Context-Dependent HMMs with Four-Level Tied
Structure,'' Proc. ASJ Fall Conf., 3-8-3, pp. 113-114, Oct. 1994.)
- 離散混合HMM (1996) Discrete Mixture HMM
Mixture components are replaced by discrete (scalar quantized)
distributions to represent non-Gaussian, complex distributions.
Presented in English at ICASSP97.
First published: 高橋 敏, 嵯峨山 茂樹, ``離散混合出力分布型HMM,''
日本音響学会平成8年度秋期研究発表会講演論文集, -, pp. -, Sep. 1996.
(Satoshi Takahashi, Shigeki Sagayama, ``Discrete Mixture Output
Distribution HMM,'' Proc. ASJ Fall Conf., -, pp. -, Sep. 1996.)
- 非同期遷移HMM (1999) Asynchronous-Transition HMM
Proposed a new HMM structure where state transitions are not
synchronous between features.
First published: 松田繁樹, 中井満, 下平博, 嵯峨山茂樹,
``非同期遷移型HMM,'' 平成11年日本音響学会秋季研究発表会講演論文集,
1-1-12, pp. 23-24, Oct. 1999.
- 重回帰HMM (2000) Multiple Linear-Regression HMM
Proposed a new HMM structure where mean vectors are linearly
dependent on observable factors, such as pitch frequency and power.
English paper at ICASSP2001.
First published: 藤永 勝久, 中井 満, 下平 博, 嵯峨山 茂樹,
``重回帰HMMによる音声のモデル化,''
平成12年電気関係学会北陸支部大会講演論文集, G-15, p. 458, Sep. 2000.
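The core idea, a state mean that depends linearly on observable factors, can be sketched as follows. The diagonal covariance, the parameter layout (a bias vector plus a weight matrix), and the function names are assumptions of this sketch; the parameter estimation used in the paper is not shown.

```python
import math

def regression_mean(bias, weights, factors):
    """mu(xi) = bias + W xi, with W given as a list of rows.

    factors holds the observable values for the frame, e.g.
    [pitch frequency, power].
    """
    return [b + sum(w * f for w, f in zip(row, factors))
            for b, row in zip(bias, weights)]

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (an assumption)."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def state_log_likelihood(x, factors, bias, weights, var):
    """Output log-likelihood of one state for feature x given the factors."""
    mean = regression_mean(bias, weights, factors)
    return log_gaussian_diag(x, mean, var)
```

With all weights set to zero the model reduces to an ordinary Gaussian output distribution, so the regression term can be read as a frame-by-frame correction of the mean.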
話者・雑音へのモデル適応 Speaker & Noise Modeling & Adaptation
- ベクトル場平滑化法 (1992) Vector Field Smoothing
Proposed a speaker adaptation method that spatially smooths the
difference vectors between the original and trained Gaussian mean
vectors in the feature space. This was the first method that
enabled adapting a Gaussian-mixture HMM to a speaker. Extensively
used in Japan.
First published: 大倉 計美, 杉山 雅英, 嵯峨山 茂樹,
``混合連続分布HMMを用いた移動ベクトル場平滑化話者適応方式,''
日本音響学会平成4年度春季研究発表会講演論文集, 2-Q-17,
pp. 191-192, Mar. 1992. (Kazumi Ohkura,
Masahide Sugiyama and Shigeki Sagayama, ``Speaker Adaptation Based
on Transfer Vector Field Smoothing Model with Continuous Mixture
Density HMMs,'' ASJ, 2-Q-17, pp. 191-192, Mar. 1992.)
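The smoothing step can be sketched as follows: transfer vectors are computed for the Gaussians that received adaptation data, and every mean, including the untrained ones, is then moved by a distance-weighted average of the transfer vectors of its nearest trained neighbors. The neighbor count, the exponential weighting, and the function names are assumptions made for this illustration, not the settings of the original method.

```python
import math

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def vfs_adapt(original_means, adapted, k=2, scale=10.0):
    """Return smoothed, adapted means for every Gaussian.

    original_means: list of mean vectors of the initial model.
    adapted: dict {index: retrained mean} for the Gaussians that saw
             adaptation data (must be non-empty).
    k, scale: hypothetical smoothing parameters of this sketch.
    """
    # Transfer vector = retrained mean minus original mean.
    transfer = {i: [a - o for a, o in zip(m, original_means[i])]
                for i, m in adapted.items()}
    result = []
    for mu in original_means:
        # Nearest trained Gaussians, measured in the original space.
        near = sorted(transfer,
                      key=lambda i: sq_dist(mu, original_means[i]))[:k]
        weights = [math.exp(-sq_dist(mu, original_means[i]) / scale)
                   for i in near]
        total = sum(weights)
        shift = [sum(w * transfer[i][d] for w, i in zip(weights, near)) / total
                 for d in range(len(mu))]
        result.append([m + s for m, s in zip(mu, shift)])
    return result
```

Because the shift is interpolated from neighbors, Gaussians with no adaptation data still move consistently with the rest of the model, which is what makes adaptation from a small amount of speech feasible.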
- 話者混合法 (1992) Speaker-Tied Mixture
Gaussian mixture is derived from speaker-dependent single Gaussian
phone (allophone) models. Later, this model was used for rapid
speaker adaptation where speaker mixture weights are adapted using
an extremely small amount of training data (1 word, for example).
First published: 小坂 哲夫, 鷹見 淳一, 嵯峨山 茂樹,
``話者混合SSSによる不特定話者音声認識,''
日本音響学会平成4年度秋季研究発表会講演論文集,
2-5-9, pp. 135-136, Oct. 1992. (T. Kosaka, J. Takami and
S. Sagayama, ``Speaker-Independent Speech Recognition Using
Speaker-Mixture SSS algorithm,'' ASJ Fall Conf., 2-5-9,
pp. 135-136, Oct. 1992.)
- 話者木構造 (1993) Speaker Tree
Applied tree-based clustering to speakers to find a speaker tree
that spanned from a speaker-independent model to speaker-dependent
models along the tree.
First published: 小坂 哲夫, 松永 昭一, 嵯峨山 茂樹,
``木構造クラスタリングを用いた話者適応,''
日本音響学会平成5年度秋期研究発表会講演論文集, 2-7-14,
pp. 97-98, Oct. 1993.
- MAP/VFS話者適応法 (1994) MAP/VFS for Speaker Adaptation
MAP (maximum a posteriori) training and VFS (vector field smoothing)
are combined to accelerate speaker adaptation.
First published: 高橋 淳一, 嵯峨山 茂樹,
``最大事後確率推定と移動ベクトル場平滑化を組み合わせによる高速話者適応,''
日本音響学会平成6年度秋期研究発表会講演論文集, 2-8-19, pp. 75-76,
Oct. 1994. (Jun-ichi Takahashi, Shigeki Sagayama,
``Vector-Field-Smoothed Bayesian Learning for Fast Speaker
Adaptation,'' Proc. ASJ Fall Conf., 2-8-19, pp. 75-76, Oct. 1994.)
- ヤコビ適応法 (1996) Jacobian Adaptation
Proposed a fast method for adapting models to environmental noise.
When a model trained beforehand for noise A is adapted to a target
noise B, and A and B are relatively close, the noise adaptation
procedure can be linearized and becomes very fast. This idea
was first formulated at ATR in 1992, inspired by PMC by M. Gales.
The first experimental results were obtained in 1996 and
presented in English at ICASSP97 (Munich). Extensive follow-up
studies are often seen in recent ICASSPs and ICSLPs.
First published: 山口義和, 高橋淳一, 高橋敏, 嵯峨山茂樹,
``Taylor展開に基づく高速な音響モデル適応法,''
日本音響学会平成8年度秋季研究発表会講演論文集, -, pp. -, Sep. 1996.
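The linearization can be illustrated in the log-spectral domain, where speech and noise combine per channel as y = log(exp(s) + exp(n)). The sketch below stores the Jacobian dy/dn computed for noise A and applies a first-order update for noise B. Working per channel in the log-spectral domain is a simplifying assumption of this example (in the cepstral domain the Jacobian becomes a full matrix), and the function names are hypothetical.

```python
import math

def noisy_log_spectrum(speech, noise):
    """Exact combination per channel: y = log(exp(s) + exp(n))."""
    return [math.log(math.exp(s) + math.exp(n))
            for s, n in zip(speech, noise)]

def jacobian(speech, noise):
    """Diagonal of dy/dn per channel: exp(n) / (exp(s) + exp(n))."""
    return [math.exp(n) / (math.exp(s) + math.exp(n))
            for s, n in zip(speech, noise)]

def jacobian_adapt(y_a, jac_a, noise_a, noise_b):
    """First-order update: y_B is approximated by y_A + J_A * (n_B - n_A)."""
    return [y + j * (nb - na)
            for y, j, na, nb in zip(y_a, jac_a, noise_a, noise_b)]

# Computed once, offline, for the initial noise A.
speech = [2.0, 0.5]
noise_a = [1.0, 1.0]
y_a = noisy_log_spectrum(speech, noise_a)
jac_a = jacobian(speech, noise_a)

# At run time only a multiply-add per channel is needed for noise B.
noise_b = [1.1, 0.9]
y_b_fast = jacobian_adapt(y_a, jac_a, noise_a, noise_b)
y_b_exact = noisy_log_spectrum(speech, noise_b)
```

The approximation is close only while noise B stays near noise A, which matches the condition stated above; for a large change in noise the exact combination would have to be recomputed.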
オンライン手書き文字認識 Hand-Writing Recognition
- HMMと構造化漢字辞書を用いたオンライン手書き文字認識 (2000)
HMM-based Online Handwritten Kanji Character Recognition with
Structured Lexicon
Online handwritten Kanji characters are recognized in the continuous
speech recognition framework, where a 6500-Kanji lexicon is
hierarchically structured and represented by sequences of
substrokes. This was the first application of a continuous speech
recognition algorithm to handwriting.
English papers at ICDAR2001, IWFHR2002, and ICPR2002.
First published: 秋良 直人, 中井 満, 下平 博, 嵯峨山 茂樹,
``ストロークHMMによるオンライン文字認識の特徴量の検討,''
平成12年電気関係学会北陸支部大会講演論文集, F-92, p. 393, Sep. 2000.
音楽情報処理 Music Information Processing
- HMMによるMIDI信号の楽譜化 (1999) HMM-based Music Transcription from MIDI Signals
The sequence of observed note durations (inter-onset times) was
transcribed by an HMM-based note recognizer with a grammar that
probabilistically models the sequence of musical notes.
Will be presented at MMSP2002 in English.
First published: 齋藤直樹, 中井満, 下平博, 嵯峨山茂樹,
``隠れマルコフモデルによる音楽演奏情報からの音符列推定,''
平成11年電気関係学会北陸支部大会講演論文集, Oct. 1999.
- HMMによる旋律への自動和声づけ (1999) HMM-based Harmonization of Given Melodies
Optimal chord finding was formulated as finding the Viterbi
sequence of hidden states that generates the observed melody. With
a bigram grammar of chord sequences, the decoding process
estimates the most likely chord sequence.
First published: 川上隆, 中井満, 下平博, 嵯峨山茂樹,
``隠れマルコフモデルを用いた旋律への和声付け,''
平成11年電気関係学会北陸支部大会講演論文集, Oct. 1999.
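The formulation can be sketched with a toy model in which chords are hidden states, melody notes are observations, and a chord bigram supplies the transition probabilities. The three-chord vocabulary and every probability value below are invented for illustration and are not taken from the paper.

```python
import math

NOTES = ["c", "d", "e", "f", "g", "a", "b"]
CHORDS = ["C", "F", "G"]
CHORD_TONES = {"C": "ceg", "F": "fac", "G": "gbd"}

# Toy emission model: chord tones are likely, other notes are rare.
EMIT = {ch: {n: (0.3 if n in tones else 0.025) for n in NOTES}
        for ch, tones in CHORD_TONES.items()}
START = {ch: 1.0 / 3.0 for ch in CHORDS}
# Toy chord bigram (hypothetical values).
BIGRAM = {"C": {"C": 0.4, "F": 0.3, "G": 0.3},
          "F": {"C": 0.3, "F": 0.3, "G": 0.4},
          "G": {"C": 0.6, "F": 0.1, "G": 0.3}}

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely state sequence for a non-empty observation list."""
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}
    back = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states,
                       key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = (score[prev] + math.log(trans_p[prev][s])
                            + math.log(emit_p[s][obs]))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

melody = ["e", "g", "f", "d", "c"]
chords = viterbi(melody, CHORDS, START, BIGRAM, EMIT)
```

Here one chord is assigned per melody note for simplicity; a practical system would more plausibly assign chords per beat or per bar and use a much larger chord vocabulary.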
プロジェクト Projects
- 音声対話擬人化エージェントツールキット (2000-2002) Anthropomorphic Dialogue Agent Toolkit
Involving 17 researchers from 10 organizations,
a project to provide an open-source, free-of-charge toolkit is in
progress to promote spoken dialog research. Funded by IPA
(Information Processing Promotion Agency) for 3 years.
- 視覚障害者のための文字コミュニケーション (2000-2002) Written Communication for the Blind
Combining hand-written character recognition, speech synthesis, and
new transducers for written input (replacing the stylus), a project
is in progress for supporting written communication (E-mailing
etc.) of the blind. Collaborating with 7 organizations.
Funded by Ishikawa Prefecture for 3 years.