Speech and Spoken Language Processing

Speech recognition and synthesis, and language modeling, with an emphasis on noisy/reverberant speech recognition and waveform-model-based speech synthesis. Speech recognition spans a wide range of research areas: acoustic modeling based on hidden Markov models (HMMs), stochastic language modeling, efficient search algorithms, adaptation to the speaker and the environment (noise, channel), syntax analysis, prosodic features, machine learning, statistical signal processing, probabilistic models, and neural networks.

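As an illustration of the HMM machinery behind acoustic modeling and efficient search, the sketch below implements Viterbi decoding for a toy discrete HMM. The two-state model and all of its probabilities are invented for the example; they are not taken from any system described here.

```python
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Most likely state path for an observation sequence under a discrete HMM."""
    # delta[s]: best log-probability of any path ending in state s so far
    delta = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[t][s]: best predecessor of s at time t
    for o in obs[1:]:
        ptr, nxt = {}, {}
        for s in states:
            prev = max(states, key=lambda p: delta[p] + log_trans[p][s])
            ptr[s] = prev
            nxt[s] = delta[prev] + log_trans[prev][s] + log_emit[s][o]
        back.append(ptr)
        delta = nxt
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy two-state model (all numbers invented for illustration).
lg = math.log
states = ["A", "B"]
log_init = {"A": lg(0.9), "B": lg(0.1)}
log_trans = {"A": {"A": lg(0.9), "B": lg(0.1)},
             "B": {"A": lg(0.1), "B": lg(0.9)}}
log_emit = {"A": {"x": lg(0.9), "y": lg(0.1)},
            "B": {"x": lg(0.1), "y": lg(0.9)}}
path, score = viterbi(["x", "x", "y", "y"], states, log_init, log_trans, log_emit)
# path is ["A", "A", "B", "B"]: the decoder follows the observations.
```

Real recognizers work with continuous acoustic features, context-dependent states, and beam-pruned search rather than this exhaustive toy decoder.
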
Signal Processing

Signal separation, microphone-array signal processing, multi-channel signal compression for MPEG-4, and related topics. More than 30 years of research experience in speech signal processing, covering basic speech spectrum modeling and analysis, including linear predictive coding (LPC), partial autocorrelation (PARCOR), line spectrum pairs (LSP), and composite sinusoidal modeling (CSM), as well as applications to speech analysis, speech coding, speech analysis-synthesis (vocoders), speech enhancement, and speech manipulation, together with related problems such as pitch detection and voice/sound discrimination.

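As a sketch of how the LPC and PARCOR representations relate, the following implements the Levinson-Durbin recursion, which solves the LPC normal equations from an autocorrelation sequence and yields the PARCOR (reflection) coefficients as a by-product. The autocorrelation values in the usage line are a contrived AR(1)-like example, not real speech data.

```python
def levinson_durbin(r, order):
    """Levinson-Durbin recursion for linear prediction.

    r : autocorrelation sequence r[0..order]
    Returns (lpc, parcor, err): predictor coefficients a[1..order] such
    that x[n] is predicted by sum_i a[i] * x[n-i], the PARCOR (reflection)
    coefficients k[1..order], and the final prediction-error power.
    """
    a = [0.0] * (order + 1)
    parcor = []
    err = r[0]
    for m in range(1, order + 1):
        # Reflection (PARCOR) coefficient for stage m.
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        parcor.append(k)
        # Order-update of the predictor coefficients.
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], parcor, err

# For r = [1, 0.5, 0.25] (an AR(1)-like toy sequence) the order-2 solution
# reduces to a single tap: lpc = [0.5, 0.0], parcor = [0.5, 0.0], err = 0.75.
lpc, parcor, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

The recursion runs in O(order^2) rather than the O(order^3) of a general linear solve, which is one reason it became standard in speech coders.
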
Music Signal/Information Processing

Automatic transcription based on probabilistic modeling and spectrum analysis, multipitch spectrum analysis, signal-to-MIDI conversion, computational auditory scene analysis, tempo analysis, rhythm recognition, key finding (tonality recognition), musical instrument recognition, automatic harmonization (fitting chords to a given melody), and automatic counterpoint generation. One of the final goals is the automatic rendering of given music scores.

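To make "signal-to-MIDI conversion" concrete, here is a deliberately simple single-pitch sketch: pick the autocorrelation peak as the period estimate, then map the resulting fundamental frequency to the nearest MIDI note number. The sampling rate, search range, and test tone are arbitrary choices for the example; the multipitch analysis described above is far more involved.

```python
import math

def autocorr_f0(x, fs, fmin=80.0, fmax=1000.0):
    """Estimate a single fundamental frequency by autocorrelation peak picking."""
    lo = int(fs / fmax)                 # shortest candidate period (samples)
    hi = int(fs / fmin)                 # longest candidate period (samples)
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, min(hi, len(x) - 1) + 1):
        v = sum(x[i] * x[i - lag] for i in range(lag, len(x)))
        if v > best_val:
            best_val, best_lag = v, lag
    return fs / best_lag

def f0_to_midi(f0):
    """Map a frequency in Hz to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(f0 / 440.0))

fs = 8000
x = [math.sin(2 * math.pi * 440 * n / fs) for n in range(800)]
f0 = autocorr_f0(x, fs)   # close to 440 Hz (quantized to an integer lag)
note = f0_to_midi(f0)     # 69, i.e. A4
```

Integer-lag quantization limits the frequency resolution here; practical trackers interpolate around the peak and add continuity constraints across frames.
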
Hand-Written Character Recognition

Online recognition of hand-written Kanji characters and mathematical formulae using a continuous-speech-recognition approach, i.e., based on substroke HMMs and Kanji syntax. This new research area involves issues such as Kanji syntax, features, model-unit selection, context-dependent unit modeling, hybrid modeling of Kanji and Hiragana, stroke-order modeling, and efficient search. Led a large project on "text-based communication for the blind" supported by Ishikawa Prefecture.

Human Interface

An anthropomorphic spoken dialogue agent equipped with speech recognition, speech synthesis, and face animation. A large project supported by the IPA (Information Processing Promotion Association) is in progress: members from more than ten universities and public institutes are collaborating to produce a license-free, open-source software kit for an anthropomorphic spoken dialogue agent. The ultimate goal is an "artificial personality".
See the Galatea Toolkit pages.

Multimedia Information Processing

Discriminating voice-active parts from background noise and music, for example, is a challenge for compressing video data and for finding its meaningful parts.

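A minimal sketch of voice-activity discrimination by frame energy, assuming only that active frames are substantially louder than the noise floor; the frame length and threshold ratio are arbitrary, and a practical detector for broadcast audio would also need spectral and temporal cues to tell speech from music.

```python
def frame_energies(x, frame_len):
    """Mean-square energy of consecutive non-overlapping frames."""
    return [sum(s * s for s in x[i:i + frame_len]) / frame_len
            for i in range(0, len(x) - frame_len + 1, frame_len)]

def energy_vad(x, frame_len=160, ratio=4.0):
    """Flag frames whose energy exceeds `ratio` times the quietest frame.

    Returns one boolean per frame (True = voice-active). The quietest
    frame serves as a crude noise-floor estimate.
    """
    e = frame_energies(x, frame_len)
    floor = min(e) + 1e-12
    return [v > ratio * floor for v in e]

# Quiet / loud / quiet toy signal: only the middle frame is flagged.
x = [0.01] * 160 + [1.0] * 160 + [0.01] * 160
flags = energy_vad(x)  # [False, True, False]
```

Marking the active frames in this way is exactly the kind of pre-segmentation that lets a compressor or search tool skip the non-speech parts.
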