Speech and Spoken Language Processing

Speech recognition and synthesis, and language modeling, with an emphasis on noisy/reverberant speech recognition and waveform-model-based speech synthesis. Speech recognition spans a wide range of research areas: acoustic modeling based on hidden Markov models (HMMs), stochastic language modeling, efficient search algorithms, adaptation to the speaker and the environment (noise, channel), syntax analysis, prosodic features, machine learning, statistical signal processing, probabilistic models, and neural networks.

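As an illustration of the HMM machinery behind acoustic modeling and efficient search, the sketch below implements Viterbi decoding for a toy discrete HMM. The two-state model and all of its probabilities are invented for the example; they are not taken from any system described here.

```python
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Most likely state path for an observation sequence under a discrete HMM."""
    # delta[s]: best log-probability of any path ending in state s so far
    delta = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    back = []  # back[t][s]: best predecessor of s at time t
    for o in obs[1:]:
        ptr, nxt = {}, {}
        for s in states:
            prev = max(states, key=lambda p: delta[p] + log_trans[p][s])
            ptr[s] = prev
            nxt[s] = delta[prev] + log_trans[prev][s] + log_emit[s][o]
        back.append(ptr)
        delta = nxt
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

# Toy two-state model (all numbers invented for illustration).
lg = math.log
states = ["A", "B"]
log_init = {"A": lg(0.9), "B": lg(0.1)}
log_trans = {"A": {"A": lg(0.9), "B": lg(0.1)},
             "B": {"A": lg(0.1), "B": lg(0.9)}}
log_emit = {"A": {"x": lg(0.9), "y": lg(0.1)},
            "B": {"x": lg(0.1), "y": lg(0.9)}}
path, score = viterbi(["x", "x", "y", "y"], states, log_init, log_trans, log_emit)
# path is ["A", "A", "B", "B"]: the decoder follows the observations.
```

Real recognizers work with continuous acoustic features, context-dependent states, and beam-pruned search rather than this exhaustive toy decoder.
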
Signal Processing

Signal separation, microphone-array signal processing, multi-channel signal compression for MPEG-4, and related topics. More than 30 years of research experience in speech signal processing, covering basic speech spectrum modeling and analysis, including linear predictive coding (LPC), partial autocorrelation (PARCOR), line spectrum pairs (LSP), and composite sinusoidal modeling (CSM), as well as applications to speech analysis, speech coding, speech analysis-synthesis (vocoders), speech enhancement, and speech manipulation, together with related problems such as pitch detection and voice/sound discrimination.

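As a sketch of how the LPC and PARCOR representations relate, the following implements the Levinson-Durbin recursion, which solves the LPC normal equations from an autocorrelation sequence and yields the PARCOR (reflection) coefficients as a by-product. The autocorrelation values in the usage line are a contrived AR(1)-like example, not real speech data.

```python
def levinson_durbin(r, order):
    """Levinson-Durbin recursion for linear prediction.

    r : autocorrelation sequence r[0..order]
    Returns (lpc, parcor, err): predictor coefficients a[1..order] such
    that x[n] is predicted by sum_i a[i] * x[n-i], the PARCOR (reflection)
    coefficients k[1..order], and the final prediction-error power.
    """
    a = [0.0] * (order + 1)
    parcor = []
    err = r[0]
    for m in range(1, order + 1):
        # Reflection (PARCOR) coefficient for stage m.
        acc = r[m] - sum(a[i] * r[m - i] for i in range(1, m))
        k = acc / err
        parcor.append(k)
        # Order-update of the predictor coefficients.
        new_a = a[:]
        new_a[m] = k
        for i in range(1, m):
            new_a[i] = a[i] - k * a[m - i]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], parcor, err

# For r = [1, 0.5, 0.25] (an AR(1)-like toy sequence) the order-2 solution
# reduces to a single tap: lpc = [0.5, 0.0], parcor = [0.5, 0.0], err = 0.75.
lpc, parcor, err = levinson_durbin([1.0, 0.5, 0.25], 2)
```

The recursion runs in O(order^2) rather than the O(order^3) of a general linear solve, which is one reason it became standard in speech coders.
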
Music Signal/Information Processing

Automatic transcription based on probabilistic modeling and spectrum analysis, multipitch spectrum analysis, signal-to-MIDI conversion, computational auditory scene analysis, tempo analysis, rhythm recognition, key finding (tonality recognition), musical instrument recognition, automatic harmonization (fitting chords to a given melody), and automatic counterpoint generation. One of the final goals is the automatic rendering of given music scores.

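To make "signal-to-MIDI conversion" concrete, here is a deliberately simple single-pitch sketch: pick the autocorrelation peak as the period estimate, then map the resulting fundamental frequency to the nearest MIDI note number. The sampling rate, search range, and test tone are arbitrary choices for the example; the multipitch analysis described above is far more involved.

```python
import math

def autocorr_f0(x, fs, fmin=80.0, fmax=1000.0):
    """Estimate a single fundamental frequency by autocorrelation peak picking."""
    lo = int(fs / fmax)                 # shortest candidate period (samples)
    hi = int(fs / fmin)                 # longest candidate period (samples)
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, min(hi, len(x) - 1) + 1):
        v = sum(x[i] * x[i - lag] for i in range(lag, len(x)))
        if v > best_val:
            best_val, best_lag = v, lag
    return fs / best_lag

def f0_to_midi(f0):
    """Map a frequency in Hz to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(f0 / 440.0))

fs = 8000
x = [math.sin(2 * math.pi * 440 * n / fs) for n in range(800)]
f0 = autocorr_f0(x, fs)   # close to 440 Hz (quantized to an integer lag)
note = f0_to_midi(f0)     # 69, i.e. A4
```

Integer-lag quantization limits the frequency resolution here; practical trackers interpolate around the peak and add continuity constraints across frames.
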
Hand-Written Character Recognition

Online recognition of hand-written Kanji characters and mathematical formulae using a continuous-speech-recognition approach, i.e., based on substroke HMMs and Kanji syntax. This new research area involves issues such as Kanji syntax, features, model-unit selection, context-dependent unit modeling, hybrid modeling of Kanji and Hiragana, stroke-order modeling, and efficient search. Led a large project on "text-based communication for the blind" supported by Ishikawa Prefecture.

Human Interface

An anthropomorphic spoken dialogue agent equipped with speech recognition, speech synthesis, and face animation. A large project supported by the IPA (Information Processing Promotion Association) is in progress: members from more than ten universities and public institutes are collaborating to produce a license-free, open-source software kit for an anthropomorphic spoken dialogue agent. The ultimate goal is an "artificial personality".
See the Galatea Toolkit pages.

Multimedia Information Processing

Discriminating voice-active parts from background noise and music, for example, is a challenge for compressing video data and for finding its meaningful parts.

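A minimal sketch of voice-activity discrimination by frame energy, assuming only that active frames are substantially louder than the noise floor; the frame length and threshold ratio are arbitrary, and a practical detector for broadcast audio would also need spectral and temporal cues to tell speech from music.

```python
def frame_energies(x, frame_len):
    """Mean-square energy of consecutive non-overlapping frames."""
    return [sum(s * s for s in x[i:i + frame_len]) / frame_len
            for i in range(0, len(x) - frame_len + 1, frame_len)]

def energy_vad(x, frame_len=160, ratio=4.0):
    """Flag frames whose energy exceeds `ratio` times the quietest frame.

    Returns one boolean per frame (True = voice-active). The quietest
    frame serves as a crude noise-floor estimate.
    """
    e = frame_energies(x, frame_len)
    floor = min(e) + 1e-12
    return [v > ratio * floor for v in e]

# Quiet / loud / quiet toy signal: only the middle frame is flagged.
x = [0.01] * 160 + [1.0] * 160 + [0.01] * 160
flags = energy_vad(x)  # [False, True, False]
```

Marking the active frames in this way is exactly the kind of pre-segmentation that lets a compressor or search tool skip the non-speech parts.
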