Research Projects at Sagayama/Ono Laboratory    

At Lab #1 (the Sagayama/Ono Lab), we are interested in three methodologies: (1) signal processing, (2) probabilistic modeling, and (3) human interface, with applications in five major areas: (A) speech, (B) music, (C) acoustic signals, (D) handwriting, and (E) images.

Topics by area:
Speech: Speech recognition/synthesis/coding, spoken language processing, spoken dialogue understanding
  • Robust speech recognition in adverse conditions (noise, channel, reverberation, etc.)
    • Noise model composition with Lagrange polynomial approximation
    • Jacobian adaptation to noisy environment
    • Reverberant acoustic modeling
  • Acoustic model structure for speech recognition
    • Likelihood compensation for unexpected noise
    • Asynchronous-transition HMM
    • Linear regression HMM
    • Tree-based clustering for phoneme environments
  • Adaptation to speaker and environment
    • Vector Field Smoothing (VFS) for speaker adaptation
    • Tree-structured speaker modeling
  • Automatic speech signal detection
  • Prosodic features for speech understanding
  • Speech synthesis, singing voice synthesis, speech analysis
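The noise-model composition topic above rests on combining clean-speech and noise models in the log-spectral domain, where noisy-speech power is the sum of speech and noise powers. A minimal sketch of the underlying log-add operation (an illustration only, not the lab's Lagrange-polynomial approximation; function and variable names are hypothetical):

```python
import numpy as np

def logadd_compose(speech_log_mean, noise_log_mean):
    """Compose a noisy-speech log-spectral mean from clean-speech and noise means.

    Noisy power is modeled as the sum of speech and noise powers, so in the
    log-spectral domain the composed mean is y = log(exp(s) + exp(n)),
    the so-called log-add operation used in model composition.
    """
    return np.log(np.exp(speech_log_mean) + np.exp(noise_log_mean))

# example: composing scalar log-power means
s, n = np.log(3.0), np.log(1.0)
y = logadd_compose(s, n)   # log(3 + 1) = log(4)
```

In practice this nonlinear operation has to be applied to whole Gaussian mixture parameters, which is where polynomial approximations of the log-add function become useful.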
Music: Music signal/information processing
  • Multipitch detection, signal separation, and conversion to MIDI format
    • Harmonic clustering for polyphonic audio signals, octave estimation, estimation of number of sources
    • "Specmurt Anasylis" for polyphonic audio signals, iterative estimation
    • CASA (computational auditory scene analysis)
  • Automatic rhythm pattern recognition
    • HMM-based rhythm recognition
    • Tempo analysis based on segmental polynomial modeling
  • Automatic harmonization (chord generation) for given melodies
    • HMM-based harmony generation
  • Automatic harmony analysis
    • Grammar-based approach to harmony analysis
  • Automatic counterpoint
    • Dynamic programming for counterpoint in three or more voices
  • Planned: automatic music composition, sheet-music recognition, automatic music rendering
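The multipitch topics above model the observed log-frequency spectrum as the convolution of a common harmonic pattern with a fundamental-frequency distribution, so the distribution can be recovered by inverse filtering in the Fourier domain. A minimal sketch of that deconvolution step (circular convolution with a small regularizer; function names and the toy signal are illustrative, and the lab's iterative estimation is not shown):

```python
import numpy as np

def deconvolve_pitch(observed, harmonic_pattern, eps=1e-8):
    """Recover a pitch distribution u from v = h (*) u on a log-frequency axis.

    If every note contributes a shifted copy of a common harmonic pattern h,
    the observed spectrum v is the (here circular) convolution of h with the
    fundamental-frequency distribution u; dividing in the Fourier domain
    undoes the convolution.  eps guards against division by zero.
    """
    V = np.fft.fft(observed)
    H = np.fft.fft(harmonic_pattern, n=len(observed))
    return np.real(np.fft.ifft(V / (H + eps)))

# toy example: two "notes" blurred by a 3-component harmonic pattern
n = 32
u = np.zeros(n)
u[3], u[10] = 1.0, 0.7                  # true fundamental positions
h = np.array([1.0, 0.5, 0.25])          # common harmonic pattern
v = np.real(np.fft.ifft(np.fft.fft(u) * np.fft.fft(h, n=n)))  # observed spectrum
u_hat = deconvolve_pitch(v, h)          # approximately recovers u
```

Real spectra are noisy and the harmonic pattern is not exactly common across pitches, which is why iterative re-estimation of both the pattern and the distribution is needed on top of this basic inverse filter.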
Human Interface: Human interface, digital-human systems
  • Anthropomorphic spoken dialog agent (Life-like animated agent, digital human), animated images for spoken dialogue
    • "Galatea" project, Galatea toolkit
  • Dialog description language (VoiceXML) and its interpreter
  • Human interface for people with disabilities
Signal Processing: Acoustic signal processing
  • Microphone array signal processing
    • Complex Spectrum Circle Centroid (CSCC) method for sound source separation
    • Optimal microphone allocation
  • Multichannel signal coding
    • Tree-structured lossless coding of multichannel signals for MPEG-4
  • Room acoustics measurement using M-sequence modulation
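M-sequence (maximum-length sequence) measurement works because a ±1 M-sequence has a nearly ideal circular autocorrelation: exciting the room with the sequence and cross-correlating the recording with the same sequence yields the impulse response. A minimal sketch of generating such a sequence and checking its autocorrelation (the tap positions correspond to one known degree-5 primitive polynomial; this is an illustration, not the lab's measurement system):

```python
import numpy as np

def mls(n_bits, taps):
    """Generate a +/-1 maximum-length sequence of period 2**n_bits - 1
    using a Fibonacci LFSR; taps are 1-based feedback positions."""
    state = [1] * n_bits              # any nonzero seed works
    seq = []
    for _ in range(2 ** n_bits - 1):
        seq.append(1.0 if state[-1] else -1.0)
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return np.array(seq)

# degree-5 example, taps for the primitive polynomial x^5 + x^3 + 1
s = mls(5, [5, 3])
r0 = float(np.dot(s, s))              # circular autocorrelation at lag 0
r1 = float(np.dot(s, np.roll(s, 1)))  # at any nonzero lag it is exactly -1
```

Because the autocorrelation is an impulse up to a small constant offset, cross-correlating the microphone signal with `s` directly approximates the room impulse response, and the full period can be repeated to average out noise.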
Handwriting: Hand-written character recognition
  • Structural approach to hand-written character recognition
  • Online hand-written character recognition using continuous speech recognition algorithms
    • HMM-based stroke modeling and Kanji-structure grammar
  • Rapid adaptation to the writer
    • Maximum a posteriori (MAP) method for adaptation to the writer
  • Mathematical formula recognition (generating TeX codes)
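Rapid writer adaptation with the MAP criterion interpolates between writer-independent model parameters and statistics from the new writer's few samples. A minimal sketch for a single Gaussian mean (the pseudo-count `tau` and all names are illustrative; the lab's full formulation over HMM stroke models is not shown):

```python
import numpy as np

def map_adapt_mean(prior_mean, samples, tau):
    """MAP (maximum a posteriori) update of a Gaussian mean.

    With a conjugate Gaussian prior of mean prior_mean and pseudo-count tau,
    the MAP estimate interpolates prior mean and sample mean:
        mu = (tau * mu0 + sum(x)) / (tau + n)
    With little data the writer-independent prior dominates; as n grows,
    the writer's own data takes over.
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    return (tau * np.asarray(prior_mean, dtype=float)
            + samples.sum(axis=0)) / (tau + n)

# example: prior mean 0.0, two observations at 2.0, pseudo-count 2
mu = map_adapt_mean(0.0, [2.0, 2.0], tau=2.0)   # (2*0 + 4) / (2 + 2) = 1.0
```

This smooth prior-to-data transition is what makes MAP attractive for adaptation from only a handful of characters per writer.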

Page created by Shigeki Sagayama