Research Projects at Sagayama/Ono Laboratory    

At Lab #1 (the Sagayama/Ono Lab), we are interested in three methodologies: (1) signal processing, (2) probabilistic modeling, and (3) human interface, with applications in five major areas: (A) speech, (B) music, (C) acoustic signals, (D) handwriting, and (E) images.

Topics by area:
Speech: Speech recognition/synthesis/coding, spoken language processing, spoken dialogue understanding
  • Robust speech recognition in adverse conditions (noise, channel, reverberation, etc.)
    • Noise model composition with Lagrange polynomial approximation
    • Jacobian adaptation to noisy environment
    • Reverberant acoustic modeling
  • Acoustic model structure for speech recognition
    • Likelihood compensation for unexpected noise
    • Asynchronous-transition HMM
    • Linear regression HMM
    • Tree-based clustering for phoneme environments
  • Adaptation to speaker and environment
    • Vector Field Smoothing (VFS) for speaker adaptation
    • Tree-structured speaker modeling
  • Automatic speech signal detection
  • Prosodic features for speech understanding
  • Speech synthesis, singing voice synthesis, speech analysis
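The noise-model composition topic above rests on combining clean-speech and noise models in the log-spectral domain, where noisy-speech power is the sum of speech and noise powers. A minimal sketch of the underlying log-add operation (an illustration only, not the lab's Lagrange-polynomial approximation; function and variable names are hypothetical):

```python
import numpy as np

def logadd_compose(speech_log_mean, noise_log_mean):
    """Compose a noisy-speech log-spectral mean from clean-speech and noise means.

    Noisy power is modeled as the sum of speech and noise powers, so in the
    log-spectral domain the composed mean is y = log(exp(s) + exp(n)),
    the so-called log-add operation used in model composition.
    """
    return np.log(np.exp(speech_log_mean) + np.exp(noise_log_mean))

# example: composing scalar log-power means
s, n = np.log(3.0), np.log(1.0)
y = logadd_compose(s, n)   # log(3 + 1) = log(4)
```

In practice this nonlinear operation has to be applied to whole Gaussian mixture parameters, which is where polynomial approximations of the log-add function become useful.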
Music: Music signal/information processing
  • Multipitch detection, signal separation, and conversion to MIDI format
    • Harmonic clustering for polyphonic audio signals, octave estimation, estimation of number of sources
    • "Specmurt Anasylis" for polyphonic audio signals, iterative estimation
    • CASA (computational auditory scene analysis)
  • Automatic rhythm pattern recognition
    • HMM-based rhythm recognition
    • Tempo analysis based on segmental polynomial modeling
  • Automatic harmonization (chord generation) for given melodies
    • HMM-based harmony generation
  • Automatic harmony analysis
    • Grammar-based approach to harmony analysis
  • Automatic counterpoint
    • Dynamic programming for counterpoint in three or more voices
  • Planned: automatic music composition, sheet-music recognition, automatic music rendering
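The multipitch topics above model the observed log-frequency spectrum as the convolution of a common harmonic pattern with a fundamental-frequency distribution, so the distribution can be recovered by inverse filtering in the Fourier domain. A minimal sketch of that deconvolution step (circular convolution with a small regularizer; function names and the toy signal are illustrative, and the lab's iterative estimation is not shown):

```python
import numpy as np

def deconvolve_pitch(observed, harmonic_pattern, eps=1e-8):
    """Recover a pitch distribution u from v = h (*) u on a log-frequency axis.

    If every note contributes a shifted copy of a common harmonic pattern h,
    the observed spectrum v is the (here circular) convolution of h with the
    fundamental-frequency distribution u; dividing in the Fourier domain
    undoes the convolution.  eps guards against division by zero.
    """
    V = np.fft.fft(observed)
    H = np.fft.fft(harmonic_pattern, n=len(observed))
    return np.real(np.fft.ifft(V / (H + eps)))

# toy example: two "notes" blurred by a 3-component harmonic pattern
n = 32
u = np.zeros(n)
u[3], u[10] = 1.0, 0.7                  # true fundamental positions
h = np.array([1.0, 0.5, 0.25])          # common harmonic pattern
v = np.real(np.fft.ifft(np.fft.fft(u) * np.fft.fft(h, n=n)))  # observed spectrum
u_hat = deconvolve_pitch(v, h)          # approximately recovers u
```

Real spectra are noisy and the harmonic pattern is not exactly common across pitches, which is why iterative re-estimation of both the pattern and the distribution is needed on top of this basic inverse filter.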
Human Interface: Human interface, digital-human systems
  • Anthropomorphic spoken dialog agent (Life-like animated agent, digital human), animated images for spoken dialogue
    • "Galatea" project, Galatea toolkit
  • Dialog description language (VoiceXML) and its interpreter
  • Human interface for people with disabilities
Signal Processing: Acoustic signal processing
  • Microphone array signal processing
    • Complex Spectrum Circle Centroid (CSCC) method for sound source separation
    • Optimal microphone allocation
  • Multichannel signal coding
    • Tree-structured lossless coding of multichannel signals for MPEG-4
  • Room acoustics measurement using M-sequence modulation
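M-sequence (maximum-length sequence) measurement works because a ±1 M-sequence has a nearly ideal circular autocorrelation: exciting the room with the sequence and cross-correlating the recording with the same sequence yields the impulse response. A minimal sketch of generating such a sequence and checking its autocorrelation (the tap positions correspond to one known degree-5 primitive polynomial; this is an illustration, not the lab's measurement system):

```python
import numpy as np

def mls(n_bits, taps):
    """Generate a +/-1 maximum-length sequence of period 2**n_bits - 1
    using a Fibonacci LFSR; taps are 1-based feedback positions."""
    state = [1] * n_bits              # any nonzero seed works
    seq = []
    for _ in range(2 ** n_bits - 1):
        seq.append(1.0 if state[-1] else -1.0)
        fb = 0
        for t in taps:
            fb ^= state[t - 1]
        state = [fb] + state[:-1]
    return np.array(seq)

# degree-5 example, taps for the primitive polynomial x^5 + x^3 + 1
s = mls(5, [5, 3])
r0 = float(np.dot(s, s))              # circular autocorrelation at lag 0
r1 = float(np.dot(s, np.roll(s, 1)))  # at any nonzero lag it is exactly -1
```

Because the autocorrelation is an impulse up to a small constant offset, cross-correlating the microphone signal with `s` directly approximates the room impulse response, and the full period can be repeated to average out noise.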
Handwriting: Hand-written character recognition
  • Structural approach to hand-written character recognition
  • Online hand-written character recognition using continuous speech recognition algorithms
    • HMM-based stroke modeling and Kanji-structure grammar
  • Rapid adaptation to the writer
    • Maximum a posteriori (MAP) method for adaptation to the writer
  • Mathematical formula recognition (generating TeX codes)
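Rapid writer adaptation with the MAP criterion interpolates between writer-independent model parameters and statistics from the new writer's few samples. A minimal sketch for a single Gaussian mean (the pseudo-count `tau` and all names are illustrative; the lab's full formulation over HMM stroke models is not shown):

```python
import numpy as np

def map_adapt_mean(prior_mean, samples, tau):
    """MAP (maximum a posteriori) update of a Gaussian mean.

    With a conjugate Gaussian prior of mean prior_mean and pseudo-count tau,
    the MAP estimate interpolates prior mean and sample mean:
        mu = (tau * mu0 + sum(x)) / (tau + n)
    With little data the writer-independent prior dominates; as n grows,
    the writer's own data takes over.
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    return (tau * np.asarray(prior_mean, dtype=float)
            + samples.sum(axis=0)) / (tau + n)

# example: prior mean 0.0, two observations at 2.0, pseudo-count 2
mu = map_adapt_mean(0.0, [2.0, 2.0], tau=2.0)   # (2*0 + 4) / (2 + 2) = 1.0
```

This smooth prior-to-data transition is what makes MAP attractive for adaptation from only a handful of characters per writer.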

Page created by Shigeki Sagayama