"Harmonic Clustering"
Statistical-Learning-Based Multi-Pitch Analysis

by Hirokazu Kameoka and Shigeki Sagayama, The University of Tokyo.

  Harmonic Clustering - Tied Gaussian Mixture Spectral Representation
"Harmonic Clustering, Tied Gaussian Mixture Spectral Representation" is a novel approach for separating complex mixture of multiple tone signals from a single channel input. When a power spectrum at a single frame is given, this method tries to decompose frequency axis into several striped-territories each of which cover all prospective partial components generated from a particular sound source. The beginning of this idea was based on yet another clustering principle "constrained fuzzy k-means algorithm". Theoretical relation of the original clustering form and the current particular form, tied-GMM(Gaussian Mixture Model)-based spectral optimum approximation formulation, was discovered in 2003 (Although a similar idea was already proposed by M. Goto in 1999). A further interpretation: "EM(Expectation Maximization)-based GMM fitting can be used as an effective front-end followed by some other simple iteration like hill-climbing for Gaussian kernel regression analysis", allowed the use of Akaike's Information Criterion (AIC), Bayesian Information Criterion (BIC), etc., for robust estimation of the number of concurrent sounds and the pitch 'octave' positions. The specific characteristics of this method in the current stage are summerized as follows: The algorithm (1) ouputs accurate pitch estimates, (2) does not need prior assumption on the number of concurrent sounds, and (3) tries to avoid double/half pitch errors.

[Figures: linear-frequency-scaled "tied" GMM; log-frequency-scaled "tied" GMM; given spectrum of 3 sounds (F0: 370, 440, 556 Hz)]
  • The harmonic spectral structure is modeled as a weighted sum of tied Gaussians whose maxima are centered over the prospective harmonics.
  • The number of pitches is selected as the one giving the minimum AIC.
  • One of the pitch estimates has dropped into a half-pitch error. Once the number of pitches in the given spectrum is known, essentially the same iteration is performed to find the true pitch positions.
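
The AIC-based choice of the number of pitches can be sketched as follows. This is a hedged illustration only: the log-likelihood values are made-up toy numbers, not results from the papers, and `params_per_source` (the number of free parameters per harmonic model) is an assumed value. Only the formula AIC = -2 log L + 2k is standard.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: lower is better."""
    return -2.0 * log_likelihood + 2.0 * n_params


def select_num_sources(log_likelihoods, params_per_source):
    """Return the candidate number of sources with the minimum AIC.

    log_likelihoods maps a candidate number of sources K to the maximised
    log-likelihood of the K-source model (hypothetical inputs).
    """
    scores = {
        k: aic(ll, k * params_per_source) for k, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Toy values: the fit improves sharply up to K = 3, then only marginally.
toy_log_likelihoods = {1: -1200.0, 2: -1050.0, 3: -980.0, 4: -978.0, 5: -977.0}
best_k, scores = select_num_sources(toy_log_likelihoods, params_per_source=5)
print(best_k)  # → 3
```

The penalty term 2k is what stops the selection from always preferring more sources: beyond K = 3 the small gain in likelihood no longer pays for the extra parameters.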

Keywords: multipitch, overtones, harmonics, automatic transcription, multitone analysis, fundamental frequencies, spectrum analysis

  Bibliography
This idea and preliminary results were first published in Japanese [Kameoka2003ASJ03]. AIC-based estimation of the number of sounds and the pitch 'octave' positions was first reported in Japanese in [Kameoka2003MUS08] (tested on real music performance data), and in English in [Kameoka2004SWIM01] (tested on synthesized concurrent speech signals) and [Kameoka2004ICASSP05] (tested on real music performance data).
  • [Kameoka2004SWIM01]
    Hirokazu Kameoka, Takuya Nishimoto and Shigeki Sagayama, ``Accurate F0 Detection Algorithm for Concurrent Sounds Based on EM Algorithm and Information Criterion,'' Proc. Special Workshop in Maui (SWIM), Maui, USA, in CD-ROM, Jan. 2004. [PDF file]

  • [Kameoka2004ICASSP05]
    Hirokazu Kameoka, Takuya Nishimoto and Shigeki Sagayama, ``Separation of Harmonic Structures Based on Tied Gaussian Mixture Model and Information Criterion for Concurrent Sounds,'' Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP04), Montreal, Canada, 2004. [PDF file]

  • [Kameoka2003ASJ03]
    Hirokazu Kameoka, Takuya Nishimoto and Shigeki Sagayama, ``Multipitch Estimation using Harmonic Clustering,'' The 2003 Spring Meeting of the Acoustical Society of Japan, 3-7-3, pp. 837-838, Mar. 2003 (in Japanese). [PDF file]

  • [Kameoka2003MUS08]
    Hirokazu Kameoka, Takuya Nishimoto and Shigeki Sagayama, ``Estimation of Number of Sound Sources and Octave Position in Multi-Pitch Extraction Using Harmonic Clustering,'' Technical Report of IPSJ, 2003-MUS-51, pp.29-34, Aug. 2003 (in Japanese) [PDF file]
    Awarded "The Best Presentation Award".

  A sample of pitch estimation results on real music performance data
S. Yamamoto: "Crescent Serenade (Guitar Solo)" excerpted from the RWC Music Database.

[Figure panels: original spectrum; pitch estimation result; handcrafted MIDI data for reference]

  Preliminary results of MIDI conversion from the pitch estimation result
  • HMM-based pitch contour interpolation was used as back-end processing to convert the pitch estimation results into MIDI. Note that this back-end has several tunable parameters, which have not yet been adjusted.
  • The test data are excerpted from the RWC Music Database.
  • Academic use of these sample data is granted provided it is accompanied by the notice "These data were prepared by H. Kameoka on June 7, 2004 for academic use".
  • Click the music title to listen to the original audio data; click the rightmost MIDI sound files to download or listen to the MIDI-converted results.
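
As a small illustration of the final mapping from an estimated pitch to a MIDI note, the sketch below uses the standard equal-temperament relation with A4 = 440 Hz as MIDI note 69. It does not reproduce the HMM-based contour interpolation mentioned above. The function names are hypothetical, and the three input frequencies are simply the F0 values of the example spectrum shown earlier on this page.

```python
import math


def hz_to_midi(freq_hz, a4_hz=440.0):
    """Fractional MIDI note number for a frequency (A4 = 440 Hz -> 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / a4_hz)


def nearest_midi_note(freq_hz):
    """Round a pitch estimate to the nearest semitone."""
    return int(round(hz_to_midi(freq_hz)))


notes = [nearest_midi_note(f) for f in (370.0, 440.0, 556.0)]
print(notes)  # → [66, 69, 73]
```

Rounding each frame independently, as done here, ignores note onsets and durations; that temporal smoothing is the job of a back-end such as the HMM-based interpolation.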

| Title | Composer/arranger | Genre | Instrument | MIDI conversion pitch accuracy (%) | MIDI sound |
|---|---|---|---|---|---|
| Nocturne No.2 in E flat, op.9-2 | F. Chopin | Classic | Piano | 79.2 | Listen, Download MIDI |
| 'Traumerei' from Suite Kinderszenen, op. 15 | R. A. Schumann | Classic | Piano | 77.6 | Listen, Download MIDI |
| 6 Sonatas and Partitas for Unaccompanied Violin no.6 (Partita no.3) in E major, BWV.1006.3 Gavotte en Rondeau | J. S. Bach | Classic | Violin | 81.7 | Listen, Download MIDI |
| For two (Guitar solo) | H. Chubachi | Jazz | Guitar | 79.8 | Listen, Download MIDI |
| Jive (Piano solo) | M. Nakamura | Jazz | Piano | 72.5 | Listen, Download MIDI |
| Jive (Guitar solo) | H. Chubachi | Jazz | Guitar | - | Listen, Download MIDI |
| Crescent Serenade (Guitar solo) | S. Yamamoto | Jazz | Guitar | 85.3 | Listen, Download MIDI |
