The observed duration (IOI) $t_n$ [sec] of note $n$ in the performed MIDI
data is related both to the intended note value $r_n$ [beats] in the score and
to the tempo variable $v_n$ [sec/beat] (average time per beat) by:

$$ t_n = v_n \, r_n. \qquad (1) $$
Rhythm recognition can be defined as the decomposition of the IOIs $t_n$ into the tempo $v_n$ and the rhythm $r_n$. This is an ill-posed problem, since $v_n$ and $r_n$ are not determined uniquely: in principle, any rhythm can be expressed in various ways, e.g., doubling the note values while halving the tempo gives the same note durations in Eq. 1. Furthermore, fluctuations of tempo and rhythm cannot be completely separated. The decomposition is possible only in a probabilistic sense, assuming that the tempo $v_n$ is constant or slowly changing (at least within phrases), and that the note values $r_n$ often fit common rhythm patterns. Humans can often recognize rhythm from a musical performance because they have a priori knowledge of rhythm, e.g., of what types of rhythm patterns are likely to appear. In our approach, the ``most likely rhythm patterns'' for the given MIDI data are estimated by a search in the proposed probabilistic models, whose parameters are optimized by stochastic training with existing scores and performances.
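As a concrete illustration of the ambiguity, the following sketch (using our own variable names, with the IOI written as the product of a tempo in sec/beat and a note value in beats) shows that doubling the note values while halving the tempo reproduces the observed IOIs exactly:

```python
# Ambiguity of the IOI decomposition: t = v * r.
note_values = [1.0, 0.5, 0.5, 1.0]   # r: note values in beats
tempo = 0.5                          # v: sec/beat (i.e. 120 BPM)

iois = [tempo * r for r in note_values]

# An alternative reading of the same performance:
# twice the note values at half the tempo.
alt_values = [2 * r for r in note_values]
alt_tempo = tempo / 2

alt_iois = [alt_tempo * r for r in alt_values]

# Both readings explain the observed IOIs equally well.
assert iois == alt_iois
```

This is why the decomposition must rely on prior knowledge (slowly varying tempo, common rhythm patterns) rather than on the IOIs alone.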
Our goal is to separate rhythm and tempo by iterating the estimation
of the two. First, we estimate rhythm from the IOIs of the given MIDI
using tempo-invariant feature parameters. Then, using the estimated
rhythm and the given IOIs, the tempo is estimated. Rhythm and tempo are
alternately re-estimated using the estimated counterpart. In the next
sections, we discuss the first two steps.
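The alternating procedure above can be sketched as follows. The two estimators here are deliberately simple stand-ins (nearest-grid quantization for rhythm, a least-squares ratio for tempo), not the probabilistic models proposed in this paper; the grid of candidate note values and the initial tempo are our own assumptions for illustration:

```python
GRID = [0.25, 0.5, 1.0, 2.0]  # candidate note values in beats (assumed)

def estimate_rhythm(iois, tempo):
    """Quantize each IOI/tempo ratio to the nearest candidate note value."""
    return [min(GRID, key=lambda r: abs(ioi / tempo - r)) for ioi in iois]

def estimate_tempo(iois, rhythm):
    """Least-squares tempo [sec/beat] given IOIs and estimated note values."""
    return sum(t * r for t, r in zip(iois, rhythm)) / sum(r * r for r in rhythm)

def decompose(iois, tempo=0.5, n_iter=10):
    """Alternately re-estimate rhythm and tempo from the IOIs."""
    for _ in range(n_iter):
        rhythm = estimate_rhythm(iois, tempo)   # rhythm from IOIs and tempo
        tempo = estimate_tempo(iois, rhythm)    # tempo from IOIs and rhythm
    return rhythm, tempo

iois = [0.61, 0.29, 0.31, 0.62]  # performed IOIs in seconds (toy example)
rhythm, tempo = decompose(iois)
```

On this toy input the loop settles on the note values [1.0, 0.5, 0.5, 1.0] with a tempo near 0.61 sec/beat, smoothing the performance's timing fluctuations into a common rhythm pattern.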