Technical Introduction
Automatic Decision of Piano Fingering Based on Hidden Markov Models
Yuichiro Yonebayashi, Hirokazu Kameoka, Shigeki Sagayama
Our goal is the automatic determination of the best piano fingering for a given piano score. We also aim to formulate mathematically what constitutes reasonable hand motion in piano performance.
First, let us explain where the difficulty lies in fingering decision. The following problems have to be dealt with:
Our research addresses these issues. There are many prospective applications, such as:
We take a probabilistic approach because some fingerings are likely to be adopted by a performer while others are not.
Thus, fingering decision essentially involves probabilistic modeling of piano performance.
In our approach, we assume that performance is generated from fingering.
Thus, fingering decision can be considered as a probabilistic inverse problem which estimates the source (unseen) fingering from the (imaginary) observed performance. One can say that this viewpoint on fingering decision is parallel to the principle of modern speech recognition.
Let us now explain the probabilistic modelling of fingering based on HMMs. This is the most important part of this research. Piano performance can be considered as the process of producing a sequence of key press moves from a state transition sequence of hands and fingers.
So, given a sequence K of key press moves, we want to find the most likely state transition sequence S of hands and fingers that maximizes:
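The expression itself is not reproduced in this text. As a sketch only, assuming K denotes the sequence of key press moves, S the state transition sequence of hands and fingers, and t the time available for each move (how t enters, and the assumption that the prior on S does not depend on t, are illustrative choices and not taken from the original), the objective would take a form such as:

```latex
\hat{S}
  = \operatorname*{arg\,max}_{S} P(S \mid K, t)
  = \operatorname*{arg\,max}_{S} \frac{P(K \mid S, t)\, P(S)}{P(K \mid t)}
  = \operatorname*{arg\,max}_{S} P(K \mid S, t)\, P(S)
```

The denominator can be dropped because it does not depend on S.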
This formula is obtained by means of Bayes' Theorem. The time information (the parameter t) is included in this formula because it is very important to ensure that the performer can carry out a key press move from one position on the keyboard to another within a specified time, given by the note length in the piano score. Consequently, we can say that arc-emission HMMs (Mealy machines) fit very well as a fingering decision model, where:
Thus, fingering decision is formulated as a Viterbi search to find the optimal path of finger state transitions with maximum a posteriori probability. Here, we will explain the structure of the model parameters, especially the emission probabilities. We use Gaussian mixtures as the probability density functions because they represent the geometrical considerations well, as illustrated below:
In this figure, three circles represent the Gaussian distribution. The origin of the plane is on the F# key of the keyboard, and the center (mean) of the Gaussian distribution is on A. This figure illustrates that, if F# is played with the 3rd finger (for example), the next key press by the 5th finger (for example) is most likely to be A. As a first step, we made some approximations on our model:
We used piano scores of monophonic melodies in a single hand. Model parameters were tuned manually after an initial setting to intuitive values reflecting:
We obtained reasonable results as follows.
In the first sample, the distance between a pair of keys corresponds to the distance between the pair of fingers in most places. In other places, future notes are taken into account. There are problematic results as well, but we have prospective solutions for them.
In the first sample, different fingers must be used for the same key depending on the note length. This suggests that we have to incorporate note lengths in the fingering model.
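To make the model described earlier more concrete, the following is a minimal sketch of a Viterbi search over finger states for a monophonic, single-hand melody. It is not the authors' implementation. The names `log_emission` and `viterbi_fingering`, the constants `SEMITONES_PER_FINGER` and `SIGMA`, and the uniform transition probabilities are all assumptions made for illustration. The sketch also simplifies the model: a single one-dimensional Gaussian over the key interval stands in for the Gaussian mixtures on the keyboard plane, note lengths are ignored, and thumb crossings are not modelled.

```python
import math

FINGERS = (1, 2, 3, 4, 5)  # right hand: 1 = thumb ... 5 = little finger

# Hypothetical parameters, chosen only for illustration.
SEMITONES_PER_FINGER = 2.0  # assumed natural spacing between adjacent fingers
SIGMA = 1.5                 # assumed standard deviation, in semitones


def log_emission(prev_finger, next_finger, interval):
    """Log density of a key interval (in semitones) emitted on the arc
    prev_finger -> next_finger: one 1-D Gaussian centred on the assumed
    natural span between the two fingers."""
    mean = SEMITONES_PER_FINGER * (next_finger - prev_finger)
    z = (interval - mean) / SIGMA
    return -0.5 * z * z - math.log(SIGMA * math.sqrt(2.0 * math.pi))


def viterbi_fingering(keys):
    """Return the most probable finger sequence for a list of MIDI note
    numbers, assuming uniform initial and transition probabilities."""
    if not keys:
        return []
    log_uniform = -math.log(len(FINGERS))
    # score[f] = best log probability of any path ending on finger f
    score = {f: log_uniform for f in FINGERS}
    backpointers = []
    for prev_key, cur_key in zip(keys, keys[1:]):
        interval = cur_key - prev_key
        new_score, pointer = {}, {}
        for nxt in FINGERS:
            # Best predecessor finger for arriving on finger nxt.
            best_prev = max(
                FINGERS,
                key=lambda p: score[p] + log_emission(p, nxt, interval),
            )
            new_score[nxt] = (score[best_prev] + log_uniform
                              + log_emission(best_prev, nxt, interval))
            pointer[nxt] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final finger.
    finger = max(FINGERS, key=lambda f: score[f])
    path = [finger]
    for pointer in reversed(backpointers):
        finger = pointer[finger]
        path.append(finger)
    path.reverse()
    return path
```

With these assumed parameters, an ascending whole-tone run such as `viterbi_fingering([60, 62, 64, 66, 68])` is assigned fingers 1 to 5 in order, because every step matches the assumed spacing between adjacent fingers exactly. Making the emission term depend on note length would be one way to explore the limitation noted above.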
Much future work remains, including: