Overview of our IJCAI-07 presentation on "Automatic Decision of Piano Fingering Based on Hidden Markov Models"

Technical Introduction

Automatic Decision of Piano Fingering Based on Hidden Markov Models

Yuichiro Yonebayashi, Hirokazu Kameoka, Shigeki Sagayama
Sagayama / Ono Lab., The University of Tokyo.

1. Overview

Our goal is the automatic determination of the best piano fingering for a given piano score. We also aim to formulate mathematically what constitutes reasonable hand motion in a piano performance.

First, let us explain where the difficulty lies in fingering decision. The following problems have to be dealt with:

How to select a finger to play a certain key after playing another key?
- the hand motion must be natural or reasonable

How can we include the following facts or rules in a consistent model?
- it is hard to move the 3rd and 4th fingers alternately
- it is often preferable to avoid playing a black key with the 1st finger
- the player crosses fingers when needed

Our research addresses these issues.

There are many prospective applications, such as:

Present a model performance to piano learners
Robot piano player
General robot manipulation task
Estimation of required playing skill (applied to music information retrieval)
Playability criteria for automatic music composition and transcription
Extension of the theory to other instruments (guitar, etc.)

2. Why a probabilistic approach?

We take a probabilistic approach because some fingerings are likely to be adopted by a performer while others are not.

Thus, fingering decision essentially involves probabilistic modeling of piano performance.

3. Our approach

In our approach, we assume that performance is generated from fingering.

Thus, fingering decision can be considered as a probabilistic inverse problem which estimates the source (unseen) fingering from the (imaginary) observed performance. One can say that this viewpoint on fingering decision is parallel to the principle of modern speech recognition.

4. Probabilistic Model of Fingering using HMMs

Let us now explain the probabilistic modelling of fingering based on HMMs. This is the most important part of this research.

Piano performance can be considered as the process of producing a sequence of key press moves from a state transition sequence of hands and fingers.

So, given a sequence K of key press moves, we want to find the most likely state transition sequence S of hands and fingers that maximizes:

This formula is obtained by means of Bayes' Theorem. The time information (the parameter t) is included in this formula because it is very important to ensure that the performer can carry out a key press move from one position on the keyboard to another within the specified time, given by the note length in the piano score.
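For reference, a maximum a posteriori criterion of this kind typically takes the form below. The exact notation is an assumption based on the description above (S for the state sequence, K for the key press moves, t for the time information) and is not copied from the original presentation. The denominator does not depend on S, so it can be dropped from the maximization.

```latex
\hat{S} = \operatorname*{argmax}_{S} P(S \mid K, t)
        = \operatorname*{argmax}_{S} \frac{P(K \mid S, t)\, P(S \mid t)}{P(K \mid t)}
        = \operatorname*{argmax}_{S} P(K \mid S, t)\, P(S \mid t)
```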

Consequently, we can say that arc-emission HMMs (Mealy machines) fit very well as a fingering decision model, where:

an HMM state represents the positions and forms of hands and fingers
an emission associated with an HMM transition corresponds to a key press move from one position on the keyboard to another.

Thus, fingering decision is formulated as a Viterbi search to find the optimal path of finger state transitions with maximum a posteriori probability.
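The Viterbi search over an arc-emission HMM can be sketched as follows. This is a minimal illustration under strong assumptions, not the model used in the experiments: the state is reduced to the finger number of one hand, the emission on each arc is the pitch interval in semitones modeled by a single 1-D Gaussian, initial and transition probabilities are uniform, and the parameter values in `emission_logp` are invented for the example. Finger crossing and black-key preferences are not modeled here.

```python
import math

FINGERS = [1, 2, 3, 4, 5]  # one hand, thumb = 1


def log_gauss(x, mean, std):
    """Log-density of a 1-D Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def emission_logp(f_prev, f_next, interval):
    """Log-density of a pitch interval emitted on the arc f_prev -> f_next.

    Hypothetical parameters: neighbouring fingers are assumed to sit about
    two semitones apart, and repeating a finger strongly favours interval 0.
    """
    mean = 2.0 * (f_next - f_prev)
    std = 0.5 if f_prev == f_next else 1.0
    return log_gauss(interval, mean, std)


def viterbi_fingering(pitches):
    """Most likely finger sequence for a monophonic list of MIDI pitches."""
    # Uniform initial/transition probabilities only add constants, so they are omitted.
    score = {f: 0.0 for f in FINGERS}
    back = []
    for prev, cur in zip(pitches, pitches[1:]):
        interval = cur - prev
        new_score, pointers = {}, {}
        for g in FINGERS:
            best_f = max(FINGERS, key=lambda f: score[f] + emission_logp(f, g, interval))
            new_score[g] = score[best_f] + emission_logp(best_f, g, interval)
            pointers[g] = best_f
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    path = [max(FINGERS, key=lambda f: score[f])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


ascending = viterbi_fingering([60, 62, 64, 65, 67])   # C D E F G
descending = viterbi_fingering([67, 65, 64, 62, 60])  # G F E D C
```

Because the emission is attached to the arc between two finger states rather than to a single state, the score of a key press move depends on both the previous and the next finger, which is the property that makes the Mealy-type formulation convenient here.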

Here, we will explain the structure of the model parameters, especially the emission probabilities. We use Gaussian mixtures as the probability density function because they represent the geometrical considerations well, as illustrated below:

In this figure, three circles represent the Gaussian distribution. The origin of the plane is on the F# key of the keyboard, and the center (mean) of the Gaussian distribution is on A. This figure illustrates that, if F# is played with the 3rd finger (for example), the next key press by the 5th finger (for example) is most likely to be A.
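The F# to A example can be mimicked numerically with a small sketch. Everything here is an assumption made for illustration: a single isotropic 2-D Gaussian stands in for the mixture, horizontal displacement is measured in semitones, black keys are placed one "row" behind white keys, and the standard deviation is an arbitrary value.

```python
import math

# Displacement (dx in semitones, dy in key rows) of each candidate key relative
# to the F# origin. F# is a black key, so white keys lie one row nearer (dy = -1).
CANDIDATES = {"G": (1, -1), "G#": (2, 0), "A": (3, -1), "A#": (4, 0), "B": (5, -1)}

MEAN_3_TO_5 = (3, -1)  # hypothetical mean for the 3rd -> 5th finger arc: lands on A
STD = 1.5              # hypothetical spread


def gauss2d(dx, dy, mean, std):
    """Density of an isotropic 2-D Gaussian at displacement (dx, dy)."""
    mx, my = mean
    r2 = (dx - mx) ** 2 + (dy - my) ** 2
    return math.exp(-0.5 * r2 / std ** 2) / (2 * math.pi * std ** 2)


densities = {key: gauss2d(dx, dy, MEAN_3_TO_5, STD) for key, (dx, dy) in CANDIDATES.items()}
most_likely = max(densities, key=densities.get)
```

With the mean placed on A, the density falls off symmetrically for keys on either side, which corresponds to the concentric circles of the figure.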

As a first step, we made some approximations on our model:

note length was not considered
equivalence among octave positions of the hands was assumed

5. Experiments

We used piano scores of monophonic melodies in a single hand. Model parameters were tuned manually after an initial setting to intuitive values reflecting:

the physical structure of the human hand
conventions in the actual fingering decision

We obtained reasonable results as follows.

In the first sample, the distance between each pair of keys corresponds to the distance between the pair of fingers in most places. In other places, upcoming notes are taken into account.
In the second sample, finger-crossing is introduced in appropriate places.
In the third sample, playing a black key with the 1st finger is properly avoided.

There are problematic results as well, but we have prospective solutions for them.

In the first sample, different fingers must be used for the same key depending on the note length. This suggests that we have to incorporate note lengths in the fingering model.
In the second sample, stable hand motion is preferred. A possible solution would be to adjust values of the model parameters or to incorporate model parameters representing hand motion.

6. Future work

Much future work remains, including:

More accurate models with fewer approximations
Automatic training of model parameters
Chords and polyphonic cases with both hands
- 10 fingers
- Main idea: consider 1024 (= 2¹⁰) states to represent all combinations of 10 fingers, each either pressing a key or not.
- Can be solved similarly to the monophonic case.

Comparative performance evaluations of other approaches and models
Considering expressive piano performances
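The 1024-state idea for the two-hand case listed above can be made concrete with a short enumeration. This is only a sketch of the state space: the finger numbering 0..9 is hypothetical, and a full model would also need the key positions, which are not represented here.

```python
from itertools import product

N_FINGERS = 10

# Each state is a tuple of 10 flags: 1 = finger is pressing a key, 0 = it is not.
states = list(product((0, 1), repeat=N_FINGERS))
n_states = len(states)

# A Viterbi step considers every (previous state, next state) pair.
n_arcs_per_step = n_states ** 2


def pressed_fingers(state):
    """Indices (0..9, hypothetical numbering) of the fingers pressing a key."""
    return [i for i, flag in enumerate(state) if flag]
```

The number of arcs per step (about one million) indicates the cost of running the same Viterbi search as in the monophonic case without any pruning.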

7. Conclusion

A new formulation of fingering decision has been introduced as a probabilistic inverse problem.
Fingering decision is formulated as a Viterbi search to find the optimal path of state transitions along which the a posteriori probability is maximum in the finger-state HMM.
The emission probability is represented by a probability density that captures the geometrical considerations of the problem well.
Experimental validation was successfully conducted for monophonic melodies.