Lloyd A. Smith and Rodger J. McNab, Dept. of Computer Science, University of Waikato
At TDME96, a software program was described
that is designed to provide feedback to students practicing
sight-singing. This paper discusses the current generation of the
program, which has been extended in several ways. The original
program provided a fixed database of 100 chorale melodies for
sight-singing practice, and there was no way to select a phrase
except by specifying the number of the chorale. The current version
combines several databases, described below, to provide 9400 tunes;
these databases may be searched to find tunes containing particular
patterns for sight-singing practice, and the test melody may be
transposed to a comfortable key for the singer. In addition, the
current program allows a teacher or student to add tunes (or
exercises) to the system. Furthermore, the original program
restricted the user to singing only one phrase at a time from the chosen
chorale; the current version allows the user to choose one or more
phrases from a given tune. If the tune is too long to fit within the
window, the system smoothly scrolls the melody in order to keep the
singer from reaching the end of the window.
The program runs on a Power Macintosh, making use
of its built-in sound I/O functions; no special-purpose hardware is
required by the system. Pitches are identified with a resolution of
four cents (0.23%), and the system is designed to accept a range from
F2 (87.31 Hz), just below the bass staff, to G5 (784 Hz), just above
the treble staff. To allow the system to detect note boundaries
reliably, the user is asked to sing on the syllable "da"
or "ta"; the consonant provides a drop in amplitude which is
detected by adaptive thresholds.
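The figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the program: it assumes twelve-tone equal temperament with A4 = 440 Hz (the paper does not state a tuning reference), and the function names are hypothetical.

```python
import math

A4_HZ = 440.0  # assumed tuning reference; not stated in the paper


def midi_to_hz(midi_note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (A4 = 69)."""
    return A4_HZ * 2.0 ** ((midi_note - 69) / 12.0)


def cents_between(f_low: float, f_high: float) -> float:
    """Size of the interval between two frequencies, in cents."""
    return 1200.0 * math.log2(f_high / f_low)


def cents_to_percent(cents: float) -> float:
    """Relative frequency change, in percent, corresponding to `cents`."""
    return (2.0 ** (cents / 1200.0) - 1.0) * 100.0


F2, G5 = 41, 79  # MIDI numbers of the lowest and highest accepted notes

if __name__ == "__main__":
    print(f"F2 = {midi_to_hz(F2):.2f} Hz, G5 = {midi_to_hz(G5):.2f} Hz")
    print(f"4 cents = {cents_to_percent(4):.2f}% change in frequency")
    print(f"range = {cents_between(midi_to_hz(F2), midi_to_hz(G5)):.0f} cents")
```

Under these assumptions the accepted range spans 38 semitones (3800 cents), or 950 steps at four-cent resolution.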
Figure 1 is a screen display illustrating the use
of the sight-singing tutor. The figure shows four windows used by the
program: the music window, the database window, the
tempo window and the record window. The music window displays
the test melody (on the top staff) and the transcription of the
user's singing (on the bottom staff). In the figure, the user has
chosen to sing the first phrase of chorale no. 158, and has received
a score of 92, shown in the lower left of the window. The lines
connecting notes from the test melody to the transcription show how
the two sequences of notes were aligned by the scoring algorithm. The
database window displays the titles of melodies, allowing the user to
choose a melody and a phrase or phrases from that melody. The tempo
window enables a user to set the system tempo and, by depressing the
test button with the mouse, to hear a metronome click at that
tempo. Finally, the record window allows the user to record sound
into the system by depressing the record button. The system
records sound as long as the record button is down, stopping when the
user releases the mouse button. In addition, the record window allows
the user to hear the key note by depressing the button with the
musical note icon.
Figure 1. Display from the sight-singing tutor
There are currently three major databases
available to users of the sight-singing tutor: the Essen database
(Schaffrath, 1992), comprising about 6000 German and Eastern
European folk tunes, 2200 Chinese tunes and several hundred Irish
melodies; the Digital Tradition database (Greenhaus, 1994) of 1700
folk tunes, mostly of North American origin; and the original
database of 100 chorale melodies. At any time, one or more databases
may be active. This provides a total of roughly ten thousand melodies
available for sight-singing practice.
Figure 2 illustrates the selection of a
melody; only one melody may be selected for display at any given
time, but multiple phrases from that melody may be selected. Phrase
selection relies on phrase identification in the database, however,
and that information is not available in the Digital Tradition
database; in other words, melodies from the Digital Tradition are all
considered by the system to contain one phrase encompassing the
entire melody. Phrases are identified in the Essen and chorale
databases.

With such a large collection of tunes, a major
issue is how a student or teacher might find tunes which exercise the
singing of particular musical patterns, such as syncopations or
tritones. The search mechanism embedded in the sight-singing tutor is
primitive but functional: the user sings a phrase which contains the
pattern of interest, then selects the pattern by clicking and
dragging the mouse over the notes making up the pattern. Selecting
Find Selection from the File menu then displays a
window such as the one shown in Figure 3, which illustrates the
result of conducting a search for melodies containing at least 25
minor thirds. The search may be based on rhythm only, on pitch only,
or on both. In the case illustrated by Figure 3, the search was
conducted on pitch, and the system found 27 melodies containing the
required number of minor thirds. Depressing Find First
displays the first matching tune in the music window, after which
the search window disappears; Find Next, from the File menu,
can then be used to step through the melodies one by one.
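A pitch-only search of the kind described above can be sketched as follows. This is an illustration under stated assumptions rather than the tutor's own search code: melodies are represented as lists of MIDI note numbers, an interval is counted whether it rises or falls, and the names `count_interval` and `find_tunes` are hypothetical.

```python
from typing import Dict, List

MINOR_THIRD = 3  # interval size in semitones


def count_interval(pitches: List[int], semitones: int) -> int:
    """Count adjacent note pairs separated by `semitones`, up or down."""
    return sum(
        1 for a, b in zip(pitches, pitches[1:]) if abs(b - a) == semitones
    )


def find_tunes(
    database: Dict[str, List[int]], semitones: int, min_count: int
) -> List[str]:
    """Return titles of tunes containing at least `min_count` such intervals."""
    return [
        title
        for title, pitches in database.items()
        if count_interval(pitches, semitones) >= min_count
    ]


if __name__ == "__main__":
    # Toy data for illustration only; not taken from any of the databases.
    toy_database = {
        "tune a": [60, 63, 60, 63, 67],
        "tune b": [60, 62, 64, 65],
    }
    print(find_tunes(toy_database, MINOR_THIRD, min_count=2))
```

A rhythm-only search could take the same form with note durations in place of pitch differences.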

Figure 3. Search dialogue
Tunes are added in a similar manner: the
student or teacher sings a melody, then adds it to the system as a
new database. In fact, any transcribed melody may be saved, with the
system allowing the user to specify title, time signature and whether
the key is major or minor. The tune may be recorded and transcribed
whether or not there is a test melody displayed at the time of
recording; if a test melody is displayed, then the recorded tune is
transcribed in the same key as the test melody. If a test melody is
not displayed (i.e., the top staff in the music window has no notes),
then the system transcribes the recorded tune in the key which
accurately displays the melody using the fewest accidentals.
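The paper does not say how the key with the fewest accidentals is found. One simple way to do it, sketched here as an assumption, is to try each of the twelve key signatures and count the notes that fall outside the corresponding major scale (a minor key shares the signature of its relative major). The sketch ignores the convention that an accidental persists until the end of the bar, and the function names are hypothetical.

```python
from typing import List

MAJOR_SCALE_STEPS = (0, 2, 4, 5, 7, 9, 11)  # semitones above the tonic


def accidentals_needed(pitches: List[int], sharps: int) -> int:
    """Count notes outside the major scale of a key signature.

    `sharps` is the number of sharps in the signature; negative values
    mean flats (so -1 is F major / D minor, 2 is D major / B minor).
    """
    tonic = (7 * sharps) % 12  # each added sharp moves the tonic up a fifth
    scale = {(tonic + step) % 12 for step in MAJOR_SCALE_STEPS}
    return sum(1 for p in pitches if p % 12 not in scale)


def best_signature(pitches: List[int]) -> int:
    """Pick the signature needing the fewest accidentals.

    Ties are broken in favour of the signature with fewer sharps or flats.
    """
    candidates = range(-6, 6)  # six flats through five sharps: 12 signatures
    return min(
        candidates, key=lambda s: (accidentals_needed(pitches, s), abs(s))
    )


if __name__ == "__main__":
    d_major_scale = [62, 64, 66, 67, 69, 71, 73, 74]
    print(best_signature(d_major_scale))  # number of sharps chosen
```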
Because search patterns and added tunes are
entered by recording acoustic input, it is necessary to provide some
way to correct singing and transcription errors. This is done by
double clicking on a note, which brings up an edit dialogue box,
shown in Figure 4. The pitch is entered as a MIDI number
corresponding to the correct pitch, while the rhythm is entered as a
duration in number of sixteenth notes. In the figure, a dotted
quarter note, with a duration of six sixteenth notes, has been
selected for editing; the pitch is correct, but the duration will be
changed to eight sixteenths, representing a half note.
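The note representation used by the edit dialogue (a MIDI number for pitch, a count of sixteenth notes for duration) can be modelled as below. This is a sketch for illustration only: the `Note` class is hypothetical, the pitch value 67 is invented because the paper does not give the pitch shown in the figure, and note names assume the convention that MIDI 60 is C4.

```python
from dataclasses import dataclass, replace

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Durations, in sixteenth notes, that correspond to a single note value.
DURATION_NAMES = {
    1: "sixteenth",
    2: "eighth",
    3: "dotted eighth",
    4: "quarter",
    6: "dotted quarter",
    8: "half",
    12: "dotted half",
    16: "whole",
}


@dataclass(frozen=True)
class Note:
    midi: int        # pitch as a MIDI note number
    sixteenths: int  # duration as a number of sixteenth notes

    def pitch_name(self) -> str:
        return f"{NOTE_NAMES[self.midi % 12]}{self.midi // 12 - 1}"

    def duration_name(self) -> str:
        return DURATION_NAMES.get(self.sixteenths, f"{self.sixteenths} sixteenths")


if __name__ == "__main__":
    transcribed = Note(midi=67, sixteenths=6)      # hypothetical pitch
    corrected = replace(transcribed, sixteenths=8)  # pitch kept, rhythm fixed
    print(transcribed.pitch_name(), transcribed.duration_name())
    print(corrected.pitch_name(), corrected.duration_name())
```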
When a melody is recorded, the system matches it
against the test melody using the musical pattern matching algorithm
of Mongeau and Sankoff (1990). This algorithm returns a distance
between two sequences of notes: 0 is a perfect match, while
higher scores indicate a greater degree of difference between
sequences. The same matching algorithm was used in the original
version of the program, but the score has now been converted to a
more intuitive form by subtracting the distance from 100.
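The relation between distance and score can be illustrated with a much simpler matcher than the one the program uses. The sketch below is a plain weighted edit distance over (pitch, duration) pairs; it is not the Mongeau and Sankoff algorithm, which also allows notes to be consolidated and fragmented and uses musically motivated weights. The cost functions here are hypothetical placeholders.

```python
from typing import List, Tuple

Note = Tuple[int, int]  # (MIDI pitch, duration in sixteenth notes)


def replace_cost(a: Note, b: Note) -> float:
    """Hypothetical cost: pitch difference plus duration difference."""
    return float(abs(a[0] - b[0]) + abs(a[1] - b[1]))


def indel_cost(n: Note) -> float:
    """Hypothetical cost of inserting or deleting a note: its length."""
    return float(n[1])


def distance(test: List[Note], sung: List[Note]) -> float:
    """Weighted edit distance between two note sequences (0 = identical)."""
    rows, cols = len(test) + 1, len(sung) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + indel_cost(test[i - 1])
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + indel_cost(sung[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + indel_cost(test[i - 1]),   # note left out
                d[i][j - 1] + indel_cost(sung[j - 1]),   # extra note sung
                d[i - 1][j - 1] + replace_cost(test[i - 1], sung[j - 1]),
            )
    return d[-1][-1]


def score(test: List[Note], sung: List[Note]) -> float:
    """Score as described in the text: the distance subtracted from 100."""
    return 100.0 - distance(test, sung)


if __name__ == "__main__":
    test_melody = [(60, 4), (62, 4), (64, 8)]
    sung_melody = [(60, 4), (63, 4), (64, 8)]  # second note a semitone sharp
    print(score(test_melody, sung_melody))
```

With these placeholder weights a perfect performance scores 100 and each error lowers the score by its cost; the actual scores depend on the weights of the real algorithm.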

The system is still in a prototype stage, and there are many ways to improve it, for example by providing a more convenient way of adding new melodies. We feel, however, that the program is useful in its present form, and that further development should be guided by music educators. The sight-singing tutor can be downloaded from http://cs.waikato.ac.nz/~rjmcnab; the program requires a Power Macintosh running System 7 and an external microphone. The same site also holds several papers that describe the melody transcription system and its applications.
References
Greenhaus, D. (1994). About the digital tradition. [Online].
Available at http://www.deltablues.com/DigiTrad-blurb.html
Mongeau, M., & Sankoff, D. (1990). Comparison of musical
sequences. Computers and the Humanities, 24,
161-175.
Schaffrath, H. (1992). The ESAC databases and MAPPET software. In W. Hewlett and E. Selfridge-Field (Eds.), Computing in Musicology, Vol. 8. Menlo Park: Center for Computer Assisted Research in the Humanities.