Improvements to a Sight-Singing Tutor

Lloyd A. Smith and Rodger J. McNab, Dept. of Computer Science, University of Waikato

 

    At TDME96, a software program was described that is designed to provide feedback to students practicing sight-singing. This paper discusses the current generation of the program, which has been extended in several ways. The original program provided a fixed database of 100 chorale melodies for sight-singing practice, and there was no way to select a phrase except by specifying the number of the chorale. The current version combines several databases, described below, to provide 9400 tunes; these databases may be searched to find tunes containing particular patterns for sight-singing practice, and the test melody may be transposed to a comfortable key for the singer. In addition, the current program allows a teacher or student to add tunes (or exercises) to the system. Furthermore, the original program restricted the user to singing only one phrase at a time from the chosen chorale; the current version allows the user to choose one or more phrases from a given tune. If the tune is too long to fit within the window, the system smoothly scrolls the melody so that the singer does not reach the end of the window.
    The program runs on a Power Macintosh, making use of its built-in sound I/O functions; no special-purpose hardware is required by the system. Pitches are identified with a resolution of four cents (0.23%), and the system is designed to accept a range from F2 (87.31 Hz), just below the bass staff, to G5 (784 Hz), just above the treble staff. In order to allow the system to reliably detect note boundaries, the user is asked to sing on the syllable da or ta; the consonant provides a drop in amplitude, which is detected by adaptive thresholds.
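    The figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming equal temperament with A4 = 440 Hz and standard MIDI note numbering (F2 = 41, G5 = 79); the function names are illustrative and are not taken from the tutor itself.

```python
import math

A4_HZ = 440.0   # assumed reference tuning
A4_MIDI = 69    # standard MIDI number for A4

def midi_to_hz(midi_note):
    """Equal-tempered frequency of a MIDI note number."""
    return A4_HZ * 2.0 ** ((midi_note - A4_MIDI) / 12.0)

def cents_between(f1_hz, f2_hz):
    """Size of the interval from f1 to f2 in cents (1200 per octave)."""
    return 1200.0 * math.log2(f2_hz / f1_hz)

def cents_to_percent(cents):
    """Relative frequency change, in percent, of an interval in cents."""
    return (2.0 ** (cents / 1200.0) - 1.0) * 100.0

F2_MIDI, G5_MIDI = 41, 79

low_hz = midi_to_hz(F2_MIDI)    # about 87.31 Hz
high_hz = midi_to_hz(G5_MIDI)   # about 784 Hz
resolution_percent = cents_to_percent(4)          # about 0.23 %
range_cents = cents_between(low_hz, high_hz)      # 38 semitones
```

    Under these assumptions the accepted range spans 38 semitones (3800 cents), so a four-cent resolution distinguishes roughly 950 pitch steps across it.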
    Figure 1 is a screen display illustrating the use of the sight-singing tutor. The figure shows four windows used by the program: the music window, the database window, the tempo window and the record window. The music window displays the test melody (on the top staff) and the transcription of the user's singing (on the bottom staff). In the figure, the user has chosen to sing the first phrase of chorale no. 158, and has received a score of 92, shown in the lower left of the window. The lines connecting notes from the test melody to the transcription show how the two sequences of notes were aligned by the scoring algorithm. The database window displays the titles of melodies, allowing the user to choose a melody and a phrase or phrases from that melody. The tempo window enables a user to set the system tempo and, by depressing the test button with the mouse, to hear a metronome click at that tempo. Finally, the record window allows the user to record sound into the system by depressing the record button. The system records sound as long as the record button is down, stopping when the user releases the mouse button. In addition, the record window allows the user to hear the key note by depressing the button with the musical note icon.

 

Figure 1. Display from the sight-singing tutor

    There are currently three major databases available to users of the sight-singing tutor: the Essen database (Schaffrath, 1992), comprising about 6000 German and Eastern European folk tunes, 2200 Chinese tunes and several hundred Irish melodies; the Digital Tradition database (Greenhaus, 1994) of 1700 folk tunes, mostly of North American origin; and the original database of 100 chorale melodies. At any time, one or more databases may be active. Together they provide roughly ten thousand melodies for sight-singing practice.
    Figure 2 illustrates the selection of a melody. Only one melody may be selected for display at any given time, but multiple phrases from that melody may be selected. Phrase selection relies on phrase identification in the database, however, and that information is not available in the Digital Tradition database; in other words, each melody from the Digital Tradition is considered by the system to contain a single phrase encompassing the entire melody. Phrases are identified in the Essen and chorale databases.


Figure 2. Selecting a test melody

    With such a large collection of tunes, a major issue is how a student or teacher might find tunes which exercise the singing of particular musical patterns, such as syncopations or tritones. The search mechanism embedded in the sight-singing tutor is primitive but functional: the user sings a phrase which contains the pattern of interest, then selects the pattern by clicking and dragging the mouse over the notes making up the pattern. Selecting Find Selection from the File menu then displays a window such as the one shown in Figure 3, which illustrates the result of conducting a search for melodies containing at least 25 minor thirds. The search may be based on rhythm only, on pitch only, or on both. In the case illustrated by Figure 3, the search was conducted on pitch, and the system found 27 melodies containing the required number of minor thirds. Depressing Find First displays the first matching tune in the music window, after which the search window disappears; Find Next, from the File menu, can then be used to step through the melodies one by one.
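    A search of this kind can be sketched as follows. This is an illustration only, not the tutor's implementation: it assumes notes are stored as (MIDI pitch, duration in sixteenths) pairs, that pitch matching is done on semitone intervals (so it is independent of key), and that only exact, contiguous matches count. The names `find_melodies` and `features` are hypothetical.

```python
def features(notes, mode):
    """Turn (midi_pitch, sixteenths) pairs into a comparable sequence."""
    if mode == "pitch":    # semitone intervals between successive notes
        return [b[0] - a[0] for a, b in zip(notes, notes[1:])]
    if mode == "rhythm":   # note durations only
        return [n[1] for n in notes]
    if mode == "both":     # interval plus the two durations around it
        return [(b[0] - a[0], a[1], b[1]) for a, b in zip(notes, notes[1:])]
    raise ValueError("mode must be 'pitch', 'rhythm' or 'both'")

def count_matches(sequence, pattern):
    """Count (possibly overlapping) contiguous occurrences of pattern."""
    n = len(pattern)
    if n == 0:
        return 0
    return sum(1 for i in range(len(sequence) - n + 1)
               if sequence[i:i + n] == pattern)

def find_melodies(database, pattern_notes, mode="pitch", min_count=1):
    """Titles of melodies containing the pattern at least min_count times."""
    pattern = features(pattern_notes, mode)
    return [title for title, notes in database.items()
            if count_matches(features(notes, mode), pattern) >= min_count]

# Toy database (invented data, for illustration only).
database = {
    "tune A": [(60, 4), (63, 4), (60, 4), (63, 8)],
    "tune B": [(62, 4), (64, 4), (65, 8)],
}
minor_third = [(67, 4), (70, 4)]   # a sung ascending minor third

hits = find_melodies(database, minor_third, mode="pitch", min_count=2)
```

    Here `hits` contains only "tune A", which has two ascending minor thirds. A search such as the one in Figure 3 would correspond to `min_count=25` under these assumptions.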

Figure 3. Search dialogue

    Tunes are added in a similar manner: the student or teacher sings a melody, then adds it to the system as a new database. In fact, any transcribed melody may be saved, with the system allowing the user to specify title, time signature and whether the key is major or minor. The tune may be recorded and transcribed whether or not there is a test melody displayed at the time of recording; if a test melody is displayed, then the recorded tune is transcribed in the same key as the test melody. If a test melody is not displayed (i.e., the top staff in the music window has no notes), then the system transcribes the recorded tune in the key which accurately displays the melody using the fewest accidentals.
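    The paper does not say how the key with the fewest accidentals is found. One simple way to do it, shown here purely as a sketch, is to try all twelve tonics and count the notes that fall outside each scale. The scale tables (with natural minor standing in for "minor") and the function names are assumptions.

```python
MAJOR_STEPS = (0, 2, 4, 5, 7, 9, 11)
MINOR_STEPS = (0, 2, 3, 5, 7, 8, 10)   # natural minor, an assumption

def accidentals_needed(midi_pitches, tonic_pc, steps=MAJOR_STEPS):
    """Number of notes lying outside the scale built on tonic_pc (0 = C)."""
    scale = {(tonic_pc + s) % 12 for s in steps}
    return sum(1 for p in midi_pitches if p % 12 not in scale)

def best_key(midi_pitches, steps=MAJOR_STEPS):
    """Tonic pitch class needing the fewest accidentals (lowest wins ties)."""
    return min(range(12),
               key=lambda pc: accidentals_needed(midi_pitches, pc, steps))

# G A B C D E F# G, as MIDI numbers
melody = [67, 69, 71, 72, 74, 76, 78, 79]

major_tonic = best_key(melody)                # 7, i.e. G major
minor_tonic = best_key(melody, MINOR_STEPS)   # 4, i.e. E minor
```

    Written in C major, this melody would need one accidental (the F sharp); written in G major it needs none, so G is chosen.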
    Because search patterns and added tunes are entered by recording acoustic input, it is necessary to provide some way to correct singing and transcription errors. This is done by double-clicking on a note, which brings up an edit dialogue box, shown in Figure 4. The pitch is entered as a MIDI number corresponding to the correct pitch, while the rhythm is entered as a duration in number of sixteenth notes. In the figure, a dotted quarter note, with a duration of six sixteenth notes, has been selected for editing; the pitch is correct, but the duration will be changed to eight sixteenths, representing a half note.
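    The note representation used by the edit dialogue (a MIDI number plus a duration in sixteenths) can be modelled in a few lines. The sketch below is illustrative: the `Note` class, the name tables and the pitch chosen for the example (G4, MIDI 67) are assumptions, since the paper does not state the pitch of the note shown in Figure 4.

```python
from dataclasses import dataclass, replace

PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F",
               "F#", "G", "G#", "A", "A#", "B"]
DURATION_NAMES = {1: "sixteenth", 2: "eighth", 3: "dotted eighth",
                  4: "quarter", 6: "dotted quarter", 8: "half",
                  12: "dotted half", 16: "whole"}

@dataclass(frozen=True)
class Note:
    midi: int         # MIDI note number (60 = middle C)
    sixteenths: int   # duration in sixteenth notes

def describe(note):
    """Human-readable form, e.g. 'G4 dotted quarter'."""
    name = PITCH_NAMES[note.midi % 12] + str(note.midi // 12 - 1)
    length = DURATION_NAMES.get(note.sixteenths,
                                f"{note.sixteenths} sixteenths")
    return f"{name} {length}"

transcribed = Note(midi=67, sixteenths=6)       # hypothetical pitch
corrected = replace(transcribed, sixteenths=8)  # pitch kept, duration fixed
```

    With these definitions, `describe(transcribed)` gives "G4 dotted quarter" and `describe(corrected)` gives "G4 half", mirroring the edit described above.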
    When a melody is recorded, the system matches it against the test melody using the musical pattern matching algorithm of Mongeau and Sankoff (1990). This algorithm returns a distance between two sequences of notes: 0 is a perfect match, while higher values indicate a greater degree of difference between the sequences. The same matching algorithm was used in the original version of the program, but the score has now been converted to a more intuitive form by subtracting the distance from 100.
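    The relation between distance and score can be illustrated with a much simplified stand-in for the matching step. The sketch below is a plain edit distance over (MIDI pitch, sixteenths) pairs with insertion, deletion and substitution only; it is not the Mongeau and Sankoff algorithm itself, and the cost weights are invented for illustration.

```python
def note_distance(test, sung, pitch_weight=1.0, duration_weight=0.5,
                  indel_weight=1.0):
    """Simplified edit distance between two lists of (midi, sixteenths)."""
    def sub_cost(a, b):
        # Substitution: penalise pitch and duration differences.
        return (pitch_weight * abs(a[0] - b[0])
                + duration_weight * abs(a[1] - b[1]))

    def indel_cost(note):
        # Insertion or deletion: proportional to the note's duration.
        return indel_weight * note[1]

    rows, cols = len(test) + 1, len(sung) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = d[i - 1][0] + indel_cost(test[i - 1])
    for j in range(1, cols):
        d[0][j] = d[0][j - 1] + indel_cost(sung[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = min(
                d[i - 1][j] + indel_cost(test[i - 1]),      # note omitted
                d[i][j - 1] + indel_cost(sung[j - 1]),      # extra note sung
                d[i - 1][j - 1] + sub_cost(test[i - 1], sung[j - 1]),
            )
    return d[-1][-1]

def score(test, sung):
    """Score as described in the text: 100 minus the distance."""
    return 100 - note_distance(test, sung)

test_melody = [(60, 4), (62, 4), (64, 8)]
one_semitone_off = [(60, 4), (63, 4), (64, 8)]
missing_note = [(60, 4), (64, 8)]
```

    With these invented weights, a perfect rendition scores 100, the version with one note a semitone sharp scores 99, and the version with the middle quarter note omitted scores 96.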


Figure 4. Correcting a transcribed note

    The system is still in a prototype stage, and there are many ways to improve it, for example by providing a more convenient way of adding new melodies. We feel, however, that the program is useful in its present form, and that further development should be guided by music educators. The sight-singing tutor can be downloaded from http://cs.waikato.ac.nz/~rjmcnab; the program requires a Power Macintosh running System 7 and an external microphone. The same site also holds several papers that describe the melody transcription system and its applications.

References
Greenhaus, D. (1994). About the digital tradition. [Online.] Available at http://www.deltablues.com/DigiTrad-blurb.html

Schaffrath, H. (1992). The ESAC databases and MAPPET software. In W. Hewlett and E. Selfridge-Field (Eds.), Computing in Musicology, Vol. 8. Menlo Park: Center for Computer Assisted Research in the Humanities.

Mongeau, M., & Sankoff, D. (1990). Comparison of musical sequences. Computers and the Humanities, 24, 161-175.