CAI for Harmony and Ear Training using Octave-Complex (Shepard) Tones, Pitch-Class Sets, and Perceptual Theory

Richard Parncutt

Faculty of Music, McGill University

 

A computer-based teaching method integrating harmony and ear training is proposed. The method is inspired by experimental paradigms commonly used in psychomusicology. Chords are initially composed only of octave-complex tones (Shepard, 1964). This eliminates timbral variations due to choice of musical instrument and voicing (inversion, spacing, doubling), thereby encouraging spontaneous holistic recognition of harmonic intervals and chord types. Only later are chords of regular musical (harmonic-complex) tones introduced. Material is not restricted to diatonic patterns, but instead enumerates all possible pitch-class sets within given restrictions at each stage of the course. The method attempts to explain the prevalence of certain pitch structures in tonal music according to perceptual theories of pitch and consonance.

Introduction

Musicianship students at secondary and undergraduate levels typically must learn to recognize a wide range of musical sonorities, including harmonic intervals and their compounds and inversions, and triads and tetrads in various voicings (inversions, spacings, doublings). The problem is exacerbated by the wide range of instrumental timbres that may be used to realize these sonorities. Multiplying these factors together yields an enormous number of perceptually distinguishable timbres. How can a student learn to recognize the individual elements of such a large set?

The traditional solution to this problem is to provide students with analytic skills, such as counting tones and semitones within chords, in order to figure out their pitch structure. In this way, students can often arrive at the correct answers, albeit with considerable head-scratching, humming, and do-re-mi-ing.

It is well known in psychology that the number of "objects" that can be stored simultaneously in human working memory is quite limited (Miller, 1956). With this in mind, a major challenge in ear-training pedagogy is to reduce to a manageable level the number of different sonorities that students are required to recognize in a given pedagogical context. At the same time, their tonal vocabulary should remain broad enough that it is musically realistic. What is the optimal compromise between these criteria?

The present paper attempts to solve the problem by changing the acoustical make-up of the sonorities — a prospect that has only become feasible with the advent of CAI. In the early stages of the proposed method, intervals and chords are composed of octave-complex tones (often called Shepard tones). An octave-complex tone is a complex tone whose pure-tone components are equally spaced at octave intervals over most of the audible frequency spectrum. For example, an octave-complex tone on E may contain spectral frequencies corresponding to E3, E4, E5, E6, and E7.
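The definition above can be made concrete with a short Python sketch that lists the pure-tone components of an octave-complex tone on a given pitch class. The tuning reference (A4 = 440 Hz), the frequency limits, and the raised-cosine amplitude envelope over log frequency are assumptions made for this illustration; the text specifies only that the components lie at octave intervals over most of the audible spectrum.

```python
import math

A4_HZ = 440.0  # assumed tuning reference


def octave_complex_components(pc, low_hz=130.0, high_hz=4000.0):
    """Return (frequency, amplitude) pairs for an octave-complex tone.

    pc is a pitch class (0 = C, 4 = E, ...). The limits low_hz and high_hz
    are assumed values, chosen here so that E yields E3 to E7.
    """
    # Equal-tempered frequency of the pitch class in the octave C4..B4.
    f = A4_HZ * 2 ** ((pc - 9) / 12)
    # Move to the lowest octave transposition that is still >= low_hz.
    while f / 2 >= low_hz:
        f /= 2
    while f < low_hz:
        f *= 2
    span = math.log2(high_hz / low_hz)
    components = []
    while f <= high_hz:
        # Position of this component within the band, from 0 to 1.
        x = math.log2(f / low_hz) / span
        # Fixed raised-cosine envelope (assumption): the same weighting is
        # used for every pitch class, so transposition changes chroma but
        # leaves the overall spectral envelope unchanged.
        amplitude = 0.5 * (1 - math.cos(2 * math.pi * x))
        components.append((f, amplitude))
        f *= 2
    return components


if __name__ == "__main__":
    for freq, amp in octave_complex_components(4):  # pitch class E
        print(f"{freq:8.2f} Hz  amplitude {amp:.3f}")
```

With the assumed limits, the tone on E has five components, from E3 to E7, as in the example in the text.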

Early CAI software was hampered by the poor sound quality of tones produced by analog synthesis. Such software was often unsuitable for musicians and music students striving to improve their ability to hear tonal relationships in real music. The introduction of digital samplers and MIDI has brought with it a marked improvement of sound quality in musicianship CAI. The importance of realistic, high-fidelity musical sound in ear-training software is becoming increasingly evident.

The software to be described in the present paper returns to the use of musically "unnatural" tones — but for a new reason. Previously, "unnatural" tones were used because better timbres were not available. In the present method, "unnatural" tones (octave-complex tones) are chosen for their unique perceptual properties.

First, octave-complex tones correspond directly to pitch-classes (pcs) or chroma. Second, octave-complex tones with an invariant spectral amplitude envelope (as in Shepard's 1964 illusion of a perpetually rising scale) differ minimally with respect to pitch height and timbre. The same applies to harmonic intervals or chords of octave-complex tones: Transposition around the chroma cycle produces minimal variation in overall timbre and pitch-height. Third, a chord of octave-complex tones has only one voicing; concepts of inversion, spacing, and doubling do not apply to such chords.

A one-to-one correspondence between tone-sets and timbres drastically reduces the number of timbres to which students are exposed in their ear-training exercises, making it feasible for them to recognize chord types (such as major triad, half-diminished seventh...) from their timbre alone, without counting tones and semitones. Reducing the timbral variety of chords may be said to encourage spontaneous holistic recognition of harmonic interval and chord types.

In the proposed method, the reduction of timbral variety is not achieved at the expense of harmonic variety. On the contrary, harmonic variety may be enhanced: The use of octave-complex tones enables all possible pc-sets to be presented to students in one and the same ear-training sequence. An encouraging spin-off is the bridge that this creates between tonal and atonal musicianship and theory. The method tends to remove or weaken the boundary between these two areas of musicianship courses. The conventional difference between tonal and atonal materials (in both harmony and ear-training) is suppressed from the beginning, streamlining the transition from tonal to atonal at more advanced levels.

Sequential structure of the method

The method is divided into units. As a rule, each unit is a prerequisite for the next; however, some variation in the order of the units is possible. Each unit begins with aural demonstration of the materials to be tackled in that unit. The materials are then incorporated into interactive ear-training routines. Once the pc-material in each unit has been mastered by the student, a transition is made to regular musical (harmonic-complex) tones. At this point, real musical examples, both seen and heard, may be introduced. Ideally, these examples should be selected from a wide range of musical styles, including "art" and "popular" literature, and students should be given the opportunity to choose a musical style that they know and enjoy as a basis for the examples. Each unit also includes a theoretical part.

Conventional harmony and ear-training methods in tonal Western music normally begin with a study of melodies and chords in tonal (major-minor) contexts. The present method differs in that sonorities are initially presented in isolation. The method proceeds to show how a tonal framework may be created by using certain sonorities in certain ways (see units 6 to 8 of the present version). The rationale behind this approach is to demonstrate that each sonority may be incorporated into a wide range of tonal and "pan-tonal" systems, including harmonic systems in some non-Western musics. The approach prepares students for a broad-minded and creative approach to harmony theory.

 

The individual units of the method are as follows.

(1) Two basic axioms of the Western tonal system are presented. First, all tones are assumed to belong to the (approximately equally-tempered) chromatic scale. Second, tones an octave apart are considered to be harmonically equivalent. Possible perceptual underpinnings of these two axioms are discussed.

(2) All six interval classes (ics) are presented, and interactive software teaches students to recognize ics when presented both melodically and harmonically. Additional variation is introduced either by anchoring one of the tones to a given pitch, or by making random transpositions around the chroma cycle. Students are encouraged to recognize intervals spontaneously according to their quality, timbre, or consonance/dissonance. In a theoretical diversion, the principal sensory basis of consonance/dissonance (roughness) is explained according to the theory of Plomp and Levelt (1965). Once ics can be recognized, students hear examples of intervals between ordinary complex (musical) tones, along with references from the literature, and learn to distinguish specific intervals from their inversions and compounds.
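The reduction of an interval between two pitch classes to one of the six interval classes can be sketched in a few lines of Python. The function name is hypothetical and the sketch is offered only as an illustration of the definition.

```python
def interval_class(pc_a, pc_b):
    """Return the interval class (0..6) between two pitch classes.

    An interval and its octave complement (e.g., 5 and 7 semitones)
    fall into the same class; 0 is returned only for identical pcs.
    """
    d = (pc_b - pc_a) % 12
    return min(d, 12 - d)


if __name__ == "__main__":
    print(interval_class(0, 7))  # C and G: a fifth or a fourth, class 5
    print(interval_class(4, 3))  # E and D#: class 1
```

Between octave-complex tones, an interval and its inversion sound alike, which is why only classes 1 to 6 need to be learned at this stage.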

(3) All 19 triad types are presented. A triad type is a class of sets of three different pcs that are equivalent under transposition but not under inversion (cf. Forte, 1973). The more familiar triad types have names: major (047), minor (037), diminished (036), augmented (048), and suspended-fourth (057). Others may be recognized as parts of dominant (major-minor) seventh chords (035, 026), major sixth chords (025), and so on.

The triads may be labeled in the software interface in three different ways. The first kind of label is the chromatic name, a list of the number of semitones between each note and a lower reference pitch. Examples of chromatic names are 037 (minor triad) and 035 (the fifth, seventh, and octave of a major-minor seventh chord). The second kind of label is the diatonic name (e.g., the terms in parentheses in the previous sentence). Not all triad types have diatonic names. The third kind of label is the nickname. Students may creatively invent nicknames for sonorities and type them into the computer. Labels should be chosen that help chords to be quickly recognized. For example, a student might label the major triad "happy", the diminished "dramatic", the 012 triad a "severe squash", and the 015 triad "Burt" (because it reminds the student of a Burt Bacharach number). If no creative label comes to mind for a given sonority, then a list of possible labels (including labels used by past students) is presented. Once a student has chosen a label for a chord type, and learned to apply that label consistently, then it is a relatively easy matter for that student to learn to "translate" the label into its corresponding diatonic or chromatic version.

The chromatic name of a given chord type may be written in various different ways. Forte (1973) labeled the 12 transpositionally- and inversionally-invariant trichords according to the prime forms of their pc-sets. For example, he called all major and minor triads 0,3,7. In the present application, Forte’s prime forms are not appropriate, as inversions of pc-sets need to be distinguished. For example, students clearly need to be able to distinguish the major triad 047 from its chromatic inversion, the minor triad 037. Here, the chromatic labels of the 19 triad types are written 012, 013, 023, 014, 034, 015, 045, 016, 056, 024, 025, 035, 026, 046, 027, 036, 037, 047, 048. (In this list the suspended-fourth triad appears as 027, which contains the same pitch classes as 057 taken from a different reference note.)
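The enumeration of triad types can be checked mechanically. The Python sketch below generates every set of three pitch classes, reduces each to a label that is invariant under transposition but not under inversion, and collects the distinct labels. The labeling rule used here (take the rotation with the smallest span, then the numerically smallest digits) is an assumption; it is not stated in the text, although it reproduces the 19 labels listed above. Function names are hypothetical.

```python
from itertools import combinations


def normal_form(pcs):
    """Reduce a pc-set to a transposition-invariant label (a tuple).

    Each note is tried in turn as the reference (0); the rotation with
    the smallest total span wins, ties being broken by the smaller tuple.
    Inversions are NOT identified, so 037 and 047 remain distinct.
    """
    pcs = sorted({p % 12 for p in pcs})
    rotations = [
        tuple(sorted((p - ref) % 12 for p in pcs))
        for ref in pcs
    ]
    return min(rotations, key=lambda r: (r[-1], r))


def chord_types(cardinality):
    """Return all distinct chord types with the given number of pcs."""
    return sorted(
        {normal_form(c) for c in combinations(range(12), cardinality)}
    )


if __name__ == "__main__":
    triads = chord_types(3)
    print(len(triads))
    print(", ".join("".join(str(p) for p in t) for t in triads))
```

Calling chord_types(4) enumerates the tetrad types of unit (4) in the same way.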

Students must of course be introduced gradually to this set. The software allows for any subset of the 19 triad types to be selected for testing. The presentation procedure consists of a sequence of trials that continues indefinitely, until interrupted by the student. In each trial, a triad type is selected at random from the chosen set. The triad type is then presented, either relative to a fixed pitch-class (e.g., all chords may be built on C), or in a random transposition. The student may have the triad repeated any number of times before responding. The student then attempts to identify it. Various kinds of feedback may be given, e.g., "wrong, try again" or "no, it was 057."
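The trial procedure just described might be sketched as follows in Python. This is an illustration under stated assumptions rather than the software itself: the function names, the representation of a triad type as a tuple of semitone offsets, and the exact wording of the feedback are all hypothetical.

```python
import random


def make_trial(triad_types, rng, fixed_root=None):
    """Select a triad type at random and realize it as pitch classes.

    triad_types: the subset chosen by the student, e.g. [(0, 3, 7), (0, 4, 7)].
    fixed_root:  a pitch class (0 = C) on which every chord is built, or
                 None for a random transposition around the chroma cycle.
    """
    chord_type = rng.choice(triad_types)
    root = fixed_root if fixed_root is not None else rng.randrange(12)
    pcs = tuple((root + p) % 12 for p in chord_type)
    return chord_type, pcs


def feedback(chord_type, response):
    """Compare the student's chromatic label with the presented type."""
    label = "".join(str(p) for p in chord_type)
    return "correct" if response == label else f"no, it was {label}"


if __name__ == "__main__":
    rng = random.Random()
    subset = [(0, 3, 7), (0, 4, 7), (0, 5, 7)]
    chord_type, pcs = make_trial(subset, rng)
    print("pitch classes to be sounded:", pcs)
    print(feedback(chord_type, "047"))
```

In an interactive version, the chord would be sounded (and repeated on request) before the response is collected, and the loop would continue until interrupted by the student.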

All triad types may be heard either melodically or harmonically. If the emphasis is on spontaneous recognition of chord timbres, then harmonic presentations should precede melodic ones, exactly the reverse of the conventional procedure!

Various options are provided that may help students to learn to identify triad types. Students are encouraged to explore the available options, and to take advantage of those that will help them to achieve fluency of chord recognition in the shortest time or with the least effort.

One of these options involves the sensory dissonance or roughness of each harmonic triad. First, each sonority in a chosen set is presented several times in a random sequence, and students are asked to rate the roughness of each sonority on a 7-point scale (1 = very smooth, 7 = very rough). The mean and standard deviation of the roughness estimates of each chord in the set are then calculated and shown to the student. Using this information, the student now attempts to identify the chords according to their roughness. Initially, this method works best for chords with clearly different degrees of roughness, such as the set 037, 024, and 013. Later, the method may be used to distinguish chords whose roughness varies to a smaller degree.
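The summary statistics shown to the student could be computed along the following lines. The Python sketch assumes that ratings are collected in a dictionary keyed by chromatic label; that data structure and the function names are hypothetical.

```python
from statistics import mean, stdev


def summarize_ratings(ratings):
    """Return {label: (mean, standard deviation)} of roughness ratings.

    ratings maps each chord label to the list of ratings (1 = very
    smooth, 7 = very rough) given over repeated presentations.
    """
    summary = {}
    for label, values in ratings.items():
        # The sample standard deviation needs at least two ratings.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[label] = (mean(values), spread)
    return summary


def rank_by_roughness(summary):
    """Order chord labels from smoothest to roughest mean rating."""
    return sorted(summary, key=lambda label: summary[label][0])
```

A small standard deviation indicates that the student rates a chord consistently; a clear separation between the means of two chords suggests that roughness alone may suffice to tell them apart.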

A helpful option at this point is a theoretical account of roughness perception, which may be presented either on the computer screen or in class (preferably, both). Plomp and Levelt (1965) measured the sensory dissonance of a simultaneous pair of pure tones. Perceived roughness was maximal when the interval between the two was about one-quarter of a critical bandwidth. Applying this theory to octave-complex tones, maximum roughness occurs at the interval of 1 semitone, and roughness decreases steadily as interval size increases to its theoretic maximum value of 6 semitones (since an interval of 7 semitones between octave-complex tones is equivalent to an interval of 5 semitones). The sensory dissonance of chords of three or more octave-complex tones may be calculated by extension of Plomp and Levelt’s model (Hutchinson & Knopoff, 1978).

In regular music theory, of course, the interval of 6 semitones is more dissonant than intervals of 5 or 7 semitones. The dissonance of the tritone between regular harmonic-complex tones is partly due to roughness evoked by beating between the third harmonic of the lower tone and the second harmonic of the upper tone. No such effect occurs with octave-complex tones, as they contain no third harmonic. Another perceptual source of dissonance in the tritone interval is its root ambiguity; the concept of root ambiguity will be discussed below.

An additional option that may help students to identify chord types involves chord-roots. In this option, each trial consists of a chord (selected at random from a given subset of the 19 triad types) followed by a single octave-complex tone corresponding to one of the notes. The student then responds to the question "Is the note the root of the chord?" or "How well do you think the note would function as the root of the chord?" After a sequence of trials, feedback is given on the number of times each note was selected as the root. The student then strives for consistency in choice of the root, or in ratings of root strength. The feedback may also specify the conventional root of the chord (if any); however, it is not necessary for the student to comply with that convention. The ultimate aim is not to find the "real" root, but to recognize the chord quality. The idea is that once the student can consistently locate a reference pitch in a given chord type, then it will be easier to hear the chord’s quality relative to that pitch.

The chord-root option is supported by a theoretic account of chord roots based on the idea of root-support intervals (0, 7, 4, 10, 2 semitones; Terhardt, 1982; Parncutt, 1988). The model predicts the root(s) and scale(s) implied by each triad type, where both roots and scale steps are understood as pitch-classes (rather than specific pitches). The model calculates a "salience" value for each of the 12 pcs, and takes the pc with the highest salience as the most likely candidate for the root of the chord. Pitch-classes with saliences above an arbitrary cut-off value form the scale implied by that chord. Note that both the root of a chord and the scale that it implies may be affected by the successive context in which the chord appears; here, chords are assumed to be presented in isolation.
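The salience calculation may be sketched as follows. In this Python illustration, a pitch class gains salience for every chord note that lies a root-support interval above it. The numeric weights are placeholders assumed for the example (decreasing in the order in which the intervals are listed above); they should not be read as the values of the published model, and the function names are hypothetical.

```python
# Root-support intervals (semitones above the candidate root) with
# ASSUMED illustrative weights; only the set of intervals is from the text.
ROOT_SUPPORT_WEIGHTS = {0: 10, 7: 5, 4: 3, 10: 2, 2: 1}


def pc_salience(chord):
    """Return a 12-element list of saliences, one per pitch class."""
    chord = {p % 12 for p in chord}
    return [
        sum(w for interval, w in ROOT_SUPPORT_WEIGHTS.items()
            if (pc + interval) % 12 in chord)
        for pc in range(12)
    ]


def root_candidates(chord):
    """Return the pitch class(es) with the highest salience."""
    salience = pc_salience(chord)
    best = max(salience)
    return [pc for pc in range(12) if salience[pc] == best]


def implied_scale(chord, cutoff):
    """Return the pitch classes whose salience exceeds the cut-off."""
    salience = pc_salience(chord)
    return [pc for pc in range(12) if salience[pc] > cutoff]


if __name__ == "__main__":
    print(root_candidates((0, 4, 7)))  # major triad on C
    print(root_candidates((0, 4, 8)))  # augmented triad
```

Under these assumed weights the major triad has a single root candidate, whereas the three notes of the augmented triad tie, which gives an informal sense of what is meant by root ambiguity.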

The model additionally calculates the root ambiguity of each triad, regarded as an additional measure of its dissonance. For example, the root ambiguity of a diminished-seventh chord is much greater than that of a major triad (see Parncutt, 1988 for examples). Combining roughness with root ambiguity allows a new rank ordering of the triads to be made that accounts for both variables. The most consonant triad of all is of course the major triad, closely followed by the minor. Students who are more sensitive to roughness than to root ambiguity will rate the suspended-fourth triad more dissonant than the augmented triad, while students who are more sensitive to root ambiguity will do the opposite. The software allows for this possibility via a parameter representing the relative contributions of roughness and root ambiguity to overall dissonance.

Interactive software allows students to hear the pitch relationships in the above theory in a systematic and enjoyable way. Students thus come to understand the theory both aurally and intellectually. Integration of theory and practice at this level is expected to yield faster progress than pedagogical methods in which theory and practice are separate.

Once students have attained a reasonable degree of proficiency in recognizing triads of octave-complex tones, chords of ordinary harmonic-complex tones are presented, along with exercises on recognizing inversions of chords. Rules of voicing (inversion, spacing, doubling) are illustrated with reference to examples from both "art" and "popular" literature. (Many software applications already exist in this area, so I will not go into this part of the method in detail here.)

(4) In the fourth unit, tetrads (including seventh chords) are presented in a way exactly analogous to triads. Note that this unit may be delayed until later, possibly even after unit 8; in general, the order of the units is intended to be flexible, students choosing the path that they feel most comfortable with.

(5) In the fifth unit, students first rate the strength of tonal relationship between successive pairs of chords, in much the same way that they rated roughness and root ambiguity in unit (3). Initially, exercises are limited to the most consonant chord types (major and minor triads); later, more dissonant varieties can be introduced (diminished, major-minor seventh, and so on). Students are encouraged not only to make a global rating of tonal relatedness (compare experiments reported by Krumhansl, 1990), but also to separate that rating into two components, called harmonic and melodic relationship. Harmonic relationship is illustrated acoustically with extreme examples, such as C major versus A minor or G major (examples of close harmonic relationship) and C major versus D# minor (a weak harmonic relationship). In the beginning, the student is not informed what the specific chords are, but is instead encouraged to focus on the strength of the perceived relationship. Melodic relationship is illustrated with chords in which most voices remain fixed (strong melodic relationship) and in which all voices move by relatively large intervals (weak melodic relationship). Again, students strive to produce consistent ratings of harmonic and melodic relationship. When they can, they then use these parameters to help them to identify specific pitch relationships between chords. Specific relationships may be labeled either diatonically or chromatically, or given nicknames, as before.

In Parncutt (1989), I modeled melodic relationship by the overall pitch distance between two sonorities, and harmonic relationship by the degree to which two sonorities have perceived pitches in common. In both cases, all pitches perceived in each sonority were taken into account, including those pitches that are implied but not actually played or notated. Both formulations also accounted for pitch salience (here: pitch-class salience). In the present application, pitch commonality may be calculated as a 12-element correlation coefficient between the array of pc-saliences of the first chord and that of the second, by analogy to Krumhansl & Kessler’s (1982) derivation of the perceived distances between major and minor keys.
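The 12-element correlation can be illustrated with a short Python sketch. To keep the example self-contained, pc-saliences are computed here from root-support intervals with placeholder weights; those weights and all function names are assumptions made for the illustration, not values taken from the published model.

```python
import math

# ASSUMED illustrative weights for the root-support intervals.
ROOT_SUPPORT_WEIGHTS = {0: 10, 7: 5, 4: 3, 10: 2, 2: 1}


def pc_salience(chord):
    """12-element pc-salience array of a chord (simplified sketch)."""
    chord = {p % 12 for p in chord}
    return [
        sum(w for interval, w in ROOT_SUPPORT_WEIGHTS.items()
            if (pc + interval) % 12 in chord)
        for pc in range(12)
    ]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pitch_commonality(chord_a, chord_b):
    """Correlation between the pc-salience arrays of two chords."""
    return pearson(pc_salience(chord_a), pc_salience(chord_b))


if __name__ == "__main__":
    c_major, a_minor = (0, 4, 7), (9, 0, 4)
    g_major, d_sharp_minor = (7, 11, 2), (3, 6, 10)
    for name, chord in [("A minor", a_minor), ("G major", g_major),
                        ("D# minor", d_sharp_minor)]:
        print(f"C major vs {name}: {pitch_commonality(c_major, chord):+.2f}")
```

Even with these placeholder weights, C major correlates more strongly with A minor and with G major than with D# minor, in line with the examples of close and weak harmonic relationship given in unit (5).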

The mean tonal relatedness ratings that a student gives for each pair of sonorities may be compared with calculated pitch commonality (harmonic relationship) and pitch distance (melodic relationship) values, and the results displayed as feedback. Students can then use this information to help them to spontaneously identify specific chord progressions, including the standard cadence types, just as they used sensory dissonance and root ambiguity to help them to recognize individual chord qualities. In this way, students are introduced via their own perception to the concepts of harmonic relationship and voice-leading.

When students gain a certain degree of facility with chords of octave-complex tones, regular musical (harmonic-complex) tones are introduced, as before. The effect of pitch distance is typically much more pronounced between regular musical chords than between chords of octave-complex tones.

The remaining units of the method concern tonality. The interactive CAI techniques used therein are similar in many ways to those described above. I will therefore give no more than a brief description of the accompanying theory.

(6) Scale degrees differ in their perceptual importance or salience (Krumhansl & Kessler, 1982). The relative salience of the scale degrees implied by a chord progression may be predicted by linear combination of the relative saliences of the scale degrees implied by the individual chords in the progression (Parncutt, 1989; Parncutt & Bregman, in preparation). The profile of salience values of each individual chord is called its pc-salience profile, and the overall profile produced by a chord progression is called the aggregate pc-salience profile of the progression. The sixth unit begins by demonstrating these relationships aurally.
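The linear combination of chord profiles can be written compactly. The Python sketch below takes the pc-salience profiles of the individual chords as given; the function name is hypothetical, and the default of equal weights is an assumption, since the text specifies only that the combination is linear.

```python
def aggregate_profile(profiles, weights=None):
    """Combine 12-element pc-salience profiles into an aggregate profile.

    profiles: one 12-element sequence per chord in the progression.
    weights:  one coefficient per chord; equal weights are ASSUMED when
              none are supplied (e.g., later chords could weigh more).
    """
    if weights is None:
        weights = [1.0] * len(profiles)
    if len(weights) != len(profiles):
        raise ValueError("need exactly one weight per chord profile")
    return [
        sum(w * profile[pc] for w, profile in zip(weights, profiles))
        for pc in range(12)
    ]
```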

(7) The key profiles of Krumhansl and Kessler (1982) may be introduced to students by presenting scales and chord progressions followed by single octave-complex probe tones, allowing students to rate how well the probe tones go with or follow the preceding sonorities, and giving feedback on their average ratings.

The profiles may be derived systematically from the following assumptions, all of which can be demonstrated aurally using CAI.

(a) The tonic is the root of a consonant triad. The tonic triad acts as a reference point for the perception of other triads.

(b) The prevalence (frequency of occurrence) of a given chord in a given tonality depends on (i) its consonance (roughness, root ambiguity) and (ii) its harmonic relationship (pitch commonality) to the tonic triad.

(c) The scale (aggregate pc-salience profile) implied by a sequence of chords is a combination of the scales (pc-salience profiles) implied by the individual chords, as in (6) above.

(8) The finality of a cadence of consonant chords corresponds to the correlation coefficient between its aggregate pc-salience profile and the closest key profile according to Krumhansl and Kessler. Using this principle, the "most final possible" cadences may be systematically derived by investigating all possible cadences of two or three major or minor triads. This procedure may be demonstrated aurally, providing a systematic, perceptual basis for the standard cadence and prolongation types (authentic cadence, "plagal prolongation," etc.) and other common chord progressions.
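The finality measure may be sketched in Python as follows. The sketch takes an aggregate pc-salience profile as input, correlates it with all 24 transpositions of the major and minor key profiles, and returns the best match. The profile values are those commonly tabulated from Krumhansl and Kessler (1982) and should be checked against the original before use; the function names are hypothetical.

```python
import math

# Probe-tone key profiles for C major and C minor, as commonly
# tabulated from Krumhansl & Kessler (1982); verify before relying on them.
KK_MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
            2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
KK_MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
            2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
PC_NAMES = ["C", "C#", "D", "D#", "E", "F",
            "F#", "G", "G#", "A", "A#", "B"]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rotate(profile, tonic):
    """Transpose a C-based key profile so that its tonic is `tonic`."""
    return [profile[(pc - tonic) % 12] for pc in range(12)]


def closest_key(aggregate):
    """Return (key name, correlation) for the best-matching key.

    The correlation with the closest key serves as the finality value.
    """
    best = None
    for tonic in range(12):
        for mode, base in (("major", KK_MAJOR), ("minor", KK_MINOR)):
            r = pearson(aggregate, rotate(base, tonic))
            if best is None or r > best[1]:
                best = (f"{PC_NAMES[tonic]} {mode}", r)
    return best
```

Applying closest_key to the aggregate profile of every two- or three-chord sequence of major and minor triads, and ranking the results by correlation, would give one way of searching for the "most final possible" cadences.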

Implementation

The described software is currently being developed. Significant changes to the above plan may be expected as the project evolves, and as students try out the method and make suggestions. The method is initially being implemented on a Macintosh Classic II, using 8-bit sound resources played over the built-in 22 kHz sound output, a "lo-fi" approach that requires no additional hardware. Future expansion will include a MIDI link to a 16-bit, 44 kHz sampler.

Conclusion

The chief advantage of the ear-training component of the method is the use of computer-generated octave-complex tones in the earlier units, enabling students to holistically and spontaneously recognize chord and interval qualities, before going on to tackle inversions, other aspects of voicing, and different instrumental timbres.

The chief advantage of the harmonic-theory component is that the internal logic of the harmonic-tonal system may be demonstrated both aurally and theoretically in an integrated program. The long-term plan is to create a systematic derivation of harmonic vocabulary from a limited number of principles (principle of parsimony) that is supported by sound examples at every stage.

Of course, the described method is not intended to stand alone. It should be combined with a range of other CAI and classroom methods of teaching ear training and harmony.

 

References

Hutchinson, W., & Knopoff, L. (1978). The acoustic component of Western consonance. Interface, 7, 1-29.

Krumhansl, C.L. (1990). Cognitive foundations of musical pitch (Chapter 4). Oxford: Oxford University Press.

Krumhansl, C.L., & Kessler, E.J. (1982). Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. Psychological Review, 89, 334-368.

Miller, G.A. (1956). The magical number seven plus or minus two: Some limitations on our capacity for processing information. Psychological Review, 63, 81-97.

Parncutt, R. (1988). Revision of Terhardt's psychoacoustical model of the root(s) of a musical chord. Music Perception, 6, 65-94.

Parncutt, R. (1989). Harmony: A psychoacoustical approach. Berlin: Springer-Verlag.

Parncutt, R., & Bregman, A.S. (in preparation). Perceptual modeling of probe-tone profiles of short chord progressions.

Plomp, R., & Levelt, W.J.M. (1965). Tonal consonance and critical bandwidth. Journal of the Acoustical Society of America, 38, 548-560.

Shepard, R.N. (1964). Circularity in judgments of relative pitch. Journal of the Acoustical Society of America, 36, 2346-2353.

Terhardt, E. (1982). Die psychoakustischen Grundlagen der musikalischen Akkordgrundtöne und deren algorithmische Bestimmung (Psychoacoustic fundamentals of musical chord-roots and their algorithmic determination). In Dahlhaus, C. & Krause, M. (eds.), Tiefenstruktur der Musik. Technical University of Berlin.