Spatial Manipulation and Computers: A Tutorial for Composers

 

 

 

Durand R. Begault

 

 

This tutorial is an introductory survey of current understanding and practice related to the production of spatial effects in computer music. It is addressed to composers and researchers as a means of clarifying the relationship between current models of understanding for "spatial hearing" and loudspeaker music, and the manifestation of these models in the context of computer music programs. We focus discussion on issues relevant to quadraphonic or stereo playback systems, certainly the most commonly used formats for those working with spatial manipulation in computer music. A holistic approach to the subject, covering the research and aims of psychophysicists, acousticians, computer programmers, and loudspeaker composers, has been lacking in the literature; and while computer music is truly an interdisciplinary activity, the advantages and limitations of current practice and knowledge in the field of spatial manipulation have yet to be examined together in any systematic way. This tutorial will elucidate these relationships, pointing out considerations for future research and allowing composers to separate the possible from the improbable.

 

Method of Review: The Source-Medium-Receptor Model

 

In discussing spatial manipulation for the computer music composer, it is appropriate to consider the entire chain of communication involved in conveying the musical idea. For this tutorial we adopt the broad categories used by psychophysics to describe sound waves (Roederer 1975), with elaborations specific to spatial manipulation. The broader categories are source, medium, and receptor. Within each category we can describe a more detailed chain of events (Figure 1). In this tutorial, the chain of communication is considered in reverse (receptor-medium-source), in order to provide an informed perspective on the spatial manipulation computer programs reviewed in the last section.

 

The source of a spatially manipulated sound involves the composer's spatial conception for a sound and its realization as electrical energy delivered to a loudspeaker by a particular spatial manipulation computer program. The chain of events can be envisioned as follows: 1) the composer's spatial conception, 2) the composer's method for specifying this conception to the computer, 3) the computer's interpretation of this specification into parameters for acoustic modification of the sound, and 4) the process of digital-to-analog conversion and amplification that supplies the loudspeakers. The medium involves the loudspeakers, the concert environment, and the air through which the resulting sound waves propagate; in this tutorial, a concert hall or outdoor performance space typically encountered in the performance of computer music. The receptor is the listener who experiences these sound waves in some manner; this experience of spatial hearing is investigated as two broad categories: 1) the initial perception of a given sound, as understood by physicists and psychophysicists, and 2) the higher level cognition of spatial manipulation by a listener. The term perception is used in this tutorial to refer to the relatively immediate "awareness of the elements of the environment through physical sensation: reaction to sensory stimulus," whereas cognition refers to "the act or process of knowing in the broadest sense: an intellectual process by which knowledge is gained about perception or ideas" (Webster 1979).1

 

 

 

Figure 1: The Source-Medium-Receptor Model

 

Problem Statement

 

The tacit assumption we make is that the composer desires an idealized situation: one where the initial conception of the composer in the first stage is identical, or reasonably close, to the receptor's cognition in the final stage.2 In the communication chain outlined above, each additional conceptual model involves a translation of intent which is imperfect, and which can cause the percept of the listener to differ significantly from the composer's intention. For instance, we can specify to a computer the frequency, duration, and amplitude of a sound with reasonable assurance of the audible result. We can also predict within reason, based on informed use of parameters, the relative harmonicity or inharmonicity of a sound generated by Chowning's FM synthesis program (Chowning 1977). By comparison, spatial effects are especially unpredictable, due to the sensitivity of these effects to the transformations between composer and computer, loudspeaker and room acoustics, and between the medium and the receptor. What is needed by computer music composers and researchers is an understanding of the effects of the translation of intent that occurs at each of these transformations. Such an understanding will show the way to more idealized and informed programs for spatial manipulation, and help to eliminate the mismatch between source and receptor.

 

As we shall see, the origins of the mismatch problem are partially idiomatic to the particular discipline. Psycho-acousticians have experimental goals which isolate parameters of localization in order to examine specific physical mechanisms in humans. Research in concert hall acoustics tends to emphasize subjective quality for a given musical application, usually symphonic music. Composers are perhaps the greatest bearers of responsibility for the mismatch, in their assumption that spatial transformations that can be drawn on a terminal or described verbally can actually be perceived. We supply two examples of notations used by composers that function more as personal records or as programmatic elements than as descriptions of the actual resulting sound (Figure 2).

Figure 2: Notations used by Michael McNabb (Dreamsong) and Roger Reynolds (Voicespace I) to describe their spatial manipulations.

 

 

Historical Use of Spatial Manipulation in Music

 

The potential of organized interaction between sound source locations was probably recognized early on by man for hunting purposes. Birds communicate across space by calls that say essentially, "I'm over here!," and there can be no doubt that such signaling proved useful for man to imitate. This establishment of spatial relationships with sound was soon taken into the context of religious ritual. The call and response between a congregation and a leader was probably the first organized (composed) type of spatial manipulation. The well documented use of antiphony (cf. Michael Praetorius, Syntagma Musicum, Vol. III) was a Renaissance tradition that exploited the architectural acoustics of the performing space. Giovanni Gabrieli's In Ecclesiis and the works designed for St. Mark's Cathedral are examples; the Sonata pian' e forte is well known for advances in music that were in conjunction with its antiphony. The Baroque concertato principle helped influence a tradition of echo pieces, exemplified by works of Vivaldi, Mozart, and Haydn. Berlioz used four separate brass choirs in the famous Requiem. A listener hearing these works is not only aware of the specific location of each choir, but also of the space in between the choirs, and of the nature of the architectural space.

 

The use of timbre and amplitude cues to effect illusions of distance was understood by orchestral composers of the nineteenth century. Wagner's use of a single muted horn playing pianissimo suggests distant hunters in Tristan und Isolde. Mahler often gave the instruction "in der Ferne" (in the distance) to various instruments in his orchestral works. Mahler also literally relocated sound sources offstage to create an illusion of distance. Ives placed "heavenly choirs" in different locations of the hall for his Fourth Symphony, and wrote detailed conductor's notes on rendering relative distances for specific instrumental groups. Thus, aspects of movement and distance were utilized as musical considerations before the existence of loudspeaker music.

 

The advent of loudspeakers and electrical control devices offered the possibility for a different approach to spatial manipulation. Whereas Mahler, Ives, and more recently Henry Brant achieved their spatial effects by literally positioning the sound sources in the desired locations, modern recording techniques and loudspeakers allowed the possibility of producing illusions (both controlled and uncontrolled) of sound source movement, distance, and environment, and allowed illusions of sounds beyond the actual location of the speaker itself. By contrast, there are those who compose spatial music by locating multiple speakers at the desired locations. Examples of composers who have worked in this way include Pierre Henry and his loudspeaker concerts, Gordon Mumma's installation at the Pepsi Pavilion at Expo '70 in Osaka, Varese's Poème Electronique at the Philips Pavilion in Brussels, and Stanley Shaff's Audium installation, currently active in San Francisco. In this approach, the composition of spatial manipulation is dependent on the location of point sources. Stereo recording techniques developed in the popular music industry in the late 1960's, along with the advent of 4-, 8-, 16-, and 24-track recording machines, helped influence the alternative compositional approach of creating illusions at the mixing board. This is an approach where sound space disposition is created long after, and irrespective of, the original recording situation.

 

Digital audio technology has not only expanded the sophistication of amplitude variation, the principal means used at the analog mixing board for spatial effects; it has allowed other parameters to be utilized as cues to the spatial hearing mechanism. Composers have only begun to tap the potential for computer-based spatial manipulation in their loudspeaker music, with the development of facilities such as IRCAM in Paris, CCRMA at Stanford, CARL at UCSD, and Northwestern University. In section 4, we outline computer programs used at these facilities and their relation to the cues and translation factors outlined in sections 2 and 3.


The Receptor: Spatial Cognition and Perception of Auditory Space

 

In sections 2.1.1-2.1.8, we overview aspects relevant to the integration of spatial cues, the integration of these cues as a type of learning, and the apparent consequence of this learning as we relate it to spatial hearing. This is followed in sections 2.2.1-2.2.8 by an overview of psychophysical experimentation that pertains most directly to the loudspeaker composer. A general discussion of the relevance of psychological models of spatial perception to composition follows in section 2.3.

 

Spatial Cognition of Sound; Integration of Cues

 

Localization and spatial hearing are two related, albeit distinct terms. Spatial hearing is defined as the listener's ability to learn about significant details of the sound source over time; to separate a particular sound source from other sounds into a spatial image; to compare the sound's characteristics over time to previously heard sounds in memory; and to determine the environmental context in which the sound is heard, be it remembered or unfamiliar. Localization refers to the ability inherent in a normal human of sufficient age to estimate the angle and velocity of a sound source relative to oneself. The localization of sound sources, and the reverberant characteristics of the space which these sounds articulate, are perceptions to which we can attach cognitive meaning. At root, the location of a sound source is important to survival (hunting for example), and the location of a sound source can be associated with a familiar environment, principally due to reverberation characteristics.

 

Models of understanding proposed by psychophysics have resulted from attempts to isolate specific hearing mechanisms; an identification of these mechanisms would contribute to a verifiable, comprehensive model of auditory perception. Their findings are significant for a general understanding of spatial hearing, but limited in significance to the computer musician because of the isolation of specific mechanisms. The context of laboratory experiments that use headphones, head restraints, and anechoic chambers is artificial when compared to the concert hall situation. An understanding of the nature of the mind's integration of these cues would be more valuable to the composer. For example, distance perception is a type of spatial hearing that relies on the integration of several psychophysical cues, as well as memory. At present, the integration of cues is only marginally understood, partially due to the difficulty of controlling multiple cues in a laboratory situation.

 

Without question, there is much left to be understood in terms of integration of localization cues over time, knowledge which can only benefit the composer. "Given the limited amount of coverage that total spatial hearing has received, it should not be surprising that many interesting aspects of the spatial hearing system have neither been discussed nor discovered" (Kendall 1984).

 

Spatial Experience and Learning

 

Spatial experience encompasses much of our lives, and has often been the object of analogy and speculation by philosophers, poets, scientists, and musicians. We speak of possessing a "personal space," as well as visiting "the wide open spaces;" we think of space as organizing everything but time; and we use space as a metaphor for evaluating relationships between objects outside of ourselves, as well as between objects and ourselves. Space can be used to understand something inherently non-spatial; for example, multidimensional scaling programs (such as KYST) can be used to relate perceived timbral differences to mathematical distance.

 

Spatial cognition in general is a process of interest to several fields of psychology; cognitive psychologists are interested in how individuals represent their environments with their mind's eye, and developmental psychologists are interested in the change of spatial abilities over the lifespan.3 To the composer who prepares two-dimensional charts of sound paths (trajectories) and specific point source locations, and/or who is interested in creating synthesized spatial environments, psychology may be able to provide models of understanding that are at least partially relevant. The questions are: Is there a similarity between spatial cognition in the everyday environment, and that synthesized in auditory space? Can a composer benefit from these models of understanding? If part of the listener's experience is evaluating and possibly mapping the environment which s/he encounters in the world, then this daily experience is perhaps an area to evaluate in relation to composition. In this sense, the music of spatial manipulation can be thought of as one which appeals to the listener's ability to visually abstract suggested landscapes, enclosures, and trajectories in the mind's "aural eye."

 

Arnheim states in Visual Thinking that "there is no way of getting around the fact that an abstractive grasp of structural features is the basis of perception and the beginning of all cognition" (Arnheim 1969). Do we carry with us visually conceived patterns or maps of spatial experiences similar to the maps used or imagined by the composer? For instance, visually imagine the aural trajectory of an insect flying around your head, and compare this to the visually imagined aural trajectory of a jet flying overhead. Although we may not have a specific memory map of insect or jet trajectories, we do have a rudimentary knowledge of their spatial disposition; enough to distinguish one from another, and enough to compare them to future encounters with these sounds. This experience-based knowledge might include the size of the source, the velocity, and possible types of trajectories.4

 

Wohlwill's Model of Cognitive Spatial Ability

 

Psychologists often refer to levels of cognitive spatial ability, each level involving a more complex operation of cue integration. The following is one of several models that have been proposed (Wohlwill 1981).

 

Initially, one must be able to select relevant information from the stimulus field for localization. Attention to the idea that localization should occur is probably important at this level as well.5 "Cue learning" involves the immediate association of localization cues into meaningful percepts for distance, angular position, and elevation. Memory of spatial relationships between stimuli is the next level. Wohlwill states that:

...it is clear, particularly in regard to spatial orientation in the large scale environment, that the learning and remembering of the sequential order of stimuli, or of their locations relative to some landmark or prominent axis of organization, plays an important part in the formation of spatial reference systems... (Wohlwill 1981).

This is also related to our ability to integrate separate stimuli into the perception of movement. Kinchla and Allan, in reference to vision, propose that movement perception occurs when the observer is able to compare the memory trace of a visual object from one position in space to another position in the field (Kinchla and Allan 1969). This situation is perhaps analogous to the cognition of auditory movement, and perhaps relates to the concept of auditory "streaming" proposed by some psychophysicists.

 

A spatial referent system is established as the highest level of cognition. This level seems most applicable to the composer's strategy. In reference to vision, Wohlwill states that:

"...it is apparent that the kind of reference system utilized by the individual is a function of situational factors, such as the presence and absence of landmarks, the demands of a particular task, and possibly the individual's experiential history...(individuals possess) the ability to utilize multiple frames of reference, and to switch from one to another in accordance with the demands of a given situation" (Wohlwill 1981).

 

These multiple frames of reference may be used for spatial hearing as well, influenced and marked by the kinds of aural landmarks that are recognized by spatial memory.

 

Cognitive Mapping of Auditory Space

 

Spatial representation via cognitive mapping is the most highly complex and most difficult to describe ability that humans utilize to represent and compare spatial frames of reference. Butler and Flannery proposed that we possess "spatial referent maps" to associate various frequencies with different azimuthal positions (Butler and Flannery 1980). Related to this are studies that examine the innate sense that sounds with higher frequency fundamentals "appear" higher on a vertical plane of reference relative to lower frequencies. This phenomenon was experimentally verified by Pratt (Pratt 1930) and Roffler and Butler (Roffler and Butler 1968). But equating frequencies with different locations outside of the body only scratches the surface of the nature and form of a more complex sense of auditory mapping. How, for instance, does a listener map the trajectory of a moving sound source in the imagination? How is a distant sound source mapped in the imagination in relation to a nearby sound source? The idea that some sort of patterning exists in the mind is supported by the survival aspect of localization. Roederer states that "Holistic pattern recognition is a fundamental requirement for animal survival in an environment in which many correlated events occur at the same time in different spatial locations" (Roederer 1975). Especially in the absence of visual stimuli, it is likely that visual modes of thinking (in terms of patterning or mapping) occur in connection with spatial hearing.

 

Cognitive psychologists have pointed out that internal representation of space probably does not conform to the Cartesian coordinate map, with its inherent distortions. Downs points out that two-dimensional Cartesian representation is a wholly arbitrary choice on the part of the researcher, although potentially useful.

 

"The cartographic map is a good basis for analogy though it does not work as a metaphor. Analogies demand a well-known situation with clearly specified concepts and laws of relationship that can be translated into terms of the lesserknown situation...much of cartography is an art: map design is notoriously subjective" (Downs 1981).

 

Thus, spatial manipulation composition on a flat surface (such as a CRT) that is interpreted as cartographic (x,y) coordinates must be considered a subjective, personal record for the composer, rather than a representation of audible effect.

 

Neurophysiologists have shown an interest in auditory spatial pattern recognition and discrimination. Klingon and Bontecou state that:

To the extent that any neurologic function is 'acquired,' it may be said that auditory localization is also acquired. Yet it might be more accurate to say that auditory localization is a fundamental component of the total sensory organization of the cerebral hemisphere and that a lesion in the hemisphere may cause disorganization to the extent that auditory localization becomes defective or lost (Klingon and Bontecou 1966).

Ruff and Perret used a vertical, 10 x 10 loudspeaker arrangement, with a sound source that moved through the speakers, "writing" the numerals 0, 2, 3, 6, 8, and 9. They found that these complex auditory patterns could be identified significantly above chance by normals6 (Ruff and Perret 1982). Ruff and Perret mention that "Auditory space is often composed of sounds originating from a sequence of sources, and the perception does include both the localization of each source and a spatial integration between them. This latter mentioned aspect is not assessed in localization paradigms of single and stationary sources and not in the tasks used to perceive the distance of sound sources." The problem of isolating multiple integrated cues has been the main barrier to the cognitive paradigms Ruff and Perret have proposed. Currently, Ruff has applied their work as therapeutic training for the brain-damaged.

 

Predominance of Visual Modality

 

On the basis of previous experience, our visual understanding of a sound source influences spatial hearing. If a sound is familiar to us, we tend to localize the thing emitting the sound according to its most probable location. Given appropriate auditory cues, we would localize jets, thunder, and birds skyward; automobiles, humans, and sirens at approximately eye level; rattlesnakes at ground level; and strange noises beneath us, if our automobile or bicycle runs over something.7

 

Fisher found in several experiments that vision was completely dominant over audition, and suggested an order of spatial modalities in terms of dominance to be "visual, tactile-kinaesthetic, auditory" (Fisher 1962). In audiovisual presentations such as television or movies, persons speaking to one another or faces being smashed by a fist are localized to their visual locations. Yet the actual sound source, a loudspeaker, remains fixed in position, at a distance from the visual medium. Thus, familiarity with a sound's most likely position, and/or visual cues, can actually modify auditory cognition to a great extent.

 

The Host Space Environment; Sound Source Size

 

Other consequences of the cognitive process of spatial hearing involve the perception of the spatial qualities of the environment the sound source occupies (the host space) and the perception of the size of a sound source. These aspects are probably cognized largely on the basis of previous experience. Regarding the latter aspect of source size, "louder" (perceived loudness as a function of amplitude, spectra, duration, etc.) can in some crude sense denote "bigger," if we audition barking dogs behind an opaque fence. But for two different types of sounds, previous knowledge influences perception of sound source size. For example, a car at a great enough distance can be potentially less "loud" than an insect at our ears, yet the car is known to be "bigger." The case for unfamiliar sounds probably follows the "louder" is "bigger" paradigm; and perhaps, if we were unfamiliar with the sound of an automobile, the car in the previous example would be perceived as smaller than the insect (which would be a larger problem for us if it were near our ear).

 

The perception of a host space refers to the particular reverberant characteristics of the space which a localized sound source articulates (Reynolds 1975). We associate the vastness and dimensions of a space and its volumetric and/or architectural implications by evaluating these reverberant characteristics in comparison to previous reverberant experiences.8 Even weather conditions are cued by relative reverberation time: consider the damped quality of sound when propagated outdoors in falling snow. Another aspect informative of the host space is the quiet background sounds that are incidental to most environments. This is actually a textural consideration, where active sounds moving in close proximity to a listener are juxtaposed against a fixed, statistically diffuse background of uncorrelated sounds. Recordings of environmental sounds (the ocean, forest sounds at night) and what composer Brian Eno has termed "ambient music" are examples of sound that is purely background in its textural character. Such sounds are probably more informative of a host space's character than actual reverberation characteristics.9

 

Cognition of Sound Source Distance

 

As was mentioned, the cognition of sound source distance involves an integration of multiple cues and previous experience. Distance of a sound source is usually measured relative to the location of the listener. Amplitude variation can give a crude sense of distance in terms of "louder means closer," but this is a very undependable cue in itself. Previous experience with sound sources at various distances is a major factor in evaluating distance (Coleman 1962, 1963), as a result of integrating amplitude, spectral, and reverberation cues.

 

The proportion of direct to reverberant sound source energy was shown by Bekesy to affect judgements of distance (Bekesy 1960). Mershon and King comment on the contribution of reverberant energy as allowing the listener to set "...appropriate boundaries (or potential boundaries) for a source or a set of sources" (Mershon and King 1975). Sheeline conducted experiments with computer generated and real reverberation to determine its effect on distance perception. He found that cues provided by intensity differences (between members of a set of stimuli in a reverberant environment) are important, but insufficient for the formation of reliable distance cues. "Accurate distance judgements should therefore not be possible upon initial exposure to unfamiliar sounds" (Sheeline 1982).
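To make the direct-to-reverberant relationship concrete, the following minimal sketch (in Python, and not drawn from any of the programs reviewed in section 4) shows one common convention for simulating distance: the direct signal's level falls off roughly as the inverse of distance, while the reverberant send falls off more slowly, so that the ratio of direct to reverberant energy decreases as the simulated source recedes. The particular scaling exponents are illustrative assumptions only.

    # Hypothetical distance scaling: direct level ~ 1/d, reverberant
    # send ~ 1/sqrt(d); the direct-to-reverberant ratio then shrinks
    # with distance, in keeping with Bekesy's and Sheeline's findings.
    def distance_sends(distance, min_distance=1.0):
        d = max(distance, min_distance)      # avoid division by zero
        direct = 1.0 / d
        reverb = 1.0 / (d ** 0.5)
        return direct, reverb

    for d in (1.0, 4.0, 16.0):
        direct, reverb = distance_sends(d)
        print(d, round(direct, 3), round(reverb, 3), round(direct / reverb, 3))

Note that the ratio, not the absolute reverberant level, carries the distance information; doubling both sends leaves the cue intact.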

 

Spectral cues are also involved in the cognition of distance. A sound source's higher frequencies diminish more rapidly than its lower ones as the distance of the source increases. Sheeline states that:

The listener's familiarity with both the source signals and the acoustic environment is clearly a key component in any model for auditory distance perception...the most critical stimulus components for distance are changes in spectral content of the source, relative to some known or familiar standards, in the presence of reverberation cues. (Sheeline 1982)

In other words, we seem to have a memory of familiar sound sources at various distances that is formed on the basis of amplitude and spectral content. In the context of total spatial hearing, a more complex operation occurs in which the reverberant content of the sound is evaluated against its direct signal energy. The same cues work less effectively with non-familiar sound sources. See section 2.2.6 of this tutorial, "Spectral Cues and Distance Perception," for additional discussion.

 

Psychophysics of Auditory Space

 

In the following section we summarize information about the localization mechanism as given by psychophysical experimentation in laboratory contexts. A general overview of results applicable to the computer musician is given rather than an itemization of specific experiments and conclusions. The reader is referred to Mills (1972) or Blauert (1983) for concise overviews in this regard.

 

Introductory Perspective on Psychophysics

 

Psychophysics of sound deals with the human response to physical stimuli. From the physical description of sound, we are able to make predictions that are exact and unique, via a conceptual model. Physiological description is similarly based on direct observation: we can determine irrefutably via a probe whether a nerve cell is firing electrical impulses to the brain. Psychophysics, on the other hand, must deal with mental processing, and can only indicate probabilities of human response to a stimulus.

 

It is important to note from the outset that most psychoacoustic experiments work with procedures that are not at all similar to conditions found in real rooms, especially at concerts of loudspeaker music. Only recently have experiments begun to be performed in reverberant conditions (e.g., Sheeline (1982) and Hartmann (1983)). Laboratory findings are relevant in that they elucidate specific mechanisms of localization, and indicate the abilities of these mechanisms under controlled conditions. However, they only indicate a small, isolated portion of the process involved in total spatial hearing.

 

The typical psychophysical experiment attempts to eliminate the influence of reverberation and frequency response of a room by using anechoic chambers, headphones, or the rooftops of tall buildings. The types of sounds utilized in these experiments are typically one or more of the following: pure tones (sinusoids), noise (either full spectrum or high-, low-, or band-passed), and clicks (impulses). The use of noise more closely approximates the complex sounds of the real world, while clicks simulate the onset transient information of a complex sound. Pure tones and noises are usually turned on and off slowly, in order to eliminate transients. In inter-aural difference experiments that do not utilize headphones, a bite-board or head restraining device is often used.

 

A field related to localization is lateralization, which is defined as the illusion of sounds within the head. Because lateralization can be effected with headphones in the absence of room acoustics, psychoacoustic study in this field is quite popular. Early researchers tended not to distinguish lateralization from localization. A classic experiment that demonstrates a lateralization effect produces the illusion of a sound source that rotates within the head. When two tones at a mistuned consonant interval are fed separately to the two ears with headphones, there is a phase difference which the brain interprets as differing arrival times to each ear. This differing time of arrival is known as inter-aural time difference, abbreviated ITD. The illusion, while fascinating, can only occur under controlled conditions.

 

Inter-aural Time and Amplitude Difference Cues

 

Localization neurologically involves two stages; first, a comparison of the signal arriving at both ears, and second, monaural processing of the signal into auditory space perception.10 In order to evaluate the localization of a sound arriving at both ears, we naturally turn our head towards the sound object in order to establish a perpendicular to the wavefront; localization is optimized under such a condition (Mills 1972). A comparison is then made on the basis of integration of several inter-aural and monaural cues. The two principal features of inter-aural comparison are inter-aural time difference (ITD) and inter-aural amplitude difference (IAD). The roles of ITD and IAD have been established in the literature to the degree that they are collectively referred to as the "duplex theory of localization." Inter-aural time differences are thought to be cross-correlated in the medial superior olive of the hearing system network; some theories propose a single mechanism sensitive to both time and amplitude, while others propose two different physiological mechanisms that are eventually cross-correlated (Roederer 1975, Jeffress 1971). In the following discussion we illustrate these mechanisms, and then describe their frequency dependent characteristics.

 

Consider a listener in an anechoic chamber, and a sound source oriented directly ahead of the listener on the median plane11 (Figure 3). In such a theoretical position the sound would arrive at the eardrums at exactly the same time; no ITD would exist. Next, move the sound source along the azimuth of the listener to the right. Because of the angle in relation to the median plane, the distance the sound must travel to the listener's left ear is no longer equal to the distance it travels to the right ear, and thus the sound will arrive at the left ear later in time. If the wavelength of the sound is smaller than the diameter of the head, inter-aural amplitude differences (IAD) result. This is because higher frequencies will be attenuated at the farther ear due to the shadowing effect of the head. Low frequencies diffract around the head to the far ear with little or no change in amplitude. Thus, we are able to utilize IAD cues as a second means of localizing high frequencies.
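The geometry just described can be approximated numerically. The sketch below uses the classic spherical-head (Woodworth) formula for ITD, a textbook idealization rather than anything proposed in this tutorial; the head radius and speed of sound values are conventional assumptions.

    import math

    # Spherical-head approximation: ITD = (r/c) * (theta + sin theta),
    # where theta is the azimuth in radians. This is an idealization;
    # real heads and pinnae complicate the picture considerably.
    HEAD_RADIUS = 0.0875     # meters (assumed average head radius)
    SPEED_OF_SOUND = 343.0   # meters per second at room temperature

    def itd_seconds(azimuth_deg):
        theta = math.radians(azimuth_deg)
        return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

    # 0 deg. (median plane) gives no ITD; 90 deg. gives roughly 0.65 ms.
    for az in (0, 30, 60, 90):
        print(az, "deg:", round(itd_seconds(az) * 1000.0, 3), "ms")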

 

Psychoacoustic experiments conducted by Stevens and Newman (Stevens and Newman 1936) and later confirmed by Sandel (Sandel 1955) and Mills (Mills 1958) have helped to establish ITD and IAD as complementary functions. In their experiments, attempts were made to find the usable frequency range for IAD and ITD, using pure tones, clicks and noise. Stevens and Newman found that ITD functions up to around 1.5 kHz., and that IAD functions best above 4 kHz.; the region between 2 kHz. and 4 kHz. resulted in the highest error rates for accurate localization. While Stevens and Newman considered IAD usable across the entire frequency range, Mills found IAD usable above 4 kHz. (Mills 1958). Mills also established the cutoff point between the two mechanisms at 1.5 kHz. (Mills 1972).12 See Figure 4 for a rough comparison of localization ability as a function of frequency and azimuthal position.

 

The degree of front-back ambiguity was most pronounced in the frequency region between the two functions, i.e. around 1.5 kHz. (a source at 0 deg. azimuth technically has the same values for ITD and IAD as a source at 180 deg.). This pronounced ambiguity at 1.5 kHz. is explained as the result of the dividing point between the IAD and ITD mechanisms. Stevens and Newman found that frequencies below 3 kHz. were reported at 0 deg. or 180 deg. on the order of chance, leading them to the conclusion that the cutoff point between ITD and IAD was at 3 kHz. We emphasize that localization was influenced in these experiments by the frequency, angular location on the azimuth, complexity of the sound, transient or non-transient character, and duration.

Figure 3: Inter-aural Time Delay

 

Transient Envelope ITD Cue

 

Lateralization experiments with 1.4 kHz. high-pass filtered noise showed that information usable to the ITD mechanism could be extracted from frequencies supposedly out of the usable frequency range for ITD. The conclusion is that the auditory system extracts the overall amplitude envelope of higher frequency components at both ears, and measures the difference in time of arrival of the two envelopes (Klump and Eady 1955). This cue is termed the transient envelope response. The sensitivity of hearing to transient information is important to spatial perception; a sound trajectory defined by a source that has rapidly articulated transients is easier to localize than a tone with an initial attack followed by an exponential decay in amplitude. If the particular sound source includes frequencies below and above 1.5 kHz., then it can be said to be available to three types of localization mechanisms: ITD, IAD, and transient envelope ITD response.
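The idea behind the transient envelope cue can be sketched computationally: rectify and smooth the signal at each ear to obtain its amplitude envelope, then find the lag that best aligns the two envelopes. This is an illustrative model only; the parameter values are assumptions, not measurements of the auditory system.

    import numpy as np

    def envelope(x, smooth=32):
        # rectify, then smooth with a short moving average
        kernel = np.ones(smooth) / smooth
        return np.convolve(np.abs(x), kernel, mode="same")

    def envelope_itd(left, right, fs, max_lag=64):
        env_l, env_r = envelope(left), envelope(right)
        lags = list(range(-max_lag, max_lag + 1))
        corrs = [np.dot(env_l, np.roll(env_r, lag)) for lag in lags]
        return lags[int(np.argmax(corrs))] / fs   # seconds

    fs = 44100
    t = np.arange(fs // 10) / fs
    burst = np.sin(2 * np.pi * 3000 * t) * np.exp(-40 * t)  # 3 kHz. burst
    delay = 20                                 # samples, roughly 0.45 ms
    left = np.concatenate([burst, np.zeros(delay)])
    right = np.concatenate([np.zeros(delay), burst])
    print(envelope_itd(left, right, fs))       # about -20/44100 seconds

Even though the 3 kHz. carrier lies above the usable range for fine-structure ITD, its envelope still yields a usable arrival-time difference.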

 

 

The Minimum Audible Angle

 

Further relevant information is gained about the perception of angular position by experiments that measure the "just noticeable difference" of two different sound sources on the listener's azimuth. This was termed the minimum audible angle (abbreviated MAA) by Mills (Mills 1958). He found that the MAA became increasingly larger as a 750 Hz. pure tone was moved along the azimuth, relative to the same sound source on the median plane: from a MAA of 1 deg. at 0 deg. azimuth, to a MAA of about 7 deg. at 75 deg. azimuth. The MAA was also found to be largest at 1.5 kHz., the duplex mechanism's proposed breakpoint discussed in section 2.2.2.

 

The role of head movements has been studied both in minimizing the MAA and in virtually eliminating 0 deg.-180 deg. azimuth ambiguity. Wallach (1940), Runge and Thurlow (1967), and Thurlow, Runge and Mangels (1967) established that aids to the accuracy of localization included the changing pattern of ITD and IAD cues with head movement, and kinaesthetic cues from the neck muscles.13 Head movement is the human equivalent of the movable pinna found on most animals. "One has only to watch a dog attempting to localize the source of a faint sound to realize how important a role head movement and movement of the pinna play in localization" (Jeffress 1971).

 

Monaural Cues; Spectral Cues from Pinna and Shoulders

 

In addition to the duplex mechanism, transient envelope sensitivity, and head movement, several other factors are discussed in the psychophysical literature as sources of cues. These include both monaural and binaural pinna cues, monaural head shadow cues, and cues obtained from sound waves striking the shoulder. In considering monaural pinna and head shadow cues, there is the notion that a single ear can perform localization tasks without the aid of the other ear. The reflections of the pinna and the head influence the overall frequency response as a function of the incoming angle of the sound source. These frequency response functions are analyzed and compared to previous knowledge of the sound's frequency content at a particular known location. The influence of the pinna is most significant in this regard.

 

The use of spectral cues in sound localization was first mentioned by Pierce (1901), although their influence was not fully recognized until the late 1960's. The pinna, because of its complex shape, imposes comb-like filtering on an incoming sound spectrum. This filtering results from the sound's incidence on the interior reflecting surfaces, and the reflections between these surfaces (Rodgers 1981). The time interval of the first significant reflection within the pinna influences the spectral content of the sound in the frequency domain. These time intervals are dependent both on vertical and horizontal angles of the incoming wavefront. For sounds originating at 180 deg. on the azimuth and below the level of the ears, the pinna also acts as a low pass filter (Rodgers 1981). Thus, the pinna is significant for eliminating the previously mentioned 0 deg.-180 deg. sound source ambiguity, as well as for determining vertical location.
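A single short reflection of the kind the pinna produces is enough to create this comb filtering, as the following sketch shows; the delay and gain values stand in, hypothetically, for an angle-dependent pinna reflection path.

    import numpy as np

    fs = 44100
    delay = 14            # samples, c. 0.32 ms reflection path (assumed)
    gain = 0.6            # reflection strength (assumed)

    x = np.random.randn(fs)          # broadband noise input
    y = x.copy()
    y[delay:] += gain * x[:-delay]   # direct sound plus one reflection

    # Spectral notches fall near odd multiples of fs/(2*delay); here the
    # first notch is around 1575 Hz. A different incidence angle implies
    # a different delay, hence a different notch pattern to "read."
    print("first notch near", fs / (2 * delay), "Hz")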

 

Inter-aural pinna disparity, the comparison of spectral information arriving at the two ears, has been suggested as a cue (Musicant and Butler 1983). Shoulder bounce reflection and its spectral influence, as well as the possibility of binaural shoulder bounce cues, have been investigated by Gardner (1972). The relative effectiveness of these cues is shown below in section 2.3.

 

Spectral Cues and Distance Perception

 

Spectral cues for distance include the influence of atmospheric conditions, molecular absorption of the air, and the curvature of the wavefront. All of these factors modify the apparent spectral content of the sound source itself, adding to the parameters utilized in the cognitive process of distance perception mentioned in section 2.1.7.

 

The water vapor content of air is commonly experienced as humidity, and interacts with the molecular absorption of air itself to attenuate higher frequencies with increasing distance (Moorer 1979). Several computer models, including the Cmusic unit generator "air-absorb," have been designed to simulate this effect.14
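Since the published description of the Cmusic "air-absorb" unit generator is not reproduced here, the following is only a generic sketch of the kind of processing such a module performs: a low-pass filter whose cutoff falls as simulated distance increases. The mapping from distance to cutoff is an assumption for illustration.

    import math

    def air_absorb(samples, distance_m, fs=44100):
        # crude assumed mapping: farther sources get a lower cutoff
        cutoff = max(500.0, 20000.0 / (1.0 + distance_m / 10.0))
        a = math.exp(-2.0 * math.pi * cutoff / fs)
        out, prev = [], 0.0
        for x in samples:
            prev = (1.0 - a) * x + a * prev   # one-pole low-pass
            out.append(prev)
        return out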

 

Sound waves usually have a planar front as they reach a listener's ears; but a closely proximate sound source will have a curved wavefront, resulting in added emphasis to the lower frequencies. "Within the range of roughly a meter, it is apparent that an emphasis of low frequency sound energy, relative to high frequency energy, should indicate the approach of a sound" (Sheeline 1982 after Bekesy 1960 and Coleman 1962). As distance increases, the sound wave propagation is more planar, and with increasing distance, high frequencies diminish as a result of molecular absorption. Thus, diminished high frequency content can effect the illusion of distance (Coleman 1968). Sheeline reports a 3.3 dB decrease in high frequency spectral energy at 8000 Hz. for a complex sound at 100 feet from the listener, as compared to a closely proximate source (Sheeline 1982).

 

The Precedence Effect

 

The precedence effect (also termed the "Haas effect") is especially relevant to the computer musician, since it deals with the concept of a single sound distributed to two or more sources. Stereo recording techniques (including monaural recordings that have been "enhanced" for stereo) distribute a single sound source to two speakers with slightly different relative amplitudes, equalizations, phase, and/or time delay values between them. The differing speaker signals can perceptually fuse to create a single image, within a certain tolerance of the values given to the parameters for each speaker. The result is that the stereo presentation is heard as qualitatively different from the monaural presentation. To understand the reason for this requires an understanding of the precedence effect (Haas 1949; Wallach et al. 1949).

 

In normal concert situations, the sound from a loudspeaker or performer reaches our ears via a number of different paths. This includes the direct path of the wave-front from the source to the listener (assuming no obstacles are present), and many indirect paths that are time-delayed reflections from walls and other surfaces in the environment. Within a tolerance of about 40 ms., we somehow ignore the locations defined by reflected sounds and instead associate the location of the source with the origin of the first wave-front, even when the total energy of the additional delays exceeds that of the first wave-front.

 

Imagine a listener seated at an equal distance from two loudspeakers that carry discrete channels of an identical transient sound, such as a click. Given identical levels of amplitude from the two speakers, and with no delay between the two signals, the listener will localize the sound source image at a center position between the two loudspeakers. This type of location is known as the phantom image, since no actual sound source corresponds to its illusory location. Now introduce a time delayed version of the sound (delay = 5-30 ms.) into the right channel. Rather than hearing two discrete sounds, the listener will instead hear a single image localized as left-of-center. The phantom image will be displaced increasingly to the left with increasing delay time given to the right channel, until a certain threshold, at which point two distinct sounds (an echo) are heard, and the precedence effect no longer operates.15 Amplitude can contribute to the precedence effect as well. Increase the amplitude of the right speaker (the time-delayed channel) in the previous example. As the amplitude is increased, the phantom image will be displaced (pulled back) towards the center. This trade-off of IAD and ITD cues is illustrated in Figure 5. The precedence effect works best for centering images when speakers are 60 deg. apart on the azimuth. The 90 deg. angle of most quad and stereo playback systems requires increasing amplitude when approaching a centered image from left or right. Chowning's Simulation of Moving Sound Sources, discussed in section 4.2.1, accomplishes this by making the energy ratio between speakers proportional to the tangent of the angle.
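The tangent relationship mentioned above can be written out directly. The sketch below follows the standard stereophonic "tangent law," with the normalized energy difference between channels set proportional to the tangent of the intended image angle; it is a textbook formulation, not necessarily the code of Chowning's actual program.

    import math

    def tangent_pan(phi_deg, phi0_deg=45.0):
        # phi is the intended image angle; phi0 is half the angle
        # between the speakers (45 deg. for the 90 deg. spacing of
        # typical quad and stereo setups discussed above).
        ratio = math.tan(math.radians(phi_deg)) / math.tan(math.radians(phi0_deg))
        # choose gains so that (gl^2 - gr^2)/(gl^2 + gr^2) = ratio,
        # with total power held constant (gl^2 + gr^2 = 1)
        gl = math.sqrt((1.0 + ratio) / 2.0)
        gr = math.sqrt((1.0 - ratio) / 2.0)
        return gl, gr

    print(tangent_pan(0.0))     # centered image: equal gains (~0.707)
    print(tangent_pan(45.0))    # hard left: all power in left channel

Note that a centered image requires equal energy from both speakers, and that the required energy difference grows nonlinearly as the image approaches either speaker.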

 

Illusory effects related to the precedence effect have been discussed by Gardner (Gardner 1972). Imagine again the example of two speakers, one sound source, and a centralized listener, with acoustic energy being equal from the two speakers. By introducing a small amount of delay (delay = c. 5 ms.) into the left speaker, we might still localize the sound source at the center, but with the perception of the sound as being "louder" or "broader" than in the case with no time delay. The increased perceptual loudness is the psychoacoustic effect of temporal fusion, and the broadness is the resultant "larger image" of the sound source's size. At a certain threshold of time delay, the delayed signal in the left channel would cause the phantom image to move right of center.

 

Dissimilar but complementary frequency responses for the two channels can contribute to the image of a single source with full-band output, but with the source "broadened" in comparison to full-band output from a single source (Gardner 1972). This is a form of quasi-stereophony used for producing stereo recordings from monaural sources in the commercial recording industry. Schroeder describes a similar goal that is accomplished with all-pass filters (Schroeder 1961).

 

Our ability to ignore delayed waveforms, in terms of localization and within certain tolerances, is important to understanding reverberation. Reverberation is a perceptual phenomenon resulting from a collection of delayed signals that usually decay exponentially over time. If it were not for the precedence effect, a single violinist playing in a reverberant concert hall would be perceived as having a number of locations equal to the number of delays, plus the direct signal. The precedence effect deals with perceptual fusion as to location, but on another cognitive level, the delays are monitored in order to cognize the character of the reverberation.16 The computer modeling of these delays is discussed in section 4.1.2 of this tutorial.
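Such collections of exponentially decaying delays are the basis of digital reverberators of the kind discussed in section 4.1.2. The following bare-bones sketch is in the classic Schroeder style (parallel feedback combs followed by a series all-pass); the delay lengths and gains are illustrative assumptions and do not reproduce any particular published reverberator.

    import numpy as np

    def comb(x, delay, g):
        # feedback comb: a recirculating delay line, giving an
        # exponentially decaying series of echoes
        y = np.zeros(len(x))
        for n in range(len(x)):
            y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
        return y

    def allpass(x, delay, g):
        # all-pass stage: diffuses echoes without coloring the
        # long-term spectrum
        y = np.zeros(len(x))
        for n in range(len(x)):
            xd = x[n - delay] if n >= delay else 0.0
            yd = y[n - delay] if n >= delay else 0.0
            y[n] = -g * x[n] + xd + g * yd
        return y

    def reverb(x):
        combs = [(1557, 0.84), (1617, 0.83), (1491, 0.85), (1422, 0.86)]
        wet = sum(comb(x, d, g) for d, g in combs) / len(combs)
        return allpass(wet, 225, 0.7)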


Relevance of Spatial Cognition and Perception to Composers

 

We have outlined several stages that are involved in the perception of spatial sound within the receptor. To predict the exact spatial effect of varying amplitude or time delay parameters is impossible if we are considering anyone besides a person in the center of a hall with a head restraining device. Hence, on one hand, we could say that psychophysics is useless to a computer music composer if s/he wants to predict exact and unique point source localization for concert listeners. ITD and IAD parameters are very sensitive to head movements made by listeners in the real world. On the other hand, as organisms we are extremely sensitive to change, and a listener is highly sensitized to changes in spatial parameters for reasons of survival. Although the exact total mechanism for spatial hearing remains vague, it is evident that empirical audition allows effects on the spatial hearing mechanism to be composed as a musical parameter.

 

Searle measured the standard deviation, across a large number of experiments and subjects, of localizing a sound source on the azimuth across a 90 deg. arc. He found the following standard deviations: ITD, IAD, head movement, and monaural head shadow (each considered separately) = 5.9 deg.; inter-aural pinna cues = 15.0 deg.; monaural pinna cues = 24.0 deg.; and monaural shoulder bounce cues = 68.0 deg. These figures can give a composer a relative feel for the effectiveness of simulating these cues in a spatial manipulation program.

 

The Medium: Loudspeakers and Room Acoustics

 

The following section considers loudspeakers and room acoustics, and the mismatch that occurs between the source environment and the playback situation. The concepts of imaging and ideal listener location are broached.

 

Concept of Loudspeaker Imaging

 

An issue relating to quad and stereo playback is the concept of imaging a sound source in a phantom location, as was discussed in section 2.2.7 in connection with the precedence effect. Stereo playback has a finite line, and quad playback a single point, termed the "ideal listener location" (Figure 4). Quadraphonic playback systems were criticized when they first appeared on the audio market in the early 1970's because the ideal listener location was restricted to this single point. (Critics of quadraphonic computer music are still heard to bemoan this issue.) In theory, the recording engineer or composer who mixes the sound in a control room has a stereo or quad image equivalent to what a person seated in the ideal listener location will hear. Several factors make the possibility of this situation ever occurring almost nil. Concert presentations involve different rooms and speakers than the rooms and speakers with which the computer sounds are usually composed. Just as the soundbox of a violin is a natural extension of the sound of the bowed string, the acoustics of a particular space are an extension of an instrument played within it. This has always been true for music in terms of the tempi chosen by conductors and performers, and the influence of the room acoustics on perceived timbre, attack time, and dynamic range. A similar sort of musical variance is true for spatial manipulation.

 

Other factors besides room differences contribute to the fact that precise recreation of imaging is almost impossible. The distance from the speaker to the ideal listener location is much greater in a concert situation, as is the distance separating speakers in order to encompass a wide audience area. The audience will have an absorptive influence on the sound, depending on the kinds of clothes they are wearing; an audience also produces a form of background noise peculiar to audiences. The studio-to-concert transformation involves the added influence of reflective obstacles (people, chairs) between the speakers and the ideal listener location.

 

 

Figure 4: Ideal Listener Locations for Quad and Stereo

 

Room Acoustics and Imaging

 

The reverberation characteristics of a concert hall are quite complex, and are only partially represented in computer simulations. Recording studios and similar playback situations are designed to be as free from excess reverberation as possible, while concert halls that usually host loudspeaker music are designed to add c. 2 seconds of reverberation time to live instrumentalists. This is actually a basic concert hall design aesthetic that allows a hall to be most pleasing in terms of the random distribution of reflections.17 Unfortunately, such design is not ideal for speaker illusions produced in studios that have good imaging and little reverberation.

 

Another differing factor is the effect of the actual dimensions of a room on frequency response. In particular, the low frequency response of different rooms is highly variable. Standing waves (where two sound waves of identical frequency and pressure variation interact to emphasize the amplitude of that wave at a specific frequency) and vibrational nodes (where pressure waves interact 180 deg. out of phase and cancel each other out) cause particular locations in some halls to have widely varying frequency responses. In a simple enclosure, such as the theoretical model of a two-dimensional rectangle, it is an easy matter to calculate these emphasized or deemphasized modes of vibration (see Beranek 1954). There is also a limited number of these aberrant frequencies in a simple enclosure. Unfortunately, in a three-dimensional room with highly irregular geometries, predicting these effects on all audible frequencies for each listening position is almost impossible. Different concert halls will also feature different frequency absorption characteristics as a function of humidity and wall surfaces.
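For the simple rectangular enclosure the text mentions, the mode frequencies follow a standard closed-form expression (cf. Beranek 1954); the sketch below evaluates it for an arbitrary example room. Real halls with irregular geometry admit no such formula.

    import math

    C = 343.0   # speed of sound, meters per second

    def mode_freq(nx, ny, nz, Lx, Ly, Lz):
        # f = (c/2) * sqrt((nx/Lx)^2 + (ny/Ly)^2 + (nz/Lz)^2)
        return (C / 2.0) * math.sqrt((nx / Lx) ** 2 +
                                     (ny / Ly) ** 2 +
                                     (nz / Lz) ** 2)

    # lowest modes of a hypothetical 10 x 7 x 4 meter room
    modes = sorted(mode_freq(nx, ny, nz, 10.0, 7.0, 4.0)
                   for nx in range(3) for ny in range(3) for nz in range(3)
                   if (nx, ny, nz) != (0, 0, 0))
    print([round(f, 1) for f in modes[:8]])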

 

Speakers: Time Alignment

 

The most influential and most nonlinear elements of any playback situation are the speakers. Different speaker designs have different image distances: the distance a listener must be from a speaker's multiple radiators to hear a fused sound. Phase response, distortion characteristics, and frequency response "coloration" all differ between different speaker designs. Yet rarely are similar speakers used in control rooms and playback situations.

 

The issue of time alignment refers to the capacity of a speaker with multiple radiators to deliver all frequencies of a broad band sound equally in time to a particular location. In terms of physics, lower frequencies require more energy, and thus more time, to move the diaphragm of a woofer than would high frequencies delivered from a driver typically utilized in high quality speakers. Factors such as recessing the high-frequency driver have been demonstrated in some speaker designs as a solution, and there are a couple of expensive speakers available that solve the problem with sophisticated crossover networks. But for the most part, even most expensive speakers are not truly time-aligned, and the differences in time of arrival for all frequencies will vary between manufacturers. This is especially critical for computer simulations that model arrival time as a perceptual cue, such as the work done with pinna transforms at Northwestern University. Even unequal speaker cable lengths have different impedances, which can cause differences in arrival time.

 

Speakers: How Many?

 

A final issue surrounding the medium is the appropriate number of loudspeakers. What is an ideal number? Consider a soundpath which is intended by a composer to circle at an equal distance from a blindfolded person in the ideal listener seat. One way to produce the illusion would be to mount the loudspeaker on a circular track surrounding the listener, and have the composer push the speaker around the track. In this situation, the illusion from a physical standpoint is truly a sound that circles around the listener, whether the listener cognitively perceives a circle, an ellipse, or whatever. (If the listener is told that they are supposed to hear a circle, this will aid in the cognitive visualization.) Now replace the speaker-on-a-track arrangement with enough speakers placed next to one another so that there is a circular wall of speakers surrounding the listener. Each speaker in this case represents a "quantization" of the actual path of the loudspeaker on a track. Distributing amplitude to each successive speaker would likely be equivalent perceptually to the speaker on the track. Now progressively remove loudspeakers from the circle, and audition sounds in the same manner. At some point the illusion would be qualitatively different, and the circular illusion would break down (perhaps jaggedness in the linearity of the circle would be the cue). To maintain the illusion of a circle with fewer speakers would require an increasing number of psychoacoustic cues to offset the "quantization error" of the circular presentation.
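One simple way to "quantize" a circular path onto a ring of N speakers is pairwise amplitude panning: at each instant, the intended angle falls between two adjacent speakers, which share the signal with constant total power. The sketch below is an illustration of this reasoning, not a reconstruction of any program reviewed later; the equal-power crossfade is an assumed choice.

    import math

    def ring_gains(angle_deg, n_speakers):
        gains = [0.0] * n_speakers
        spacing = 360.0 / n_speakers
        pos = (angle_deg % 360.0) / spacing
        lo = int(pos) % n_speakers          # nearest speaker behind
        hi = (lo + 1) % n_speakers          # nearest speaker ahead
        frac = pos - int(pos)
        # equal-power crossfade between the two nearest speakers
        gains[lo] = math.cos(frac * math.pi / 2.0)
        gains[hi] = math.sin(frac * math.pi / 2.0)
        return gains

    # image halfway between speakers 0 and 1 of an eight-speaker ring
    print(ring_gains(22.5, 8))

With many speakers the panned image never strays far from a physical source; as speakers are removed, each crossfade must span a wider arc, and the "quantization error" of the path grows accordingly.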

 

A circular illusion produced with a single loudspeaker is probably impossible; but, considering that we have two ears to perceive all dimensions of movement and space, it should be theoretically possible with two speakers. This theory allows modeling based on the locations of the speakers, rather than of the ears, and is utilized in computer models such as Moore's General Model (cf. section 4.2.3). A circular illusion with two loudspeakers has reportedly been accomplished under controlled conditions at Northwestern University by Kendall et al. (Kendall 1984), using pinna spectral transformations. Also, binaural recordings made with pinna-equipped "dummy" heads produce very realistic front/back, azimuth, and elevation cues when listened to with headphones, or in an anechoic chamber with speakers and crosstalk filters (see Schroeder 1972). The use of spectral cues will likely prove to be very significant in models that simulate spatial illusions with a limited number of speakers.

 

Essentially, it is important to note that the possible error in perceiving illusory locations or paths decreases as the number of speakers increases, and that fewer speakers demand an increasing number of controlled psychoacoustic parameters to be specified for such illusions. The practical situation of two or four loudspeakers must be offset by greater control and understanding of significant localization parameters. Additionally, the control and nature of these parameters must be strong enough to offset the effects of the medium: the influence of loudspeaker and listening room variation.

 

Conclusions

 

The spectral and phase convolutions introduced by the listening space, combined with differences in loudspeaker design and increased distance from the source, cause the receptor in a concert situation to receive a significantly altered version of what was heard in the control room by the source; hence, the medium has been shown to be a significant cause of the mismatch discussed above.

 

The degree to which this translation actually affects imaging is not known in any straightforward way; once again, the situation is highly dependent on the type of sound, as well as on the means by which the phantom image is produced (amplitude, delay, spectra, or combinations). A change in the parameters that influence spatial images is probably heard only when the degree of that change exceeds the influence of the medium.

 

There is something questionable about the concept of imaging in itself. The only realistic situation whereby imaging in the control room could be accurately reproduced is one where acoustical conditions exactly equal those of the control room. Section 3 of this tutorial has shown why this cannot be so. Hence, the concept of proper imaging is inapplicable to the concert situation, and perhaps even to the real world, except when measuring loudspeakers in an anechoic chamber.18

 

Regarding the normal presentation of quad or stereo loudspeaker music, a composer might do best to consider an idealized listener's area, rather than an ideal listener's seat. The limits to the size of a listener's area would be constrained to the locations where some sort of spatial transformation was audible. Theoretically, any seat within such an area would provide an acceptable "translation" of spatial conceptions made by the source.

 

The Source: Computer Models for Spatial Manipulation

 

The following section outlines several computer models for digital reverberation (section 4.1) and spatial manipulation (section 4.2). Most have been published in the Computer Music Journal (MIT Press, Cambridge, Mass.). We outline the differences between them, and their position in the source-medium-receptor context. Digital reverberation schemes are discussed first, since they play an integral role as subsets of spatial manipulation programs.

 

Reverberation Defined

 

Reverberation is a term for the collection of time-delayed reflected sounds that originate from an initial, direct wavefront (the direct sound). The direct sound is defined as the wavefront which reaches the ears first. The direct sound is said to have reverberation if the direct and reflected sounds are perceived as a single sound source. If the reflected sound is delayed in time to the point where a distinct, second iteration of the sound is heard, then this reflected sound is commonly termed an echo. Roederer's definition states that reverberation begins when a sound source is "turned off;" the energy from the delays that influence the direct sound while the source is "turned on" is technically not directly perceivable, due to the precedence effect19 (Roederer 1975). However, there is definitely an audible effect made by these early delays, which collectively alter the amplitude and quality of the direct sound. In this tutorial, reverberation refers to all energy of a sound source, exclusive of the direct sound.

 

Reverberation is an important element in spatial manipulation in that it can alter the quality of the perceived host space for a sound or group of sounds.20 We have discussed the significance for distance perception of the ratio between direct and reverberant energy for a sound. Change in the reverberation characteristics of sounds played over stereo or quad playback systems is less dependent on listener position, and therefore more likely to be perceived by an audience, than other forms of spatial manipulation. Reverberation can also alter the perceived position, attack quality, and timbral quality of the sound.

 

Early and Late Echo Response

 

The pattern of early delays informs a listener about the qualities of the room, based on the room's modification of the initial attack and in terms of signal correlation at the two ears (see above). The initial attack of a sound can be perceptually smoothed out by early delay patterns at significant amplitudes (an example of this with natural reverberation is the lack of "bite" a pizzicato attack has on a stringed instrument in a highly reverberant room). The pattern of early delays also informs the listener about the dimensional aspects of the environment immediately surrounding the sound source. These early delays, which occur at irregular time intervals and which are quite dense overall, are collectively termed the early echo response. The use of the word "echo" in this term is misleading, given the previous definition of echo as a distinctly heard iteration of the sound. The delayed reflections in the early echo response are not perceived as distinct sounds, but instead perceptually fuse with the direct sound to form a single percept, as described by the precedence effect.

 

The early echo response has a duration on the order of 5 to 80 milliseconds, depending on the reflective properties of the architecture. Following this portion of the sound is a more spatially diffuse reverberation which contains less overall energy in the delays, and attenuated higher frequencies. These later delays are made up of many subsequent reflections off of walls, within cavities, balconies, and so on. We term these later delays the late echo response. The late echo response is informative perceptually as to the volume of a particular space, as a result of the time it takes for the later echoes to achieve their maximum amplitude after the sound source has been "turned off." Recording engineers and acousticians refer to the late echo response as the time it takes for a room to "speak," referring to the duration of active delays within the particular space.

 

In defining overall reverberation time, we note when the late echo response decays to 60 dB below the level of the original signal. The amplitude decreases as a function of the absorption of the delayed sound by air (influenced by humidity) and by wall and seating materials, and as a function of the angle of incidence on a particular surface.
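
A minimal sketch of this measurement, assuming a digital impulse response and the common technique of backward (Schroeder) integration of its energy; the decaying-noise test signal and the function name are my own, and the 60 dB criterion follows the definition above.

    import numpy as np

    def reverberation_time(impulse_response, sample_rate):
        """Time (seconds) for the decay curve to fall 60 dB below its start."""
        energy = impulse_response.astype(float) ** 2
        decay = np.cumsum(energy[::-1])[::-1]         # backward integration
        decay_db = 10.0 * np.log10(decay / decay[0])  # 0 dB at the onset
        below = np.nonzero(decay_db <= -60.0)[0]
        return below[0] / sample_rate if below.size else None

    # Exponentially decaying noise as a stand-in impulse response:
    sr = 44100
    t = np.arange(2 * sr) / sr
    ir = np.random.randn(t.size) * np.exp(-4.0 * t)
    print(reverberation_time(ir, sr))   # roughly 60 / (8.686 * 4) = 1.7 sec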

 

To summarize: we first hear the direct sound, which informs us as to the identity and location of a sound. This is followed by the early echo response, which is primarily the first reflections off nearby surfaces towards which the sound source radiates. These delays fuse with the direct sound, altering its perceived quality and simultaneously informing us of the time intervals between delays. Finally, after about 80 milliseconds, the room begins to truly "speak" from all directions. This is the late echo response, where the bulk of delayed sound energy accumulates. During this period, attenuation of overall amplitude and of higher frequencies occurs. The attenuation of the high frequencies likens reverberation to a lowpass filter that becomes increasingly narrow over time. Finally, when the amplitude of all delays has decayed 60 dB below the original level, reverberation has by definition finished.21

 

Digital Reverberator Design

 

Moorer has written an extensive overview of reverberator design in his article About This Reverberation Business (Moorer 1979). He traces Schroeder's initial development of comb and all-pass filter reverberators, both of which involve attenuated feedback loops. He also outlines Schroeder's series and parallel combinations of these filters. Schroeder and Logan outlined the basic aims of the artificial reverberator (Schroeder and Logan 1961). These can be paraphrased as follows: 1) the frequency response should be flat; 2) the normal modes of the reverberator should cover the entire audio frequency range; 3) the reverberation times of individual modes must be equal or nearly equal, so that different frequency components decay at equal rates; 4) echo density in the early echo response must be great enough that individual echoes are not perceived; 5) no flutter echoes should be apparent (flutter echoes are periodic echoes resulting from sound waves bouncing back and forth between parallel surfaces); 6) periodic or comb-like frequency responses should be inaudible. The last condition is the most difficult to achieve, since all feedback loops are inherently periodic in their frequency response, and since feedback is an essential part of any economically realizable filter.

 

Digital reverberation has been a topic of interest for both concert hall acousticians and composers of electronic music. The father of concert hall reverberation simulation research has been Schroeder, whose articles Colorless Artificial Reverberation and Improved Quasi-Stereophony and Colorless Artificial Reverberation (both written in 1961) have supplied the foundations of most digital reverberators in use today (Schroeder 1961). For a recent review of computer models for concert hall acoustics, the reader is referred to Schroeder's Progress in Architectural Acoustics and Artificial Reverberation: Concert Hall Acoustics and Number Theory (Schroeder 1984). The essential goal of these studies is to be able to predict architectural acoustic character before actually constructing concert halls, and to determine the reverberation characteristics most favored by listeners of symphonic music. Work towards this goal has influenced researchers such as Moorer in considering the parameters necessary for obtaining the most natural sounding reverberation.

 

In a comb filter reverberator, the simplest multiple echo design, the delays decay exponentially, and the frequency response is comb-like, causing a coloration of the input sound. In the all-pass filter reverberator, a flat frequency response is attained by mixing a portion of the undelayed signal with the delayed sound. However, the flatness of the all-pass filter frequency response is only an overall condition, given a steady-state sound. Moorer points out that the phase response of the all-pass filter is quite complex, and that short term transient response is inadequate. "In fact, both the all-pass and comb filter have very definite and distinct 'sounds' that to the experienced ear can be immediately recognized" (Moorer 1979).
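
A minimal sketch of the two building blocks, using the standard Schroeder difference equations (comb: y[n] = x[n-D] + g*y[n-D]; all-pass: y[n] = -g*x[n] + x[n-D] + g*y[n-D]); the sample-loop implementation below is my own illustration, not code from any of the programs reviewed.

    def comb(x, delay, g):
        """Recirculating comb filter: exponentially decaying echoes every
        'delay' samples; comb-like (colored) frequency response."""
        y = [0.0] * len(x)
        for n in range(delay, len(x)):
            y[n] = x[n - delay] + g * y[n - delay]
        return y

    def allpass(x, delay, g):
        """Schroeder all-pass: the feed-forward term cancels the comb's
        magnitude coloration, leaving a flat overall response."""
        y = [0.0] * len(x)
        for n in range(len(x)):
            xd = x[n - delay] if n >= delay else 0.0
            yd = y[n - delay] if n >= delay else 0.0
            y[n] = -g * x[n] + xd + g * yd
        return y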

 

The strategy in designing reverberators is to combine feed-forward or feedback loops of the direct signal with combinations of all-pass and comb filters in a particular configuration, with appropriate gains and delays. A concern is to minimize the individual characteristics of the reverberators, in favor of an overall "natural" sound that works with any variety of sound inputs. Moorer's research found that the most economically useful reverberator was a comb filter as described by Schroeder (Schroeder 1961), with the modification of a one-pole low-pass filter replacing the gain stage (Moorer 1979). With this design, the low-pass filtering becomes increasingly pronounced with longer delays. This relates the effect of air absorption to delay length, since increasing delay time corresponds to the increased distance a reflected sound must travel through air.
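
A minimal sketch of this modification, under the assumption that the one-pole low-pass filter sits inside the comb's feedback path (the coefficient names are my own); each recirculation passes through the low-pass again, so longer-lived echoes lose progressively more high-frequency energy, as the air-absorption analogy suggests.

    def comb_lowpass(x, delay, g, a):
        """Comb filter with a one-pole low-pass (coefficient a, 0 <= a < 1)
        replacing the plain gain stage; |g| < 1 keeps the loop stable."""
        y = [0.0] * len(x)
        lp = 0.0                     # low-pass state in the feedback path
        for n in range(delay, len(x)):
            lp = (1.0 - a) * y[n - delay] + a * lp
            y[n] = x[n - delay] + g * lp
        return y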

 

Another approach to reverberation simulation is via convolution. By multiplying the spectrum of a given sound with the spectrum of the impulse response of a room, realistic delayed diffusion of the sound over time can be achieved. The technique is to use an impulse response of a room, either recorded or simulated by a transient followed by exponentially decaying noise. When this recording is convolved with an input signal, the resulting delays are perceptually similar to reverberation. A program for convolving two sound files for this result is in use at the CARL facility at UCSD.22
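
A minimal sketch of the idea, assuming numpy/scipy and a synthetic impulse response of the kind described (a transient followed by exponentially decaying noise); this is an illustration only, not the CARL program.

    import numpy as np
    from scipy.signal import fftconvolve

    sr = 44100
    t = np.arange(int(1.5 * sr)) / sr
    ir = np.random.randn(t.size) * np.exp(-4.0 * t)  # decaying noise tail
    ir[0] = 1.0                                      # initial transient

    dry = np.zeros(2 * sr)
    dry[:100] = np.hanning(100)                      # a short test click

    # Time-domain convolution = multiplication of the two spectra:
    wet = fftconvolve(dry, ir)[: dry.size]
    mix = 0.7 * dry + 0.3 * wet                      # direct-to-reverberant ratio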

 

A sound source in an environment will have a radiation pattern which scatters energy wavefronts towards various absorptive and reflective surfaces, both vertically and horizontally. Most computer models deal only with the simulation of the horizontal dimension of this radiation. Concert hall research has found that listener preference is affected by the vertical dimension, in that horizontal reflections should reach the ears first. Listeners preferred the initial reflections to be de-correlated, that is, with information arriving at the left ear differing from that arriving at the right ear (Schroeder 1975, 1984). A larger vertical dimension relative to the horizontal one causes the de-correlated lateral reflections to arrive before the correlated vertical reflections. Hence, a consideration for effective reverberation modeling is the notion of spatial reverberation. This is discussed in connection with pinna cue simulation by Kendall et al (Kendall 1984). Both Moore's SPACE program and Stautner and Puckette's ROOM program allow specification of the sound source radiation parameter.

 

Moorer-Stanford/CCRMA Reverberator

 

The reverberator used in 1982 at Stanford University's CCRMA was described by Sheeline as utilizing a configuration of tap delays, comb-lowpass filters, and all-pass filters as modular parts (Sheeline 1982; see Figure 5). A percentage of the direct signal is assigned to the reverberator, which transforms it as follows. First, a number of tap delays are used to simulate the first reflections, each tap delay being a non-recirculating delayed signal. This means that the early echo response of the reverberation simulation is directly related to the number of taps used, and to the delay time values assigned to each tap. As was mentioned, the early echo response will affect the perceived dimensions of the room, as a result of the delay times.

 

The signals from the taps are passed directly to the output, as well as being summed and sent to the next stage of the reverberator. This stage consists of the comb-lowpass filters designed by Moorer and described in section 4.1.3. Usually there are a number of these in parallel. Finally, these outputs are summed and passed to all-pass filters in both series and parallel. These filters accomplish the de-correlation of the signal to multiple outputs by sending the result of the series output to parallel filters with slightly different settings, with each output channel assigned a discrete all-pass filter. The all-pass filters also increase the echo density of the late echo response.
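
The overall signal flow can be sketched as follows, reusing the comb_lowpass() and allpass() functions from the sketches above; the staging follows Sheeline's description, but the parameterization and function name are my own assumptions.

    def ccrma_reverb(x, tap_delays, tap_gains, comb_params, allpass_params):
        # Stage 1: non-recirculating tap delays simulate the first
        # reflections; they pass to the output and feed the next stage.
        early = [0.0] * len(x)
        for d, g in zip(tap_delays, tap_gains):
            for n in range(d, len(x)):
                early[n] += g * x[n - d]
        # Stage 2: parallel comb-lowpass filters build up the late
        # echo response, dulling longer recirculations.
        late = [0.0] * len(x)
        for d, g, a in comb_params:
            for n, v in enumerate(comb_lowpass(early, d, g, a)):
                late[n] += v
        # Stage 3: one discrete all-pass per output channel, each with
        # slightly different settings, de-correlates the channels.
        return [[e + v for e, v in zip(early, allpass(late, d, g))]
                for d, g in allpass_params]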

 

 

Figure 5: CCRMA's Modular Reverberator

 

Stautner and Puckette's ROOM program

 

Stautner and Puckette described a program under development at MIT called ROOM, in an article titled Designing Multi-Channel Reverberators (Stautner and Puckette 1982). ROOM is primarily concerned with simulating the early echo response for multiple channel output, but also permits the modeling of absorptive characteristics of a space. The early echo response is designed to be sensitive to sound source position. Successive delays are sent to discrete output channels to achieve de-correlation. A specification for room size controls the delay and gain parameters of the filters, and path lengths for reflections within the room are calculated by the ray imaging method (see Moorer 1979 or Kuttruff 1972), which further attenuates the gain parameter as a function of distance.

 

The heart of the ROOM design is a comb filter matrix that distributes successive delays to multiple outputs. The user specifies to the program the dimensions of a rectangular room to be modeled, the dimensions and coordinates of objects within the room, a feedback matrix for the delay outputs, a list of delay values, and a list of sound source locations within the room. From this, the early response is modeled via calculated values given to the filter parameters.

 

"...we simply add various proportions of the source signal directly into the delay loops for each channel. The length of each delay determines the amount of the early response that can be simulated for that channel and direction" (Stautner and Puckette 1982).

Each speaker in the ROOM model has the potential to have output assigned from the comb filter matrix.
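
A minimal sketch of this kind of multichannel delay network: several delay lines whose outputs are cross-fed through a unitary feedback matrix, one line per output channel. The matrix below is of the form Stautner and Puckette discuss; the delay lengths and overall gain are placeholder values of my own.

    import numpy as np

    g = 0.8                                # overall feedback gain
    A = (g / np.sqrt(2.0)) * np.array([    # scaled unitary mixing matrix
        [ 0.0,  1.0,  1.0,  0.0],
        [-1.0,  0.0,  0.0, -1.0],
        [ 1.0,  0.0,  0.0, -1.0],
        [ 0.0,  1.0, -1.0,  0.0]])

    delays = [1931, 2213, 2647, 3019]      # deliberately irregular lengths
    lines = [np.zeros(d) for d in delays]
    ptrs = [0, 0, 0, 0]

    def tick(x_in):
        """Advance one sample; x_in holds the source proportion added to
        each channel's delay loop; returns one sample per output channel."""
        outs = np.array([lines[i][ptrs[i]] for i in range(4)])
        fed_back = A @ outs
        for i in range(4):
            lines[i][ptrs[i]] = x_in[i] + fed_back[i]
            ptrs[i] = (ptrs[i] + 1) % delays[i]
        return outs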

 

There is also variable frequency attenuation for each output channel (accomplished with parallel low-pass, band-pass, and high-pass filters) to model the absorptive characteristics of objects within the room. The result of these parallel filters reportedly adds to the richness of the sound, and conceals the characteristic comb filter sound mentioned in section 4.1.3. The regularity of the comb filter frequency response is also obviated by the use of weighted random functions that are applied as deviations to the values given to the delay time parameter. Stautner and Puckette stated that:

 

"When delays of various lengths are used in the general recursive network (of the reverberator)...the frequency response achieves these (comb filter) peaks and troughs at very irregular intervals in frequency, which removes the buzzy sound comb filters give" (Stautner and Puckette 1982).

 

The use of random delay time values was not a useful technique for Moorer in creating a "natural" early echo response.

 

"Various ways were tried to synthesize a suitable early response, from using the results of geometric simulation of the room, real or fictitious, to choosing random numbers....one cannot just compute a bunch of impulses at random, and expect that they will sound good" (Moorer 1979). (See Stautner and Puckette 1982, page 58: "A further improvement can be gained by continuously varying the delay lengths in a random way. This has the effect of shifting the resonant peaks in the frequency response and decreases the possibility of flutter" ).

 

It is important to note that the ROOM program is applicable to single source locations, but not really adapted to moving sound sources. Stautner and Puckette proposed a future modification of the program: an interpolation function that would read the distance between two points, supplying values to the delay parameters. They acknowledged the variability of quality in comb filter reverberation: "There is still no easy way to gauge overall coloration in a complex network, and new possibilities are born with each unitary matrix..." (Stautner and Puckette 1982). Their model does not consider variable listener position or other effects of the medium.

 

Chowning's QUAD program

 

The first program for computer-based spatial manipulation was described in Chowning's Simulation of Moving Sound Sources (Chowning 1971). The program provides sound modification of amplitude and reverberation parameters for four channel playback, based on a path drawn on a CRT that represented the path of the moving sound source. This path was represented on a grid as {x,y} coordinate pairs, used as values for the distance and angular placement parameters.23

 

Chowning itemized cues for distance, angular placement, and velocity. A source moving at a particular velocity determines the amplitude functions sent to each speaker, as well as the percentage of reverberation. He utilized two types of reverberant sound, in relation to distance: global reverberation (an equal level of reverberation distributed to all four speakers, decreasing at a level of 1/distance), and local reverberation (a distribution of reverberant energy to the speaker(s) that define the direct signal, at a level equal to 1 - (1/distance)).24

 

Local reverberation was intended to simulate the delay pattern of a source as it would articulate a distant space, as opposed to the reverberant quality of the location of the listener. The percentage of local to global reverberation was scaled according to the distance of the source: local reverberation increasing and global reverberation decreasing as the sound source moves away from the listener. The proportion of direct to reverberant signal is scaled according to distance as well (in a ratio of one over the square root of the distance).

 

Amplitude of the non-reverberated signal between the individual channels is scaled and distributed by making the energy ratio between the two speakers nearest the source proportional to the tangent of the angle. This compensates for the 90 deg. angle between the speakers; as was mentioned in section 2.2.7, the precedence effect for imaging a location between two speakers works best with a 60 deg. angle between them. This compensation can be accomplished by other functions as well. The idea is to increase amplitude as the source moves between speakers.
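
The distance scalings and the tangent relation can be sketched as follows. The distance formulas are one plausible reading of the scalings in the text; the panning function uses the tangent law in its common stereophonic formulation, which may differ in detail from Chowning's actual implementation, and all names are my own.

    import math

    def chowning_levels(distance):
        """Levels for a source at 'distance' (>= 1, arbitrary units)."""
        direct = 1.0 / distance                      # direct signal level
        reverb = 1.0 / math.sqrt(distance)           # overall reverberant level
        global_rev = reverb * (1.0 / distance)       # spread to all speakers
        local_rev = reverb * (1.0 - 1.0 / distance)  # focused at the source
        return direct, global_rev, local_rev

    def tangent_pan(source_deg, half_angle_deg=45.0):
        """Energy split between two speakers at +/- half_angle_deg, with
        (g_r - g_l)/(g_r + g_l) = tan(source)/tan(half_angle)."""
        t = math.tan(math.radians(source_deg)) / math.tan(math.radians(half_angle_deg))
        s = math.sqrt(2.0 / (1.0 + t * t))           # normalize g_l^2 + g_r^2 = 1
        return s * (1.0 - t) / 2.0, s * (1.0 + t) / 2.0

    print(tangent_pan(0.0))    # centered: equal gains of about 0.707
    print(tangent_pan(45.0))   # fully right: (0.0, 1.0)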

 

A Doppler shift function is utilized to scale frequency, according to the velocity and position of the sound in relation to the listener. This is provided as a velocity cue for moving sound sources, in addition to " velocity information from the rate of angular shift in energy (angular velocity)....(and)...the rate of radial shift in energy..." (Chowning 1971). These last two velocity cues are inherent in the amplitude calculations mentioned above.
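
The Doppler relation itself is simple enough to state directly; this sketch assumes the standard formula for a moving source and stationary listener, with radial velocity positive when approaching (Chowning's exact scaling function is not given in the text).

    def doppler_shift(freq_hz, radial_velocity_ms, speed_of_sound_ms=343.0):
        """Perceived frequency for a source moving at radial_velocity_ms
        (m/s, positive = approaching) relative to a stationary listener."""
        return freq_hz * speed_of_sound_ms / (speed_of_sound_ms - radial_velocity_ms)

    print(doppler_shift(440.0, 20.0))   # approaching at 20 m/s: about 467 Hz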

 

The use of cartographic representation was shown earlier in this tutorial to carry possible perceptual mismatch. The program assumes an ideal listener location in its calculations for amplitude, but Chowning mentioned that "any cues to localization of a source which are dependent on delay, phase, and orientation are inappropriate." The compositional utility of the program has resulted in its continued use thirteen years after its original description. Its use is quite audible in Chowning's own composition Turenas (1972); high frequency sounds with reiterative attacks are the most successful sounds in the work in terms of dramatic spatial movement.

 

Federkow, Buxton and Smith Program

 

Federkow, Buxton and Smith described a spatial manipulation program in 1978 in an article titled A Computer-Controlled Distribution System for the Performance of Electroacoustic Music (Federkow, et al, 1978). This program was in use at the time at the SSSP facility at the University of Toronto. Following a technique similar to that used by Salvatore Martirano in his SalMar Construction, Federkow, et al, proposed a system which divides up to four input signals with 200 Hz crossover networks. Frequencies below 200 Hz are sent to a single high quality bass speaker, and the frequencies above 200 Hz are distributed to multiple speakers in response to live performer control. The performer moves the sounds among the high frequency speakers with the use of a "mouse" device; the movement is traced by a microcomputer onto a terminal.

 

With this strategy, a greater number of inexpensive speakers can be used for frequencies above 200 Hz (they proposed sixteen channels, and were using four at the time of publication). The use of sixteen channels would greatly reduce the dependence on the precedence effect (c.f. section 3.5, "How Many Speakers?"). They expected the localization of frequencies below 200 Hz to be inconsequential compared to the higher frequencies, given a complex tone.
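
A minimal sketch of such a split, assuming scipy's standard filter design tools; the filter order, the digital (rather than analog) implementation, and the uniform speaker gains are my assumptions, not details from the article.

    import numpy as np
    from scipy.signal import butter, lfilter

    sr = 44100
    b_lo, a_lo = butter(4, 200.0 / (sr / 2.0), btype="low")
    b_hi, a_hi = butter(4, 200.0 / (sr / 2.0), btype="high")

    x = np.random.randn(sr)                  # stand-in input signal
    lows = lfilter(b_lo, a_lo, x)            # routed to the single bass speaker
    highs = lfilter(b_hi, a_hi, x)           # distributed among the many speakers

    gains = np.full(16, 1.0 / np.sqrt(16.0)) # performer-controlled in the original
    speaker_feeds = np.outer(gains, highs)   # one row per high-frequency speaker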

 

A four-input, one-output mixer and amplifier is connected to each of the high frequency speakers, each input to the mixer corresponding to one of the sound sources.

 

"...the motion of a channel of sound is sketched by hand (with the mouse device)...since the microprocessor is capable of updating parameters in real time, the sound of that channel then follows (or tracks) the motion of the tablet cursor as the trajectory is being drawn" (Federkow, et al 1978).

 

No control is provided for reverberation or other distance cues in the program, making the Cartesian representation given to the performer exactly equivalent to an amplitude "joystick" controller.

 

Moore's SPACE program

 

In an article titled A General Model for Spatial Processing of Sounds, Moore described a spatial manipulation program called SPACE, currently in use at the CARL facility at UCSD (Moore 1983). It is currently the most complex published approach to spatial manipulation. The model represents an attempt to mediate between calculating each of the early echo response delays (effective but costly) and approximating reverberation with comb or all-pass filters (cheap but unrealistic). The model is significant in that it features a spatially sensitive early echo response.

 

The program models an inner room for the audience location, and an outer room for the location of the perceived sound sources (see Figure 6). The definition of an outer room for defining the illusory space bears resemblance to the scheme utilized in Stautner and Puckette's ROOM program. The speakers are modeled as "windows" from the outer room into the inner room, through which spatially manipulated sounds in the outer room are heard by the listeners. The user can specify the location of a sound or sound-path, relative reverberation time, an arbitrary number and position of speakers, and the dimensions of the outer and inner rooms.

 

The program calculates amplitude to the speakers by the length of a direct path to each "window." To produce the early echo response via a tap-delay system (similar to the one described in section 4.1.4), the program also calculates delay paths to each "window" (speaker) specified. The total number of direct paths from a sound source is equal to the number of speakers, while the number of delays in the early echo response simulation is the number of speakers multiplied by the number of walls. The number of walls is always four in the implementation described. Thus, a quad speaker playback system with a single sound source has potentially four direct paths, one to each speaker, and sixteen reflected paths bounced off each inner wall of the outer room to the speakers. The reflection paths are determined by the ray imaging method (Kuttruff 1972, Moorer 1979). The length of these paths determines the values given to the tap delay filters; there are four FIR filters for each speaker output. The paths are potential, in that if a path's shortest distance crosses the inner room, it is deleted from the model (i.e., the inner room's outer walls are modeled as completely absorptive). Moore mentions that the calculation of this "cut" factor is the most computationally intensive part of the program.
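
The path bookkeeping can be sketched with the image method for first-order reflections: reflect the source across each wall of an axis-aligned outer room, then measure straight-line distances to each speaker. The coordinates and names below are my own, and the "cut" test for paths crossing the inner room is omitted.

    import math

    def path_lengths(src, speakers, width, height):
        """Direct and first-order reflected path lengths in a rectangular
        room with corners (0,0) and (width,height)."""
        sx, sy = src
        images = [(-sx, sy), (2 * width - sx, sy),    # one image source
                  (sx, -sy), (sx, 2 * height - sy)]   # per wall
        direct = [math.dist(src, sp) for sp in speakers]
        reflected = [[math.dist(img, sp) for sp in speakers]
                     for img in images]
        return direct, reflected

    # A stereo example: 2 direct paths and 4 walls * 2 speakers = 8
    # reflected paths; a quad setup gives 4 and 16, as in the text.
    d, r = path_lengths((20.0, 25.0), [(12.5, 12.5), (17.5, 12.5)], 30.0, 30.0)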

 

Reverberation is similar to the CCRMA model described in section 4.1.4, except that the tap-delay filters are assigned to specific channels, making the distribution of early delays correspond to sound source positions in the outer room. Each of these FIR filters is dynamic, in that it is able to interpolate successive values for moving sound sources. This dynamic feature also results in a frequency shift equivalent to Doppler shift. Amplitudes of the delays are scaled according to the length of the path, the influence of the molecular absorption of the air, and absorption from the collision with the surfaces of the outer room's inner walls. Potentially, the model could be modified to represent additional reflections off the outer room's walls with additional tap-delay units and an algorithm for finding a second significant reflection, and so on. As currently implemented, the additional reflections and the late echo response are simulated by the statistical approximation of the comb-lowpass filter and all-pass filter combinations, similar to those described in section 4.1.4.

                                                 

Figure 6: Inner and Outer Rooms in Moore's SPACE program. (Annotations from the figure: inner room specifications, 10 x 10 meters; outer room specifications, 30 x 30 meters; stereo speaker positions at (5,5) (5,5), the windows from the outer room into the inner room. The dimensions of the outer room influence the path lengths used to calculate the early echo response in the tap delay filters, i.e. reflections off the inner wall, as determined by the imaging method. Spatial manipulations, whether moving sound source trajectories or specific sound source locations, and reverberation are defined within the outer room, and heard by the audience through the "windows.")

 

Comparison of Spatial Manipulation Programs

 

The trend toward developing a spatially sensitive early echo response for sound source location is demonstrated by comparing Chowning's QUAD program to Stautner and Puckette's ROOM program and Moore's SPACE program. In the Chowning model, the form of spatially sensitive reverberation is the proportion between reverberant and direct sound, and the trade-off between global and local reverberation. A sound moving into the distance would have increasingly focused, local reverberation, and a decreasing amount of global reverberation sent to all four speakers. Early echo response simulation is not a calculated feature of this program. The ROOM program uses de-correlated first delays to more accurately model the early echo response of real rooms, with the added feature of variable filtering to simulate reflections off objects specified in the room. The SPACE program outlined by Moore models the early echo response with FIR filters, as a set of delays supplied to each speaker according to sound source position in the outer room. Since the delay distribution is sensitive to the sound source position in terms of azimuth angle to the listener, the pattern of delays supplies ITD cues for angular position, as well as room size cues. A comparison of the features of these programs is shown in Figure 7.

 

                       

 

Figure 7: A Comparison of Features of Federkow et al.'s, Chowning's QUAD, Stautner and Puckette's ROOM, and Moore's SPACE Programs

 

Work is currently being done by Kendall, Martens and Decker at Northwestern University on utilizing spectral cues based on pinna transformations for spatial reverberation. They plan to apply their work to computer music, and have had success in utilizing pinna transformations for front-back, elevation, and angular cues (Kendall, et al 1984). The time alignment of multiple radiator speakers is crucial for their illusions (see section 3.3 of this tutorial).

 

An interesting comparison can be made between Chowning's QUAD program and Moore's SPACE program. It has been emphasized that the Chowning model calculates values for spatial parameters from the position of an idealized listener, while Moore's program calculates values for its parameters based on the locations of the speakers. Consider a moving sound source, speaker, and listener as diagrammed in Figure 8. When the same trajectory from position A to position B is synthesized by both programs, two different illusions result for the centralized listener. The QUAD program considers position B to be closer to the listener than position A; an amplitude increase would result between positions A and B. In contrast, the SPACE program calculates position B as a longer path to the speaker relative to A; hence, an amplitude decrease would occur between A and B. This is important to realize when utilizing cartographic sound paths drawn with programs such as Loy's SOUNDPATH.25
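
A small worked example of the contrast, with made-up coordinates standing in for Figure 8: the listener at the origin, one speaker ahead of the listener, and a source moving from A to B such that B is nearer the listener but farther from the speaker.

    import math

    listener, speaker = (0.0, 0.0), (0.0, 2.0)
    A, B = (0.0, 3.0), (2.5, 0.0)

    # QUAD-style: level follows 1/distance to the listener; it RISES
    # from A to B (1/3.0 -> 1/2.5).
    print(1.0 / math.dist(A, listener), 1.0 / math.dist(B, listener))

    # SPACE-style: level follows the path length to the speaker; it
    # FALLS from A to B (1/1.0 -> about 1/3.2).
    print(1.0 / math.dist(A, speaker), 1.0 / math.dist(B, speaker))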

 

 

                                           

 

Figure 8: Comparison of path lengths between Chowning's Simulation program and Moore's General Model for a given trajectory.


Conclusions

The real problem is that accurate modeling of the physical world that influences perception would require incredible expense in terms of computer time, loudspeakers, and program specificity. The strategy, then, is to model transformations of parameters significant for spatial hearing that are perceptually equivalent to their real-world physical transformations. The programs described here accomplish this task with a fair measure of success, given the boundaries and approximations of significant parameters. The mismatch between source and receptor could be further minimized by both 1) including additional parameters significant to spatial hearing, and 2) developing modes of spatial composition that more realistically account for the nature of spatial hearing at the source.

 

Summary

 

With the computer musician in mind, we have summarized the types of knowledge that interact to form the basis of spatial manipulation programs intended to be heard by an audience in a typical concert situation. These are described in the context of a source-medium-receptor model. Between each stage of this model, a translation of intent occurs in relation to the composer's initial idea. By the end of the translation process, there is usually a pronounced mismatch between the composer's idea (the source) and the listener's experience (the receptor). By outlining the complexity of these translations of a composer's spatial intentions, we have hopefully demonstrated the importance of the mismatch.

 

Features of each model of understanding at each stage of the translation process were overviewed. With increased awareness of this process, composers and researchers can perhaps develop programs for spatial manipulation that are more informed as to the nature of spatial hearing.

 

 

 

References

 

Arnheim, R., Visual Thinking, University of California Press, Berkeley, 1969.

Barron, M., "The Subjective Effect of First Reflections in Concert Halls The Need for Lateral Reflections," Journal of Sound and Vibration, vol. 15:4, 1976.

Barron, M., "The Effects of Early Reflections on Subjective Acoustic Quality in Concert Halls," Ph.D. Thesis, University of Southhampton, 1974.

Begault, D., Music Architecture, University of California, Santa Cruz, 1979. B.A. Honors Thesis (unpublished)

Begault, D., Trajectoral Sets for Spatial Manipulation, University of California, San Diego, 1981. Graduate Seminar Paper (unpublished)

Bekesy, G., Experiments in Hearing, McGraw-Hill, New York, 1960.

Beranek, L., Acoustics, McGraw-Hill, New York, 1954.

Blauert, J., Spatial Hearing, MIT Press, Cambridge, 1983.

Bloom, J., "Creating Source Elevation Illusions by Spectral Manipulation," JAES, vol. 25:9, pp. 7067, 1977.

Borenius, J., "Moving Sound Image in the Theatres," JAES, vol. 25:4, pp. 2003, 1977.

Borish, .1., "Extension of the Image Model to Arbitrary Polyhedra," JASA, vol. 75:6, pp. 18271836, 1984.

Brant, H., "Space as an Essential Aspect of Musical Composition," in Contemporary Composers on Contemporary Music, ed. Schwarz and Childs, Holt, Rinehart and Winston, New York, 1966.

Butler, R. and Flannery, R., "The Spatial Attributes of Stimulus Frequency...," Perception and Psychophysics, vol. 28:5, pp. 449-457, 1980.

Carr, H., An Introduction to Space Perception, Longmans, Green and Co., New York, 1935.

Chowning, J., "The Simulation of Moving Sound Sources," JAES, vol. Preprint no. 726 (M3) for the 38th convention, 1970.

Chowning, J., "The Synthesis of Complex Audio Spectra By Means of Frequency Modulation," JAES, vol. 21:7, pp. 526534, 1973.

Chowning, J., Grey, J., Moorer, J., and Rush, L., "Computer Simulation of Music Instrument Tones in Reverberant Environments," CCRMA Technical Report, vol. STANM1, Stanford University, 1974.

Howard, I. and Templeton, W., Human Spatial Orientation, Wiley and Sons, London, 1966.

Jeffress, L., "Localization of Sound," in Handbook of Sensory Physiology, ed. Keidel and Neff, vol. 5:2, pp. 449459, Springer Verlag, Berlin, 1975.

Jeffress, L. and Taylor, R., "Lateralization .vs. Localization," JASA, vol. 33, pp. 482483, 1961.

Kendall, G., Martens, W., and Decker, S., Spatial Reverberation: Completing Our Simulation of Spatial Hearing Cues, 1984. Proposal Submitted to the Systems Development Foundation

Klingon, G. and Bontecou, D., "Localization in Auditory Space," Neurology, vol. 16, pp. 879-886, 1966.

Klumpp, R. and Eady, H., "Some Measurements of Interaural Time Difference Thresholds," JASA, vol. 28:5, pp. 859-60, 1956.

Knudsen, V. and Harris, W., Acoustical Designing in Architecture, Wiley and Sons, New York, 1950.

Kock, W., "Binaural Localization and Masking," JASA, vol. 22:6, pp. 801-4, 1950.

Kock, W., "Binaural Fusion of Low and High Frequency Sounds," JASA, vol. 30:3, pp. 222-223, 1958.

Kuttruff, H., Room Acoustics, Applied Science Publishers, London, 1972.

Lambert, R., "Dynamic Theory of Sound Source Localization," JASA, vol. 56:1, pp. 16571, 1974.

Leaky, D., "Some Measurements of the Effects of Interchannel Intensity and Time Difference in TwoChannel Sound System," JASA, vol. 31:7, pp. 97786, 1959.

Leitner, B., Sound Space, NYU Press, New York, 1978.

Liben, L., Patterson, A., and Newcombe, N., Spatial Representation And Behavior Across the Lifespan, Academic Press, New York, 1981.

Lippman, E., "Spatial Perception and Physical Location as Factors in Music," Acta Musicologica, vol. 35, pp. 2434, 1963.

Mastroianni, G., "Influence of Eye Movements and Illumination on Auditory Localization," Perception and Psychophysics, vol. 31:6, pp. 58184, 1982.

Mills, W., "Auditory Localization," in Foundations of Modern Auditory Theory, vol. 2, Academic Press, New York, 1972. ed. Tobias

Mills, W., "On the Minimum Audible Angle," JASA, vol. 30:4, pp. 23746, 1958.

Moles„ Information Theory and Aesthetic Perception, University of Illinois Press, Urbana, 1966.

Moore, B., Introduction to the Psychology of Hearing, 2nd ed., Academic Press, New York, 1982.

Moore, C., "Studio Applications of Time Delay," Lexicon Application Note, vol. AN3, Lexicon, Inc., Walther, Mass., 1976.

Moore, F. and Loy, G., 1984. Personal Communication

Moore, F., "A General Model for the Spatial Processing of Sounds," CMJ, 1983.

Moorer, J., "About This Reverberation Business," CMJ, vol. 3:2, pp. 13-28, 1979.

Moorer, J., "Signal Processing Aspects of Computer Music: A Survey," CMJ, vol. 1:1, pp. 4-37, 1977.

Morrill, D., "Loudspeakers and Performers: Some Problems and Proposals," CMJ, vol. 5:4, 1982.

Moushegian, G. and Jeffress, L., "Role of Interaural Time and Intensities in the Lateralization of Low-Frequency Tones," JASA, vol. 31:11, pp. 1441-1445, 1959.

Murch, G., Visual and Auditory Perception, Bobbs-Merrill, New York, 1973.

Musicant, A. and Butler, R., "The Influence of Pinna-Based Spectral Cues on Sound Localization," JASA, vol. 75:4, pp. 1195-99, 1984.

Perrot, D., "Concurrent Minimum Audible Angle: A Reexamination of the Concept of Auditory Spatial Acuity," JASA, vol. 75:4, pp. 1201-06, 1984.

Plenge, G., "On the Difference Between Localization and Lateralization," JASA, vol. 56:3, pp. 944-51, 1974.

Reynolds, R., "Explorations in Sound Space Manipulation," Reports from The Center, vol. 1:1, Center for Music Experiment at UCSD, La Jolla, 1977.

Reynolds, R., "Thoughts on Sound Movement and Meaning, Perspectives of New Music, vol. 16:2, 1979.

Rodgers, P., "Pinna Transformation and Sound Reproduction," JAES, vol. 29:4, pp. 22634, 1981.

Roederer, J., Introduction to the Physics and Psychophysics of Music, 2nd ed., Springer-Verlag, New York, 1975.

Roffler, S. and Butler, R., "Factors That Influence the Localization of Sound in the Vertical Plane," JASA, vol. 43, pp. 1255-59, 1968.

Ruff, R. and Perret, E., "Spatial Mapping of Two-Dimensional Sound Patterns Presented Sequentially," Perceptual and Motor Skills, vol. 55, pp. 155-63, 1982.

Sandel, T., Teas, D., Feddersen, W., and Jeffress, L., "Localization of Sound from Single and Paired Sources," JASA, vol. 27:5, pp. 842-55, 1955.

Sawyers, B. and Cherry, E., "Mechanism of Binaural Fusion in the Hearing of Speech," JASA, vol. 29:9, 1957.

Schroeder, M. and Logan, B., "Colorless Artificial Reverberation," JAES, vol. 9:3, pp. 192-97, 1961.

Schroeder, M., "Improved Quasi-Stereophony and Colorless Artificial Reverberation," JASA, vol. 33:8, pp. 1061-64, 1961.

Schroeder, M., "Towards Better Acoustics for Concert Halls," Physics Today, vol. 33, Oct., pp. 24-30, 1980.

Schroeder, M., "Natural Sounding Artificial Reverberation," JAES, vol. 10, Jul., pp. 219-223, 1962.

Schroeder, M., "Acoustics in Human Communication: Room Acoustics, Music, and Speech," JASA, vol. 68:1, pp. 22-28, 1980.

Schroeder, M., "Progress in Architectural Acoustics and Artificial Reverberation: Concert Hall Acoustics and Number Theory," JAES, vol. 32:4, pp. 194-203, 1984.

Schubert, E., "Psychological Acoustics," in Benchmark Papers in Acoustics, vol. 13, Bouden, Hutchinson and Ross, Stroudsburg, Pa., 1979.

Searle, C., "Model for Auditory Localization," JASA, vol. 56:6, pp. 11641175, 1976.

Shaw, E., "Transformation of Sound Pressure Level from the Free Field to the Eardrum in the Horizontal Plane," JASA, vol. 56:6, pp. 18481861, 1974.

Sheeline, C., An Investigation of the Effects of Direct and Reverberant Signal Interaction on Auditory Distance Perception, Stanford University/CCRMA, 1982. Ph.D. Dissertation (unpublished)

Snow, W., "Effect of Arrival Time on Stereophonic Localization," JASA, vol. 26:6, pp. 107174, 1954.

Stautner, J. and Puckette, M., "Designing MultiChannel Reverberators," CMJ, vol. 5:4, pp. 5265, 1982.

Stevens, S. and Newman, E., "The Localization of Actual Sources of Sound," American Journal of. Psychology, vol. 48, pp. 297306, 1936.

Stewartt, G., "The Function of Intensity and Phase in Binaural Localization of Pure Tones;' Physiology Review, vol. 15, 1920.

Stockhausen, K., "Music in Space," Die Reihe, vol. 5, Presser, Bryn Mawr, Pa., 1961.

Thiele, G. and Plenge, G., "Localization of Lateral Phantom Sources," JAES, vol. 25:4, pp. 196-200, 1977.

Wallach, H., "The Role of Head Movements and of Vestibular and Visual Cues in Sound Localization," Journal of Experimental Psychology, vol. 27, 1940.

Wallach, H., Newman, E., and Rosenzweig, M., "The Precedence Effect in Sound Localization," American Journal of Psychology, vol. 62, pp. 315-36, 1949.

Wilcott, R., "Variables Affecting the Angular Displacement Threshold of Simulated Auditory Movement," Journal of Experimental Psychology, vol. 49:1, pp. 68-72, 1955.

Wohlwill, J., "Experimental, Developmental, Differential: Which Way the Royal Road to Knowledge About Spatial Cognition?," in Spatial Representation and Behavior Across the Lifespan, ed. Liben, Patterson, Newcombe, Academic Press, New York, 1981.

 

 

 1 A useful analogy to this particular usage of perception and cognition is the linguist's model of joke interpretation: individual vowels and consonants are perceived many milliseconds before the actual cognition of the joke occurs (followed by a resultant laugh or moan, depending on the quality of the joke).

 2 We do not propose to channel the idea of music into communication theory, as Abraham Moles attempted in his Information Theory and Aesthetic Perception (Moles 1966). Composers often view the addition of "noise" or indeterminacy to their communication system as not only inevitable, but as desirable. However, the differences between intent and actual result in the case of spatial manipulation are especially severe.

 3 Refer to the definitions of cognition and perception for the receptor in section 1.2.

 4 This association of spatial memory is also influenced culturally, similar to the association of diminished chords with scary scenes in old movies.

 5 Musical experience does not usually involve changing spatial configurations; its relevance to our attention can be greatly influenced by the composer informing us that a spatial aspect is significant.

 6 Whether the identification was based on the perceptual integration of the patterns as numerals, or merely on process of elimination on the basis of left-right azimuth information, remains to be shown. (The sounds used were pure tones, 736 Hz., played 875 ms. per loudspeaker.) The most successful identification of the auditory numerals occurred when the cartographic representation was distorted by vertical compression, i.e., when the left-right element was most pronounced.

 7 Many people have reported to the author the experience, when driving highways, of identifying and localizing unfamiliar sounds as possible mechanical failures related to their own automobile, only to find that the sound source was another automobile in the same vicinity. This is a modern-day application of localization as a survival mechanism. Another modern-day example is the localization of a ringing telephone as our own, only to find that it's our neighbor's phone.

 8 A concise discussion of reverberation is found in section 4.1.2 of this tutorial.

 9 Movie sound designers have probably applied the knowledge of host space sounds most completely.

10 The monaural processing includes the cognitive aspect of spatial hearing discussed in section 2.1.

11 The median plane vertically bisects the body exactly between the ears; its extension outwards from between the eyes to some point directly in front of the body is the 0 degree azimuth point.

12 Stevens and Newman proposed 3 kHz as the cutoff point. Sandel, from data supposedly taken from Mills (1972), gives a frequency range of 20 Hz-12 kHz for ITD and 1 kHz-12 kHz for IAD. A misprint may have obscured the more likely values of 20 Hz-1.2 kHz for ITD.

13 Head movement is a normal condition for concert listening, but a composer might consider the reduction of head movement that occurs when a single object or person, such as a beam of light or a live performer, becomes the object of visual fixation by the audience. Far less head movement would occur in this situation than in a darkened room.

14 "Cmusic" and "airabsorb" are programs written by F.R. Moore.

15 The specific time delay value needed for the breakdown of the effect or for placement of a sound to a discrete location is extremely dependent on the nature of the sound source.

16 This ability bears relation to a similar auditory ability to selectively ignore sounds while focusing attention on other sounds in a multiple sound source environment. This is known as the cocktail party effect. At any gathering of humans, several conversations may be occurring at once, but we are able to focus our attention on a conversation with a single partner. Yet if our name is mentioned from across the room, we will more than likely hear it quite clearly. The phenomenon is that we are actually able to monitor simultaneous streams of information, and that we can choose the stream of information on which to focus conscious attention.

17 Schroeder et al found rectangular halls to be optimal for concert music because of the de-correlation of the delays at both ears. See his "Towards Better Acoustics for Concert Halls," Physics Today, vol. 33, Oct., pp. 24-30, 1980.

18 This is the loudspeaker manufacturer's version of imaging, determined to give a reasonable estimate of how far back the speakers need to be from a listener.

19 See section 2.2.7 for more information on the precedence effect.

20 Refer to section 2.1.6 for definition of the host space.

21 The value of "60 dB." is taken from Roederer (1975), and appears arbitrary. Reverberation is significant to the level that delays are still audible, which is determined by the environment in which the sounds are heard.

 

22  This program was written by Mark Dolson.

23 At the time of publication (1971), Chowning described a double-jointed arm for tracing the path, which was represented as a light trace on a CRT.

24 The perceptual benefit of local reverberation was later felt by Chowning not to justify the additional computation (Loy 1984).

25 SOUNDPATH is a program in use at CARL at UCSD for interpolating paths between terminal-specified points. The program allows interactive modification of the paths displayed on the terminal.