Michael Kubovy and Michael Schutz (2010)

Audio-visual objects

Review of Philosophy and Psychology, 1(1):41-61.

In this paper we offer a theory of cross-modal objects. To begin, we discuss two kinds of linkages between vision and audition. The first is a duality. The visual system detects and identifies surfaces; the auditory system detects and identifies sources. Surfaces are illuminated by sources of light; sound is reflected off surfaces. However, the visual system discounts sources and the auditory system discounts surfaces. These and similar considerations lead to the Theory of Indispensable Attributes, which states the conditions for the formation of gestalts in the two modalities. The second linkage involves the formation of audio-visual objects, integrated cross-modal experiences. We describe research that reveals the role of cross-modal causality in the formation of such objects. These experiments use the canonical example of a causal link between vision and audition: a visible impact that causes a percussive sound.


[A fire is] a terrestrial event with flames and fuel. It is a source of four kinds of stimulation, since it gives off sound, odor, heat and light .... One can hear it, smell it, feel it, and see it, or get any combination of these detections, and thereby perceive a fire .... For this event, the four kinds of stimulus information and the four perceptual systems are equivalent.

If the perception of fire were a compound of separate sensations of sound, smell, warmth and color, they would have had to be associated in past experience in order to explain how any one of them could evoke memories of all the others. ...

[T]he problem of perception is not how sensations get associated; it is how the sound, the odor, the warmth, or the light that specifies fire gets discriminated from all the other sounds, odors, warmths, and lights that do not specify fire. (Gibson 1966, pp. 54–55)


In this paper, we offer a theory of cross-modal objects. We agree with Gibson’s assertion that such a theory is unlikely to be an associative theory. Instead, our theory is built on the notion of privileged inter-modal binding. As an example of such privileged binding, we will examine the relation between visible impacts and percussive sounds, which allows for a particularly powerful form of binding that produces audio-visual objects. To motivate these conclusions, we devote the first two sections of this article to a review of Kubovy and Van Valkenburg’s (Cognition 80(1–2):97–126, 2001) theory of auditory and visual objects. In the final section, we present our new approach, together with empirical data that support it.

