In our natural environment, we simultaneously receive information through various sensory modalities. The properties of these stimuli are coupled by physical laws, so that, e.g., auditory and visual stimuli caused by the same event have a specific temporal, spatial and contextual relation when reaching the observer. In speech, for example, visible lip movements and audible utterances occur in close synchrony, which improves speech intelligibility under adverse acoustic conditions. Research into multi-sensory perception is currently being performed in a number of different experimental and application contexts. This chapter provides an overview of the typical research areas dealing with audio-visual interaction and integration, spanning the range from cognitive psychology to applied research for multi-media applications. A major part of this chapter deals with a variety of research questions related to the temporal relation between audio and video. Other issues of interest are basic spatio-temporal interaction, spatio-temporal effects in audio-visual stimuli (including the ventriloquist effect), cross-modal effects in attention, audio-visual interaction in speech perception, and interaction effects with respect to the perceived quality of audio-visual scenes.