Abstract:
The brains of human and nonhuman primates are thought to contain brain regions that have specialized for processing voice and face content. Although voice- and face-sensitive regions have been primarily studied in their respective sensory modalities, recent human functional magnetic resonance imaging (fMRI) studies have suggested that cross-modal interactions occur in these regions. Here, we investigated whether, and how, neuronal spiking activity in a voice region is modulated by visual (face) stimulation.
Using fMRI-guided electrophysiology, we targeted neurons in a voice-sensitive region in the right supra-temporal plane of two rhesus macaques. We used dynamic faces and voices for stimulation, including congruent and incongruent audiovisual pairs. Stimuli from monkey and human callers were organized in a multifactorial design to analyze the impact of the following factors on neuronal audiovisual influences: caller species, familiarity, identity, and call type.
Within this voice-sensitive region, we obtained recordings from 149 auditory responsive units, 45 of which demonstrated visual influences. The majority of the visual modulation was characterized by audiovisual responses that significantly deviated from the sum of the responses to both unimodal stimuli (i.e., non-additive multisensory influences). Contrasting monkey ‘coo’ calls with human-mimicked ‘coos’ revealed qualitatively similar, but quantitatively different, audiovisual processing of conspecific relative to heterospecific voices; human calls elicited more sub-additive interactions than monkey calls. The call type and speaker identity factors interacted and significantly affected both the direction and amplitude of the visual influences. Furthermore, familiar voices consistently elicited stronger audiovisual influences than unfamiliar voices, despite similar auditory responses. Lastly, we compared the specificity of audiovisual interactions and the reliability of neuronal responses across congruent and incongruent audiovisual pairs. In some cases, neurons were differentially affected by voice-face congruency; for example, neurons were most sensitive to a violation of the congruency of a conspecific voice/face pairing caused by substituting a human face for the monkey face.
In conclusion, our study complements human fMRI work on cross-sensory influences in voice/face regions, and the results describe the nature of the visual influences on neuronal responses in a voice-sensitive region of the primate brain. The results also help to characterize how stimulus features shape the cross-modal influences on this region.