Multisensory convergence and sensorimotor integration are important for mediating higher vestibular cognitive functions at the cortical level. In contrast to vestibulo-visual or vestibulo-tactile integration, much less is known about the neural mechanisms that mediate the integration of vestibular-otolith (linear acceleration/translation/gravity detection) and auditory processing. Vestibular-otolith and auditory afferents can be activated simultaneously by loud sound pressure stimulation, which is routinely used to elicit cervical and ocular vestibular evoked myogenic potentials (VEMPs) in clinical neurotological testing. Because both sets of afferents are activated at the same time, functional magnetic resonance imaging (fMRI) studies of the neural topology of these systems always face an auditory confound. Here, we demonstrate a novel way to overcome this confound that does not rely on the assumption of simple subtraction and, in addition, allows detection of non-linear changes in the response due to vestibular-otolith interference. We used a parametric sound pressure stimulation design that took each subject's vestibular stimulation threshold into account, and we analyzed changes in the BOLD response below and above the vestibular-otolith threshold. This approach allowed us to investigate the functional neuroanatomy of sound-induced auditory and vestibular integration with fMRI. The results revealed that auditory and vestibular signals converge in overlapping regions of the caudal part of the superior temporal gyrus (STG) and the posterior insula. In addition, some regions responded only to suprathreshold stimulation, suggesting vestibular (otolith) signal processing in these areas.
Based on these parametric analyses, we suggest that the caudal part of the STG and the posterior insula may contain areas in which vestibular signals contribute to auditory processing, i.e., higher vestibular cortices that provide the multisensory integration important for tasks such as the spatial localization of sound.
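To make the logic of a threshold-referenced parametric design more concrete, the Python sketch below models a block-averaged response at each sound pressure level as a linear function of level plus a "hinge" term that is non-zero only above the subject's own vestibular threshold. Under this assumed model, a purely auditory region would show only the linear slope, whereas a non-zero hinge slope would indicate a non-linear change in the response above threshold, without subtracting one condition from another. This is a minimal illustration under stated assumptions, not the analysis pipeline used in this study: the function names, the dB levels, the threshold value, and the omission of hemodynamic convolution and noise are all hypothetical simplifications.

```python
import numpy as np


def hinge_design(levels_db, threshold_db):
    """Hypothetical design matrix [intercept, linear, hinge] for one subject.

    levels_db: sound pressure levels presented, one per block (assumed values).
    threshold_db: the subject's vestibular (VEMP) threshold in dB (assumed input).
    """
    levels = np.asarray(levels_db, dtype=float)
    centered = levels - threshold_db       # 0 at the subject's own threshold
    hinge = np.maximum(0.0, centered)      # non-zero only for suprathreshold levels
    return np.column_stack([np.ones_like(centered), centered, hinge])


def fit_response(design, bold):
    """Ordinary least-squares estimates of [intercept, slope, extra suprathreshold slope]."""
    beta, *_ = np.linalg.lstsq(design, np.asarray(bold, dtype=float), rcond=None)
    return beta


if __name__ == "__main__":
    # Simulated, noise-free block responses (illustrative numbers only).
    levels = np.array([60, 70, 80, 90, 100, 110], dtype=float)
    threshold = 95.0
    bold = 1.0 + 0.02 * (levels - threshold) + 0.1 * np.maximum(0.0, levels - threshold)

    beta = fit_response(hinge_design(levels, threshold), bold)
    print(beta)  # approximately [1.0, 0.02, 0.1]
```

Referencing each subject's levels to that subject's own threshold places the hinge at the same point in every design matrix, so the hinge coefficient can be compared across subjects even when absolute thresholds differ.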