In this paper, a general cognitive architecture of spoken language processing is specified. This is followed by an account of how this cognitive architecture is instantiated in the human brain. The spatial aspects of the networks for language are discussed, together with their temporal dynamics and the underlying neurophysiology. A distinction is proposed between networks for coding/decoding linguistic information and additional networks for getting from coded meaning to speaker meaning, i.e., for making the inferences that enable the listener to understand the intentions of the speaker.