Machine learning for decoding listeners' attention from electroencephalography evoked by continuous speech.
Previous research has shown that it is possible to predict which speaker is attended in a multispeaker scene by analyzing a listener's electroencephalography (EEG) activity. In this study, existing linear models that learn the mapping from neural activity to an attended speech envelope are replaced by a non-linear neural network (NN). The proposed architecture takes into account the temporal context of the estimated envelope and is evaluated using EEG data obtained from 20 normal-hearing listeners who focused on one speaker in a two-speaker setting. The network is optimized with respect to the frequency range and the temporal segmentation of the EEG input, as well as the cost function used to estimate the model parameters. To identify the salient cues involved in auditory attention, a relevance algorithm is applied that highlights the electrode signals most important for attention decoding. In contrast to linear approaches, the NN profits from a wider EEG frequency range (1-32 Hz) and achieves a performance seven times higher than the linear baseline. Relevant EEG activations following the speech stimulus after 170 ms at physiologically plausible locations were found. This was not observed when the model was trained on the unattended speaker. Our findings therefore indicate that non-linear NNs can provide insight into physiological processes by analyzing EEG activity.