November 25, 2019 / by Giorgia Cantisani / MAAD
EEG-based decoding of auditory attention to a target instrument in polyphonic music
Auditory attention decoding aims to determine which sound source a subject is "focusing on". In this work, we address the problem of EEG-based decoding of auditory attention to a target instrument in realistic polyphonic music.
To this end, we exploit a stimulus reconstruction model that has been shown to successfully decode attention to speech in multi-speaker environments. The decoding procedure has two steps. First, a feature representation of the attended audio source is reconstructed from the neural response using a multi-channel Wiener filter. Such a filter is known in the field as a backward temporal response function (TRF) and is learned on a training set using a minimum mean squared error criterion. Second, the reconstruction is correlated with the ground-truth sources to determine the attended one: the instrument whose representation has the highest Pearson correlation coefficient with the reconstruction is recognized as the attended instrument.
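The two steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used in the paper: the function names, the number of lags, and the ridge regularization term are assumptions made for the example, and the audio feature is taken to be one-dimensional (such as an amplitude envelope).

```python
import numpy as np


def lag_matrix(eeg, n_lags):
    """Stack EEG samples t, t+1, ..., t+n_lags-1 for every time t (zero-padded at the end).

    eeg has shape (n_times, n_channels); the result has shape
    (n_times, n_channels * n_lags).
    """
    n_times, n_channels = eeg.shape
    lagged = np.zeros((n_times, n_channels * n_lags))
    for lag in range(n_lags):
        cols = slice(lag * n_channels, (lag + 1) * n_channels)
        lagged[: n_times - lag, cols] = eeg[lag:]
    return lagged


def fit_backward_trf(eeg, attended_feature, n_lags, ridge=1.0):
    """Least-squares backward filter mapping lagged EEG to the attended feature.

    The ridge term is an assumption added here for numerical stability.
    """
    X = lag_matrix(eeg, n_lags)
    XtX = X.T @ X
    Xty = X.T @ attended_feature
    return np.linalg.solve(XtX + ridge * np.eye(XtX.shape[0]), Xty)


def decode_attention(eeg, candidate_features, weights, n_lags):
    """Reconstruct the feature from EEG and pick the best-correlated candidate."""
    reconstruction = lag_matrix(eeg, n_lags) @ weights
    scores = [np.corrcoef(reconstruction, f)[0, 1] for f in candidate_features]
    return int(np.argmax(scores)), scores
```

In this sketch the filter would be fitted on training trials with `fit_backward_trf` and then applied to held-out trials with `decode_attention`, passing one feature per instrument in the mixture.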
Choosing the audio representation is crucial, as this choice embeds a hypothesis about the neural coding of the stimulus and can significantly impact the reconstruction quality and the decoding performance. We studied three different audio representations, one in the time domain and two in the time-frequency domain: the time-domain amplitude envelope (AE) computed using the Hilbert transform, the magnitude spectrogram (MAG), and the Mel spectrogram (MEL), a perceptually scaled representation commonly used for music analysis.
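For illustration, the three representations could be computed along the following lines with NumPy and SciPy. The window length, the number of Mel bands, and the hand-built triangular filterbank are assumptions chosen for this sketch; they are not the settings reported in the paper.

```python
import numpy as np
from scipy.signal import hilbert, stft


def amplitude_envelope(audio):
    """AE: magnitude of the analytic signal obtained with the Hilbert transform."""
    return np.abs(hilbert(audio))


def magnitude_spectrogram(audio, sr, n_fft=1024):
    """MAG: magnitude of the short-time Fourier transform, shape (n_freqs, n_frames)."""
    freqs, _, Z = stft(audio, fs=sr, nperseg=n_fft)
    return freqs, np.abs(Z)


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(freqs, n_mels):
    """Triangular filters whose centers are equally spaced on the Mel scale."""
    edges = mel_to_hz(np.linspace(hz_to_mel(freqs[0]), hz_to_mel(freqs[-1]), n_mels + 2))
    bank = np.zeros((n_mels, len(freqs)))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mel_spectrogram(audio, sr, n_fft=1024, n_mels=40):
    """MEL: magnitude spectrogram projected onto the Mel filterbank."""
    freqs, mag = magnitude_spectrogram(audio, sr, n_fft)
    return mel_filterbank(freqs, n_mels) @ mag
```

Each function would be applied to the isolated track of every instrument, so that the reconstruction can later be compared against one candidate representation per source.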
Moreover, we investigated how several properties of the musical stimuli affect performance: the number and type of instruments in the mixture, the spatial rendering, the music genre, and the melody or rhythmic pattern that is played. We obtained promising results, comparable to those obtained on speech data in previous works, and confirmed that it is possible to correlate human brain activity with musically relevant features of the attended source.
Accuracy (%) | All | Duets | Trios |
---|---|---|---|
AE | 52 *** | 59 ** | 40 * |
MAG | 75 **** | 78 **** | 69 **** |
MEL | 75 **** | 76 **** | 74 **** |
For more details, please refer to the paper "EEG-based decoding of auditory attention to a target instrument in polyphonic music" by Cantisani G. et al., WASPAA, 2019.
This project is part of my PhD thesis, conducted at Télécom Paris under the supervision of Professors Slim Essid and Gaël Richard.