November 25, 2019 / by Giorgia Cantisani / MAAD

EEG-based decoding of auditory attention to a target instrument in polyphonic music

Auditory attention decoding aims to determine which sound source a subject is "focusing on". In this work, we address the problem of EEG-based decoding of auditory attention to a target instrument in realistic polyphonic music.

To this end, we exploit a stimulus reconstruction model that has been shown to successfully decode attention to speech in multi-speaker environments. The decoding procedure has two steps. First, a feature representation of the attended audio source is reconstructed from the neural response using a multi-channel Wiener filter. Such a filter is known in the field as a backward temporal response function (TRF) and is learned on a training set using a minimum mean squared error criterion. Second, the reconstruction is correlated with the ground-truth sources to determine the attended one: the attended instrument is the source with the highest Pearson correlation coefficient.
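The two steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch on synthetic data, not the code used in the paper: the function names, the number of lags, the toy "EEG" and the small ridge term added for numerical stability are all assumptions, and the stimulus feature is taken to be one-dimensional for simplicity.

```python
import numpy as np

def lagged(eeg, n_lags):
    """Stack time-lagged copies of the EEG (samples x channels).

    Row t holds eeg[t], eeg[t+1], ..., eeg[t+n_lags-1], because the neural
    response follows the stimulus in time. Missing samples are zero-padded.
    """
    n_samples, n_channels = eeg.shape
    out = np.zeros((n_samples, n_channels * n_lags))
    for lag in range(n_lags):
        out[:n_samples - lag, lag * n_channels:(lag + 1) * n_channels] = eeg[lag:]
    return out

def fit_backward_trf(eeg, stimulus, n_lags, ridge=1.0):
    """Least-squares decoder mapping lagged EEG to the stimulus feature.

    The ridge term is an assumption added here for numerical stability.
    """
    X = lagged(eeg, n_lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ stimulus)

def decode_attention(eeg, sources, weights, n_lags):
    """Reconstruct the stimulus and pick the most correlated candidate source."""
    reconstruction = lagged(eeg, n_lags) @ weights
    corrs = [float(np.corrcoef(reconstruction, s)[0, 1]) for s in sources]
    return int(np.argmax(corrs)), corrs

# Toy data (hypothetical): each EEG channel is a delayed, scaled copy of the
# attended feature plus noise; the unattended source leaves no trace.
rng = np.random.default_rng(0)
n_samples, n_channels, n_lags = 2000, 8, 5
attended = rng.standard_normal(n_samples)
unattended = rng.standard_normal(n_samples)
gains = np.linspace(0.5, 1.5, n_channels)
eeg = 0.5 * rng.standard_normal((n_samples, n_channels))
eeg[2:] += attended[:-2, None] * gains

# Train on the first half, decode on the second half.
half = n_samples // 2
weights = fit_backward_trf(eeg[:half], attended[:half], n_lags)
attended_idx, corrs = decode_attention(
    eeg[half:], [attended[half:], unattended[half:]], weights, n_lags
)
print("decoded source index:", attended_idx)
print("correlations:", corrs)
```

With real recordings the candidate sources would be the feature representations of the individual instruments, and the decoder would be evaluated on trials held out from training.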


The choice of audio representation is crucial: it embeds a hypothesis about the neural coding of the stimulus and can significantly affect both the reconstruction quality and the decoding performance. We studied three audio representations, one in the time domain and two in the time-frequency domain: the time-domain amplitude envelope (AE) computed using the Hilbert transform, the magnitude spectrogram (MAG), and the Mel spectrogram (MEL), a perceptually scaled representation commonly used for music analysis.
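For readers who want to experiment, the three descriptors can be approximated with NumPy and SciPy as below. This is a rough sketch under stated assumptions: the window length, hop size, number of Mel bands and the triangular filterbank construction are illustrative choices, not the settings used in the paper, and the test signal is synthetic.

```python
import numpy as np
from scipy.signal import hilbert, stft

def amplitude_envelope(audio):
    """AE: magnitude of the analytic signal obtained via the Hilbert transform."""
    return np.abs(hilbert(audio))

def magnitude_spectrogram(audio, sr, n_fft=1024, hop=512):
    """MAG: magnitude of the short-time Fourier transform (freq bins x frames)."""
    _, _, Z = stft(audio, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    return np.abs(Z)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters whose centres are equally spaced on the Mel scale."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    hz_edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, centre, hi = hz_edges[m:m + 3]
        rising = (fft_freqs - lo) / (centre - lo)
        falling = (hi - fft_freqs) / (hi - centre)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_spectrogram(audio, sr, n_fft=1024, hop=512, n_mels=40):
    """MEL: magnitude spectrogram projected onto the Mel filterbank."""
    mag = magnitude_spectrogram(audio, sr, n_fft, hop)
    return mel_filterbank(sr, n_fft, n_mels) @ mag

# Synthetic test signal (hypothetical): a 440 Hz tone with a slow 2 Hz tremolo.
sr = 16000
t = np.arange(sr) / sr
modulator = 1.0 + 0.5 * np.sin(2 * np.pi * 2 * t)
audio = modulator * np.sin(2 * np.pi * 440 * t)

ae = amplitude_envelope(audio)
mag = magnitude_spectrogram(audio, sr)
mel = mel_spectrogram(audio, sr)
print(ae.shape, mag.shape, mel.shape)
```

In a decoding pipeline these features would typically also be resampled to the EEG sampling rate before being used as reconstruction targets; that step is omitted here.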

Moreover, we investigated how several properties of the musical stimuli influence performance: the number and type of instruments in the mixture, the spatial rendering, the music genre, and the melody or rhythmic pattern being played. We obtain promising results, comparable to those obtained on speech data in previous works, and confirm that it is possible to correlate human brain activity with musically relevant features of the attended source.

Figure: Pearson's correlation coefficients of the reconstructed stimulus with the attended source (blue), the unattended one (pink) and the mixture (orange) for the three audio descriptors.

Below you can find the corresponding decoding accuracies and their statistical significance.
Accuracy (%)    All        Duets      Trios
AE              52 ***     59 **      40 *
MAG             75 ****    78 ****    69 ****
MEL             75 ****    76 ****    74 ****

For more details, please refer to the paper "EEG-based decoding of auditory attention to a target instrument in polyphonic music" by Cantisani G. et al., WASPAA, 2019.

This project is part of my PhD thesis, conducted at Télécom Paris under the supervision of Professors Slim Essid and Gaël Richard.
