Auditory Attention Detection Dataset KULeuven

Mendeley Data | Updated 2024-06-29 | Indexed 2024-06-27

Download link:

https://zenodo.org/record/3997352

Resource description:

***************************************
Please cite the original paper where this data set was presented: N. Das, W. Biesmans, A. Bertrand, T. Francart, "The effect of head-related filtering and ear-specific decoding bias on auditory attention detection", Journal of Neural Engineering, vol. 13, 056014, 2016. DOI 10.1088/1741-2560/13/5/056014
***************************************
IMPORTANT UPDATE FROM THE AUTHORS (October 2023):

We have observed the widespread utilization of this dataset in numerous research papers, establishing it as a standard benchmark for evaluating novel decoding strategies in auditory attention decoding (AAD). We wish to underscore the vital importance of conducting rigorous cross-validation in such investigations. In the original study by Das et al., which produced this dataset, linear correlation-based methods were employed, for which a straightforward random cross-validation sufficed. However, with the recent surge in the application of machine learning techniques, particularly deep neural networks, a more stringent cross-validation approach becomes imperative.

Deep networks are susceptible to overfitting to trial-specific patterns in EEG data, even from very brief segments (less than 1 second), leading to the ability to identify the trial source. Since a subject typically maintains attention to the same speaker throughout a trial, having knowledge of the trial effectively results in perfect attention decoding. We observed that many research papers utilizing our dataset still adhere to the basic random cross-validation method, neglecting the separation of trials into training and testing sets. Consequently, these studies frequently report remarkably high AAD accuracies when using extremely short EEG segments (one or a few seconds). Nevertheless, research has demonstrated that such an approach yields inaccurate and excessively optimistic outcomes.
Accuracies often plummet significantly, sometimes even falling below chance levels, when employing a proper cross-validation where this trial bias is removed (e.g., leave-one-trial-out, leave-one-story-out, or leave-one-subject-out cross-validation). This overfitting effect is described in: Corentin Puffay et al., "Relating EEG to continuous speech using deep neural networks: a review", Journal of Neural Engineering 20, 041003, 2023. DOI: 10.1088/1741-2552/ace73f

Moreover, it is important to note that AAD strategies which directly classify an EEG snippet, rather than explicitly computing a correlation between the decoder output and the corresponding speech envelopes, may be susceptible to an eye-gaze bias. This bias refers to the tendency of the subject to subtly and often unknowingly direct their gaze towards the attended speaker. Given that EEG equipment can inadvertently capture these gaze patterns, it becomes possible to leverage this gaze information, whether intentionally or unintentionally, to enhance AAD performance. It is crucial to highlight that within this dataset, there are no controls in place to account for or mitigate this eye-gaze bias. This eye-gaze overfitting effect is discussed in: Rotaru et al., "EEG-based decoding of the spatial focus of auditory attention in a multi-talker audiovisual experiment using Common Spatial Patterns", bioRxiv 2023.07.13.548824; doi: https://doi.org/10.1101/2023.07.13.548824
***************************************
This work was done at ExpORL, Dept. Neurosciences, KULeuven and Dept. Electrical Engineering (ESAT), KULeuven.

This dataset contains EEG data collected from 16 normal-hearing subjects. EEG recordings were made in a soundproof, electromagnetically shielded room at ExpORL, KULeuven. The BioSemi ActiveTwo system was used to record 64-channel EEG signals at a sample rate of 8192 Hz. The audio signals, low-pass filtered at 4 kHz, were administered to each subject at 60 dBA through a pair of insert phones (Etymotic ER3A).
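
As a minimal sketch of the trial-wise cross-validation that the authors call for in their October 2023 update, the Python snippet below splits EEG windows so that no group ever contributes windows to both the training and the test set. The function name `leave_one_group_out` and the toy labels are hypothetical and are not part of the dataset or of any library; the group label can be a trial, story, or subject identifier, depending on which of the schemes named by the authors is wanted.

```python
from collections import defaultdict


def leave_one_group_out(group_ids):
    """Yield (held_out_group, train_idx, test_idx), holding out each group once.

    group_ids[i] is the group (e.g. the trial) that window i was cut from.
    All windows of the held-out group go to the test set, so a model cannot
    score well merely by recognising which trial a window came from.
    """
    by_group = defaultdict(list)
    for idx, group in enumerate(group_ids):
        by_group[group].append(idx)

    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = [i for g, idxs in by_group.items() if g != held_out for i in idxs]
        yield held_out, train_idx, test_idx


# Hypothetical toy example: 3 trials with 4 short windows each.
window_trials = [trial for trial in (1, 2, 3) for _ in range(4)]

for held_out, train_idx, test_idx in leave_one_group_out(window_trials):
    train_trials = {window_trials[i] for i in train_idx}
    # The held-out trial must never appear among the training windows.
    assert held_out not in train_trials
    print(f"test trial {held_out}: {len(train_idx)} train / {len(test_idx)} test windows")
```

A random split over individual windows would instead place windows from the same trial on both sides of the split, which is the situation the authors warn against.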
The experiments were conducted using the APEX 3 program developed at ExpORL [1]. Four Dutch short stories [2], narrated by different male speakers, were used as stimuli. All silences longer than 500 ms in the audio files were truncated to 500 ms. Each story was divided into two parts of approximately 6 minutes each. During a presentation, the subjects were presented with the six-minute parts of two (out of four) stories played simultaneously. There were two stimulus conditions, i.e., 'HRTF' or 'dry' (dichotic). An experiment here is defined as a sequence of 4 presentations, 2 for each stimulus condition and ear of stimulation, with questions asked to the subject after each presentation. All subjects sat through three experiments within a single recording session. An example of the design of an experiment is shown in Table 1 in [3].

The first two experiments included four presentations each. During a presentation, the subjects were instructed to listen to the story in one ear while ignoring the story in the other ear. After each presentation, the subjects were presented with a set of multiple-choice questions about the story they were listening to, in order to help them stay motivated to focus on the task. In the next presentation, the subjects were presented with the next part of the two stories. This time they were instructed to attend to their other ear. In this manner, one experiment involved four presentations in which the subjects listened to a total of two stories, switching the attended ear between presentations. The second experiment had the same design but with two other stories. Note that the table was different for each subject or recording session, i.e., the elements in the table were permuted between recording sessions to ensure that the different conditions (stimulus condition and attended ear) were equally distributed over the four presentations.
Finally, the third experiment included a set of presentations where the first two minutes of the story parts from the first experiment, i.e. a total of four shorter presentations, were repeated three times, to build a set of recordings of repetitions. Thus, a total of approximately 72 minutes of EEG was recorded per subject.

We refer to the EEG recorded from each presentation as a trial. For each subject, we recorded 20 trials: 4 from the first experiment, 4 from the second experiment, and 12 from the third experiment (first 2 minutes of the 4 presentations from experiment 1 × 3 repetitions). The EEG data is stored in subject-specific mat files of the format 'Sx', 'x' referring to the subject number. The audio data is stored as wav files in the folder 'stimuli'.

Please note that the stories were not of equal lengths, and the subjects were allowed to finish listening to a story even in cases where the competing story was over. Therefore, for each trial, we suggest referring to the length of the EEG recordings to truncate the ends of the corresponding audio data. This will ensure that the processed data (EEG and audio) contains only competing-talker scenarios.

Each trial was high-pass filtered (0.5 Hz cut-off) and downsampled from the recorded sampling rate of 8192 Hz to 128 Hz. Each trial (trial*.mat) contains the following information:

RawData.Channels: channel numbers (1 to 64).
RawData.EegData: EEG data (samples × channels).
FileHeader.SampleRate: sampling frequency of the saved data.
TrialID: a number from 1 to 20, showing the trial number.
attended_ear: the direction of attention of the subject. 'L' for left, 'R' for right.
stimuli: cell array with stimuli{1} and stimuli{2} indicating the names of the audio files presented in the left ear and the right ear of the subject, respectively.
condition: stimulus presentation condition. 'HRTF': stimuli were filtered with HRTF functions to simulate audio from 90 degrees to the left and 90 degrees to the right of the speaker. 'dry': a dichotic presentation in which one story track each was presented separately via the left and the right earphones.
experiment: the number of the experiment (1, 2, or 3).
part: part of the story track being presented (can be 1 to 4 for experiments 1 and 2, and 1 to 12 for experiment 3).
attended_track: the attended story track. '1' for track 1 and '2' for track 2. Each track maintains continuity of the story. In Experiment 1, attention is always to track 1, and in Experiment 2, attention is always to track 2.
repetition: binary variable indicating whether the trial is a repetition (of presented stimuli) or not.
subject: subject id of the format 'Sx', 'x' being the subject number.

The 'stimuli' folder contains .wav files of the format: part{part number}_track{track number}_{condition}.wav. Although the folder contains stimuli with HRTF filtering as well, for the analysis we have assumed knowledge of the original clean stimuli (i.e. stimuli presented under the 'dry' condition), and hence envelopes were extracted only from part{part number}_track{track number}_dry.wav files.

The MATLAB file 'preprocess_data.m' gives an example of how the synchronization and preprocessing of the EEG and audio data can be done as described in [14]. Dependency: AMToolbox.

This dataset has been used in [3, 5-16].

[1] Francart, T., Van Wieringen, A., & Wouters, J. (2008). APEX 3: a multi-purpose test platform for auditory psychophysical experiments. Journal of Neuroscience Methods, 172(2), 283-293.
[2] Radioboeken voor kinderen, http://radioboeken.eu/kinderradioboeken.php?lang=NL, 2007 (Accessed: 30 March 2015)
[3] Das, N., Biesmans, W., Bertrand, A., & Francart, T. (2016). The effect of head-related filtering and ear-specific decoding bias on auditory attention detection. Journal of Neural Engineering, 13(5), 056014.
[4] Somers, B., Francart, T., & Bertrand, A. (2018). A generic EEG artifact removal algorithm based on the multi-channel Wiener filter. Journal of Neural Engineering, 15(3), 036007.
[5] Das, N., Vanthornhout, J., Francart, T., & Bertrand, A. (2019). Stimulus-aware spatial filtering for single-trial neural response and temporal response function estimation in high-density EEG with applications in auditory research. NeuroImage 204 (2020)
[6] Biesmans, W., Das, N., Francart, T., & Bertrand, A. (2016). Auditory-inspired speech envelope extraction methods for improved EEG-based auditory attention detection in a cocktail party scenario. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 25(5), 402-412.
[7] Das, N., Van Eyndhoven, S., Francart, T., & Bertrand, A. (2016). Adaptive attention-driven speech enhancement for EEG-informed hearing prostheses. In Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 77-80.
[8] Van Eyndhoven, S., Francart, T., & Bertrand, A. (2016). EEG-informed attended speaker extraction from recorded speech mixtures with application in neuro-steered hearing prostheses. IEEE Transactions on Biomedical Engineering, 64(5), 1045-1056.
[9] Das, N., Van Eyndhoven, S., Francart, T., & Bertrand, A. (2017). EEG-based Attention-Driven Speech Enhancement For Noisy Speech Mixtures Using N-fold Multi-Channel Wiener Filters. In Proceedings of the 25th European Signal Processing Conference (EUSIPCO), 1660-1664.
[10] Narayanan, A. M., & Bertrand, A. (2018). The effect of miniaturization and galvanic separation of EEG sensor devices in an auditory attention detection task. In Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 77-80.
[11] Vandecappelle, S., Deckers, L., Das, N., Ansari, A. H., Bertrand, A., & Francart, T. (2020). EEG-based detection of the locus of auditory attention with convolutional neural networks. bioRxiv 475673; doi: https://doi.org/10.1101/475673.
[12] Narayanan, A. M., & Bertrand, A. (2019). Analysis of Miniaturization Effects and Channel Selection Strategies for EEG Sensor Networks With Application to Auditory Attention Detection. IEEE Transactions on Biomedical Engineering, 67(1), 234-244.
[13] Geirnaert, S., Francart, T., & Bertrand, A. (2019). A New Metric to Evaluate Auditory Attention Detection Performance Based on a Markov Chain. In Proceedings of the 27th European Signal Processing Conference (EUSIPCO), 1-5.
[14] Geirnaert, S., Francart, T., & Bertrand, A. (2020). An Interpretable Performance Metric for Auditory Attention Decoding Algorithms in a Context of Neuro-Steered Gain Control. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 28(1), 307-317.
[15] Geirnaert, S., Francart, T., & Bertrand, A. (2020). Fast EEG-based decoding of the directional focus of auditory attention using common spatial patterns. bioRxiv 2020.06.16.154450; doi: https://doi.org/10.1101/2020.06.16.154450.
[16] Geirnaert, S., Vandecappelle, S., Alickovic, E., de Cheveigné, A., Lalor, E., Meyer, B.T., Miran, S., Francart, T., & Bertrand, A. (2020). Neuro-Steered Hearing Devices: Decoding Auditory Attention From the Brain. arXiv 2008.04569; doi: arXiv:2008.04569.
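
The stimulus naming scheme and the truncation advice in the description above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names are hypothetical, the audio sample rate of 44100 Hz in the example is an assumed value (the description does not state it), and the authors' own reference implementation is the MATLAB file 'preprocess_data.m'.

```python
def dry_stimulus_name(part, track):
    """Build the 'dry' stimulus file name following
    part{part number}_track{track number}_{condition}.wav."""
    return f"part{part}_track{track}_dry.wav"


def truncated_audio_length(n_eeg_samples, fs_eeg, n_audio_samples, fs_audio):
    """Return how many audio samples to keep so the audio is no longer than the EEG.

    The EEG duration defines the competing-talker segment; any audio beyond
    that duration (a story that ran on after its competitor ended) is dropped.
    """
    duration_s = n_eeg_samples / fs_eeg
    return min(n_audio_samples, int(round(duration_s * fs_audio)))


# Hypothetical trial: 6 minutes of EEG at the published 128 Hz rate,
# paired with 6.5 minutes of audio at an ASSUMED 44100 Hz sample rate.
n_eeg = 6 * 60 * 128
n_audio = int(6.5 * 60 * 44100)
n_keep = truncated_audio_length(n_eeg, 128, n_audio, 44100)

print(dry_stimulus_name(part=1, track=2))
print(f"keep {n_keep} of {n_audio} audio samples")
```

In practice the EEG sample count would come from the number of rows of RawData.EegData and the EEG rate from FileHeader.SampleRate, while the audio rate would be read from the .wav file header rather than assumed.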

Created:

2023-06-28

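Collected summary

Dataset introduction

Background and challenges

Background overview

This dataset is an older version of the auditory attention detection dataset released by KU Leuven. It is mainly used for research on EEG-based auditory attention detection, spanning signal processing and neuroscience. Access to this version is restricted, and the public version has moved to another link. The current version is 1.1.0, published on 30 August 2019, and it has been cited in multiple academic papers.

The content above was collected and summarized by 遇见数据集.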
