SparrKULee: A Speech-evoked Auditory Response Repository of the KU Leuven, containing EEG of 85 participants

Name: SparrKULee: A Speech-evoked Auditory Response Repository of the KU Leuven, containing EEG of 85 participants
Creator: KU Leuven RDR
Published: 2025-12-05 09:02:41
License: No description available

Source: DataCite Commons. Updated: 2025-12-05. Indexed: 2024-07-13.

Download link:

https://rdr.kuleuven.be/citation?persistentId=doi:10.48804/K3VSND

Description:

The following authors contributed equally to this dataset: Accou, Bernd; Bollens, Lies. For easy access to the data, we recommend following the instructions and accessing the data via our hosting server (https://homes.esat.kuleuven.be/~spchdata/corpora/auditory_eeg_data/).

Researchers investigating the neural mechanisms underlying speech perception often employ electroencephalography (EEG) to record brain activity while participants listen to spoken language. The high temporal resolution of EEG enables the study of neural responses to fast and dynamic speech signals. Previous studies have successfully extracted speech characteristics from EEG data and, conversely, predicted EEG activity from speech features. Machine learning techniques are generally employed to construct encoding and decoding models, which require a substantial amount of data.

We present SparrKULee: A Speech-evoked Auditory Repository of EEG, measured at KU Leuven, comprising 64-channel EEG recordings from 85 young individuals with normal hearing, each of whom listened to 90-150 minutes of natural speech. This dataset is more extensive than any currently available dataset in terms of both the number of participants and the amount of data per participant. It is suitable for training larger machine learning models. We evaluate the dataset using linear and state-of-the-art non-linear models in a speech encoding/decoding and match/mismatch paradigm, providing benchmark scores for future research.

Our GitHub repository (https://github.com/exporl/auditory-eeg-dataset) contains the code for the preprocessing steps needed to obtain the files in the derivatives folder, as well as extra code for the technical validation of the dataset and tools to download the dataset more easily. The whole dataset can be downloaded as one zip file (> 100 GB) from https://rdr.kuleuven.be/api/access/dataset/:persistentId/?persistentId=doi:10.48804/K3VSND. To download the dataset as already zipped files, split into smaller chunks, use https://kuleuven-my.sharepoint.com/:f:/g/personal/lies_bollens_kuleuven_be/EulH76nkcwxIuK--XJhLxKQBaX8_GgAX-rTKK7mskzmAZA?e=N6M5Ll.

Due to privacy concerns, some files in the dataset are restricted. Users requesting access should send an email to sparrkulee@kuleuven.be, stating what they want to use the data for. Access will be granted to non-commercial users who comply with the CC-BY-NC-4.0 licence.
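
The single-archive download mentioned in the description can be scripted. The following is a minimal sketch, not the authors' tooling: it only rebuilds the access URL quoted in the description and streams the response to disk with the Python standard library. The function names `build_download_url` and `download_archive` are hypothetical, and the sketch assumes the endpoint serves the archive without authentication; restricted files may not be included or may require credentials that are not described here.

```python
import shutil
import urllib.parse
import urllib.request
from typing import Optional

# Endpoint and DOI copied from the dataset description.
BASE_URL = "https://rdr.kuleuven.be/api/access/dataset/:persistentId/"
DOI = "doi:10.48804/K3VSND"


def build_download_url(doi: str = DOI, base_url: str = BASE_URL) -> str:
    """Return the whole-dataset archive URL for a persistent identifier."""
    # Keep ':' and '/' unescaped so the DOI appears as in the description.
    query = urllib.parse.urlencode({"persistentId": doi}, safe=":/")
    return f"{base_url}?{query}"


def download_archive(dest_path: str, url: Optional[str] = None,
                     chunk_size: int = 1024 * 1024) -> str:
    """Stream the archive to dest_path in chunks (hypothetical helper).

    The archive is larger than 100 GB, so the response is copied
    chunk by chunk instead of being read into memory.
    """
    url = url or build_download_url()
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out, length=chunk_size)
    return dest_path


# Example usage (commented out because it starts a very large download):
# download_archive("sparrkulee.zip")
```

Because the archive exceeds 100 GB, an interrupted transfer would have to restart from the beginning with this sketch; the chunked files or the download tools in the GitHub repository may be the more practical route.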

Provider:

KU Leuven RDR

Created:

2022-12-15

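Collected summary

Dataset introduction

Background and challenges

Background overview

SparrKULee is a speech-evoked auditory response EEG dataset collected by KU Leuven. It contains 64-channel EEG recordings from 85 young individuals with normal hearing, each of whom listened to 90-150 minutes of natural speech. The dataset exceeds existing datasets in both the number of participants and the amount of data per participant, is suited to training large machine learning models, and has been used for benchmarking in speech encoding/decoding and match/mismatch paradigms.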

The content above was collected and summarized by 遇见数据集.