3M-CPSEED: An EEG-based Dataset for Chinese Pinyin Production in Overt, Silent-intended, and Imagined Speech

OpenNeuro. Updated 2025-07-12; indexed 2026-03-14.

Download link:

https://openneuro.org/datasets/ds006465

Resource description:

Overview

This dataset, named 3M-CPSEED, consists of electroencephalogram (EEG) recordings obtained from 20 participants performing overt, silently articulated, and imagined speech tasks. 3M-CPSEED is intended to support speech neurophysiology research: it facilitates exploring differences in neural activity across Pinyin articulations and enables transfer learning studies for other alphabetic languages.

Data Collection

Participants: 20 healthy, right-handed native Chinese speakers (mean age: 24.55 years, standard deviation: 2.58 years; 11 females, 9 males).

Materials: To balance comprehensive coverage of the articulatory features of the Chinese phonological system against a concise, controllable stimulus set, we selected the following Pinyin sounds:
Finals: "a, i, u, ü"
Initials: "m, f, j, l, k, ch"

Procedure: Participants read Pinyin displayed on a screen in three phases: 'speak', 'silently articulated', and 'imagined'. Each participant completed 4 blocks, for 1600 trials in total.

Data Structure

The dataset is organized according to the BIDS standard.

Main folder:
dataset_description.json: Description of the dataset.
participants.tsv: Participant information.
participants.json: Details of the columns in participants.tsv.
README: General information about the dataset.
data_all.mat: Labeled EEG data of all subjects in MAT format.

Derivative data:
preproc/: Preprocessed data, with one subfolder per subject (sub-01, etc.) and data in .mat format.

Acknowledgments

This work was supported by a 1.3.5 project for disciplines of excellence from West China Hospital (#ZYYC22001).
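The file layout listed under Data Structure can be checked after download with a short script. The following is a minimal sketch, not part of the dataset itself: the function name `summarize_layout`, the local directory name `ds006465`, and the assumption that `preproc/` sits directly under the dataset root are all illustrative. The description does not specify the names of the .mat files inside each subject folder, so the sketch only lists whatever it finds.

```python
from pathlib import Path

# Top-level files named in the dataset description.
EXPECTED_TOP_LEVEL = [
    "dataset_description.json",
    "participants.tsv",
    "participants.json",
    "README",
    "data_all.mat",
]


def summarize_layout(root, preproc_dir="preproc"):
    """Report missing top-level files and list per-subject .mat files.

    Assumption: preprocessed data live in <root>/<preproc_dir>/sub-XX/*.mat.
    The description names only the preproc/ folder and the sub-01 pattern,
    so the exact location and file names may differ in the real dataset.
    """
    root = Path(root)
    missing = [name for name in EXPECTED_TOP_LEVEL if not (root / name).is_file()]

    subjects = {}
    preproc = root / preproc_dir
    if preproc.is_dir():
        for sub_dir in sorted(preproc.glob("sub-*")):
            if sub_dir.is_dir():
                subjects[sub_dir.name] = sorted(p.name for p in sub_dir.glob("*.mat"))
    return {"missing": missing, "subjects": subjects}


if __name__ == "__main__":
    # "ds006465" is a hypothetical local download directory.
    report = summarize_layout("ds006465")
    print("missing top-level files:", report["missing"])
    for sub, files in report["subjects"].items():
        print(sub, files)
```

The sketch uses only the standard library and does not open the .mat files, because the variable names stored in them are not given in the description.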

Created:

2025-07-12

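Collected summary

Dataset introduction

Background and challenges

Background overview

3M-CPSEED is a dataset of EEG recordings from 20 participants, focused on neural activity during Chinese Pinyin production in overt, silent-intended, and imagined speech tasks. The dataset is organized according to the BIDS standard and supports research in speech neurophysiology and cross-language transfer learning.

The summary above was collected and generated by 遇见数据集.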