The acoustic feature dataset of WD patients and healthy individuals
Source: DataCite Commons. Updated: 2025-04-27. Indexed: 2025-04-16.
Download link:
https://www.scidb.cn/detail?dataSetId=803eadb589694fa4825d81a764076e29
Resource description:
The study applies a state-of-the-art speech embedding method to WD detection in unstructured connected speech (UCS), combining bi-directional semantic dependencies with attention mechanisms. The feature data file covers 110 native Mandarin-speaking participants: 55 WD patients and 55 sex-matched healthy individuals. It has four columns: the label (0 for healthy individuals, 1 for WD patients), the ComParE feature set, the Wav2vec 2.0 embedded feature set, and the HuBERT embedded feature set. To obtain frame-level speech representations that can be compared and fused with the embedding approaches, we use only the low-level descriptors (LLDs) of ComParE (the 2016 version, currently the latest), which provide 65-dimensional features per time step, with the window length set to 30 ms and the step length to 20 ms. The resulting ComParE feature shape for each participant's 60 s of audio is 2999 × 65. To adapt to native speech data, we extract embeddings from pre-trained Wav2vec 2.0 (w2v2) and HuBERT models, each fine-tuned on 10,000 hours of Chinese speech data from WenetSpeech. Considering computational resources and time cost, we use the base version of each pre-trained model, whose hidden layers are 768-dimensional. The last hidden state of the model serves as the embedding representation, with a shape of 2999 × 768 per audio sample.
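The frame count of 2999 quoted for both feature types can be checked with simple arithmetic. The sketch below is illustrative only and is not the authors' extraction code. It assumes a sliding window without padding for the ComParE LLDs, and for the embeddings it assumes 16 kHz audio and the convolutional encoder layout (kernel sizes and strides) of the publicly documented wav2vec 2.0 / HuBERT base architecture; the description states neither the sampling rate nor the edge handling. All function and constant names are hypothetical.

```python
def sliding_window_frames(duration_ms: int, window_ms: int, step_ms: int) -> int:
    """Number of full windows that fit in the signal, assuming no padding."""
    if duration_ms < window_ms:
        return 0
    return (duration_ms - window_ms) // step_ms + 1


# Kernel sizes and strides of the convolutional feature encoder in the
# standard wav2vec 2.0 / HuBERT base architecture (assumed to apply here).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def conv_encoder_frames(num_samples: int) -> int:
    """Output length after the stacked 1-D convolutions (no padding)."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        length = (length - kernel) // stride + 1
    return length


# Assumption: the audio is sampled at 16 kHz (not stated in the description).
ASSUMED_SAMPLE_RATE_HZ = 16_000

compare_frames = sliding_window_frames(60_000, window_ms=30, step_ms=20)
embedding_frames = conv_encoder_frames(60 * ASSUMED_SAMPLE_RATE_HZ)

print(compare_frames, embedding_frames)  # prints "2999 2999"
```

Under these assumptions both pipelines yield the same number of time steps for a 60 s recording, which is consistent with the description's statement that the frame-level ComParE features can be compared and fused with the embeddings.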
Provider:
Science Data Bank
Created:
2024-03-15



