The acoustic feature dataset of WD patients and healthy individuals

Source: Science Data Bank (updated 2024-03-15, indexed 2026-04-23)
Download link:
https://www.scidb.cn/detail?dataSetId=803eadb589694fa4825d81a764076e29
Description:
The study uses a state-of-the-art speech embedding method for WD detection in unstructured connected speech (UCS), combining bi-directional semantic dependencies and attention mechanisms. The feature data file covers 110 native Mandarin-speaking participants: 55 WD patients and 55 sex-matched healthy individuals. It has four columns: the label (0 for healthy individuals, 1 for WD patients), the ComParE feature set, the Wav2vec 2.0 embedded feature set, and the HuBERT embedded feature set.

To obtain frame-level speech representations that can be compared and fused with the embedding approaches, we use only the low-level descriptors (LLDs) of ComParE (the 2016 version, the latest at the time of the study), which provide 65-dimensional features per time step, and set the window length to 30 ms and the step length to 20 ms. The final ComParE feature shape for each participant's 60 s audio is 2999 × 65.

To adapt to native speech data, we extract embeddings from the pre-trained w2v2 (Wav2vec 2.0) and HuBERT models fine-tuned on 10,000 hours of Chinese speech data from WenetSpeech. Considering computational resources and time cost, we use the base version of each pre-trained model and take its last hidden state, which is 768-dimensional, as the embedding representation of the audio. The resulting embedding has a shape of 2999 × 768 for an audio sample.
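The frame count of 2999 reported for both the ComParE LLDs and the embeddings can be checked with a short calculation. The sketch below is illustrative only and is not part of the dataset. It assumes the 60 s recordings are sampled at 16 kHz (the description does not state the sample rate) and that the convolutional feature encoder uses the standard Wav2vec 2.0 / HuBERT base kernel widths and strides. The function and constant names are hypothetical.

```python
# Sketch: check that 60 s of audio gives 2999 frames under both framing schemes.
# Assumptions (not stated in the dataset description): 16 kHz sample rate and the
# standard base-model convolutional encoder layout for Wav2vec 2.0 / HuBERT.

def sliding_window_frames(duration_ms: int, window_ms: int, step_ms: int) -> int:
    """Number of full windows that fit in the signal, with no padding."""
    if duration_ms < window_ms:
        return 0
    return (duration_ms - window_ms) // step_ms + 1


# (kernel_width, stride) per convolutional layer of the base feature encoder.
BASE_CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def conv_encoder_frames(num_samples: int, layers=BASE_CONV_LAYERS) -> int:
    """Output length of a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in layers:
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    # ComParE LLDs: 30 ms window, 20 ms step over 60 s of audio.
    print(sliding_window_frames(60_000, 30, 20))   # 2999
    # Embedding models: 60 s at an assumed 16 kHz = 960,000 samples.
    print(conv_encoder_frames(60 * 16_000))        # 2999
```

Under these assumptions both schemes advance by 20 ms per frame, which would explain why the 2999 × 65 ComParE matrix and the 2999 × 768 embedding matrices align frame by frame and can be fused along the feature axis.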
Provider:
Zhenglin Zhang
Created:
2023-09-14