Speech Enhancement Dataset
Source: www.kaggle.com (updated 2024-05-14, indexed 2025-01-21)
Download link:
https://www.kaggle.com/dongma878/clearspeech
Description:
### About This Dataset
1. This dataset was created for our paper 'ClearSpeech: Improving Voice Quality of Earbuds Using Both In-Ear and Out-Ear Microphones', published in ACM IMWUT 2023. The core idea of the paper is to perform speech enhancement using the speech signals captured by both the in-ear and the out-ear microphones of earbuds. For more details, please refer to the paper. If you use the dataset, please cite our paper with the following BibTeX entry:
@article{ma2024clearspeech,
title={ClearSpeech: Improving Voice Quality of Earbuds Using Both In-Ear and Out-Ear Microphones},
author={Ma, Dong and Dang, Ting and Ding, Ming and Balan, Rajesh},
journal={Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies},
volume={7},
number={4},
pages={1--25},
year={2024},
publisher={ACM New York, NY, USA}
}
2. The dataset consists of two sub-datasets: a noise dataset and a speech dataset. The noise dataset was recorded while the participants listened to nine different types of noise without speaking. The speech dataset was recorded while the participants read aloud 300 sentences in a quiet environment.
3. Twenty subjects participated in the data collection.
4. The in-ear and out-ear signals from both ears were recorded and encoded as four channels (channel 1 for the in-ear mic of the left earbud, channel 2 for the out-ear mic of the left earbud, channel 3 for the in-ear mic of the right earbud, channel 4 for the out-ear mic of the right earbud). Each channel is stored in a separate folder.
5. The original sampling rate was 48 kHz. Most of the energy of human speech lies below 8 kHz, and a 16 kHz sampling rate can represent frequencies up to its 8 kHz Nyquist limit. For these reasons, the released dataset was downsampled to 16 kHz.
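The card does not say how the 48 kHz recordings were downsampled. The following is a minimal, stdlib-only sketch of one generic way to do integer-factor decimation from 48 kHz to 16 kHz, which is a factor of 3. It first applies a Hamming-windowed-sinc low-pass filter with its cutoff at the new 8 kHz Nyquist frequency, then keeps every third sample. The function names, the 101-tap length and the window choice are illustrative assumptions, not the authors' pipeline.

```python
import math


def lowpass_taps(cutoff_hz: float, fs_hz: float, num_taps: int = 101) -> list[float]:
    """Hamming-windowed-sinc low-pass FIR taps, normalized to unity gain at DC."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a centre tap")
    fc = cutoff_hz / fs_hz  # normalized cutoff, in cycles per sample
    m = num_taps - 1
    taps = []
    for n in range(num_taps):
        k = n - m / 2  # offset from the centre tap
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / m)  # Hamming window
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def downsample(x: list[float], fs_in: int = 48_000, fs_out: int = 16_000,
               num_taps: int = 101) -> list[float]:
    """Anti-alias filter at fs_out / 2, then keep every (fs_in // fs_out)-th sample."""
    if fs_in % fs_out != 0:
        raise ValueError("this sketch only handles integer decimation factors")
    factor = fs_in // fs_out
    taps = lowpass_taps(fs_out / 2, fs_in, num_taps)
    half = num_taps // 2
    y = []
    # Only the retained samples are computed; the filter is centred on each one,
    # so the output has no delay (samples outside the input are treated as zero).
    for n in range(0, len(x), factor):
        acc = 0.0
        for j, h in enumerate(taps):
            i = n + half - j
            if 0 <= i < len(x):
                acc += h * x[i]
        y.append(acc)
    return y
```

Because the released files are already at 16 kHz, this would mainly matter when bringing other 48 kHz material to the same rate. In that case a tested library resampler would normally be preferred over a hand-rolled filter.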
### File Structure and Format
1. The dataset contains a readme file, a corpus of the 300 sentences read by the participants, and 20 sub-folders, one per participant.
2. Each participant folder contains four sub-folders, one per channel (Ch1 for the in-ear mic of the left earbud, Ch2 for the out-ear mic of the left earbud, Ch3 for the in-ear mic of the right earbud, Ch4 for the out-ear mic of the right earbud).
3. Each channel folder contains 309 WAV files: 9 noise files and 300 speech files (one file per segmented sentence).
4. File naming examples:
(1) 'S1\Ch1\Noise_1': 'S1' means subject 1, 'Ch1' means the in-ear mic of the left earbud, and 'Noise_1' means noise type 1.
(2) 'S2\Ch2\Speech_1': 'S2' means subject 2, 'Ch2' means the out-ear mic of the left earbud, and 'Speech_1' means the first sentence.
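This layout maps directly onto file paths. The following is a minimal, stdlib-only sketch for enumerating the files and reading the four channels of one recording. It rests on these assumptions:

- Each file carries a `.wav` suffix (the naming examples above omit it).
- The subject folders `S1` to `S20` sit directly under a root directory.
- Every file is 16 kHz, mono, 16-bit PCM.

The function names are hypothetical. The loader raises an error rather than guessing if the format differs.

```python
import array
import sys
import wave
from pathlib import Path

# Channel layout as described in the dataset card.
CHANNELS = {
    1: "left earbud, in-ear mic",
    2: "left earbud, out-ear mic",
    3: "right earbud, in-ear mic",
    4: "right earbud, out-ear mic",
}
NUM_SUBJECTS = 20
NUM_NOISES = 9
NUM_SENTENCES = 300


def recording_path(root, subject: int, channel: int, kind: str, index: int) -> Path:
    """Path of one recording, e.g. root/S2/Ch2/Speech_1.wav (.wav suffix assumed)."""
    if kind not in ("Noise", "Speech"):
        raise ValueError("kind must be 'Noise' or 'Speech'")
    if channel not in CHANNELS:
        raise ValueError(f"channel must be one of {sorted(CHANNELS)}")
    return Path(root) / f"S{subject}" / f"Ch{channel}" / f"{kind}_{index}.wav"


def all_recordings(root):
    """Yield (subject, channel, kind, index, path) for every expected file."""
    for s in range(1, NUM_SUBJECTS + 1):
        for ch in CHANNELS:
            for i in range(1, NUM_NOISES + 1):
                yield s, ch, "Noise", i, recording_path(root, s, ch, "Noise", i)
            for i in range(1, NUM_SENTENCES + 1):
                yield s, ch, "Speech", i, recording_path(root, s, ch, "Speech", i)


def load_four_channels(root, subject: int, kind: str, index: int) -> dict[int, list[int]]:
    """Read Ch1 to Ch4 of one recording as lists of 16-bit samples, keyed by channel."""
    signals = {}
    for ch in CHANNELS:
        path = recording_path(root, subject, ch, kind, index)
        with wave.open(str(path), "rb") as w:
            if (w.getnchannels(), w.getsampwidth(), w.getframerate()) != (1, 2, 16_000):
                raise ValueError(f"{path}: expected 16 kHz, mono, 16-bit PCM")
            samples = array.array("h", w.readframes(w.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV PCM data is stored little-endian
        signals[ch] = samples.tolist()
    return signals
```

For example, `load_four_channels(root, 2, "Speech", 1)` would return the in-ear and out-ear signals of both earbuds for subject 2's first sentence. The result is keyed by channel number. With 20 subjects, 4 channels and 309 files per channel folder, `all_recordings` should yield 24,720 paths.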
### Acknowledgement
This research was supported by the Singapore Ministry of Education (MOE) Academic Research Fund (AcRF) Tier 1 grant (Grant ID: 21-SIS-SMU-036).
Provided by:
Kaggle