SpeechBSD Dataset
Dataset Overview
Dataset Name
- SpeechBSD Dataset
Dataset Description
- This dataset extends the BSD corpus with audio files and speaker attribute information.
Dataset Download
- Download from the GitHub repository using the git clone and wget commands.
- The dataset can also be downloaded from Hugging Face.
Dataset Statistics
| | Train | Dev. | Test |
|---|---|---|---|
| Scenarios | 670 | 69 | 69 |
| Sentences | 20,000 | 2,051 | 2,120 |
| En audio (h) | 20.1 | 2.1 | 2.1 |
| Ja audio (h) | 25.3 | 2.7 | 2.7 |
| En audio gender (male % / female %) | 47.2 / 52.8 | 50.1 / 49.9 | 44.4 / 55.6 |
| Ja audio gender (male % / female %) | 68.0 / 32.0 | 62.3 / 37.7 | 69.0 / 31.0 |
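
As a quick sanity check on the table above, the audio hours can be divided by the sentence counts to get a rough mean clip length. This is a minimal sketch that assumes one audio clip per sentence per language, which the table does not state explicitly; the function name `mean_seconds_per_sentence` is introduced here for illustration only.

```python
# Values copied from the statistics table above.
sentences = {"train": 20000, "dev": 2051, "test": 2120}
audio_hours = {
    "en": {"train": 20.1, "dev": 2.1, "test": 2.1},
    "ja": {"train": 25.3, "dev": 2.7, "test": 2.7},
}


def mean_seconds_per_sentence(hours, n_sentences):
    """Convert total hours to mean seconds per sentence.

    Assumes exactly one clip per sentence (an assumption, not a documented fact).
    """
    return hours * 3600.0 / n_sentences


for lang, per_split in audio_hours.items():
    for split, hours in per_split.items():
        secs = mean_seconds_per_sentence(hours, sentences[split])
        print(f"{lang} {split}: {secs:.2f} s per sentence")
```

Because the hours in the table are rounded to one decimal place, the resulting averages are only approximate.
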
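Dataset Structure
- The wav directory contains 16 kHz, mono wav files, split into train, dev, and test.
- The txt directory contains JSON files, likewise split into train, dev, and test.
- Each JSON file contains multiple scenarios.
- Each scenario contains: id, tag, title, original_language, conversation
- conversation contains multiple utterances, and each utterance contains:
  - no, ja_speaker, en_speaker, ja_sentence, en_sentence
  - ja_spkid, en_spkid (speaker IDs)
  - ja_wav, en_wav (wav file names)
  - ja_spk_gender, en_spk_gender (speaker gender)
  - ja_spk_prefecture, en_spk_state (speaker's place of origin)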
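
The structure described above can be traversed with a few lines of Python. The following is a minimal sketch, assuming each JSON file holds a list of scenario objects with the documented fields. The sample record and all of its values are invented for illustration, and the path layout `<root>/wav/<split>/<filename>` is an assumption about how the wav directory is organized, not something confirmed here. The helper names (`load_scenarios`, `iter_utterances`, `wav_path`) are likewise hypothetical.

```python
import json
from pathlib import Path


def load_scenarios(json_path):
    """Read one JSON file and return its content (assumed: a list of scenarios)."""
    with open(json_path, encoding="utf-8") as f:
        return json.load(f)


def iter_utterances(scenarios):
    """Yield (scenario id, utterance dict) for every utterance, in order."""
    for scenario in scenarios:
        for utterance in scenario["conversation"]:
            yield scenario["id"], utterance


def wav_path(root, split, filename):
    """Build a wav path, assuming the layout <root>/wav/<split>/<filename>."""
    return Path(root) / "wav" / split / filename


# Invented record that only mirrors the documented field names.
sample_scenarios = [
    {
        "id": "example-0001",
        "tag": "example tag",
        "title": "Example scenario",
        "original_language": "en",
        "conversation": [
            {
                "no": 1,
                "ja_speaker": "A",
                "en_speaker": "A",
                "ja_sentence": "こんにちは。",
                "en_sentence": "Hello.",
                "ja_spkid": "ja_spk_001",
                "en_spkid": "en_spk_001",
                "ja_wav": "example_ja_1.wav",
                "en_wav": "example_en_1.wav",
                "ja_spk_gender": "F",
                "en_spk_gender": "M",
                "ja_spk_prefecture": "example prefecture",
                "en_spk_state": "example state",
            }
        ],
    }
]

for scenario_id, utt in iter_utterances(sample_scenarios):
    print(scenario_id, utt["no"], utt["en_sentence"],
          wav_path(".", "train", utt["en_wav"]))
```

With the real data, `load_scenarios` would be pointed at one of the JSON files under the txt directory and its result passed to `iter_utterances` in place of `sample_scenarios`.
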
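Dataset Notes
- Gender is labeled as "M" or "F".
- Different speaker IDs are intended to denote different speakers, but the same speaker may appear under more than one ID.
- The gender of the audio speaker may not match the gender implied by the text.
- Japanese speakers are from Japan, and English speakers are from the United States.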
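
The notes above matter when summarizing speakers. The sketch below tallies the "M"/"F" labels and collects speaker IDs from a list of utterances. It counts per utterance; whether the percentages in the statistics table were computed per utterance or per audio duration is not stated here, so the numbers need not match. The sample utterances and both function names are invented for illustration.

```python
from collections import Counter


def gender_percentages(utterances, gender_key):
    """Return the percentage of "M" and "F" labels, counted per utterance."""
    counts = Counter(u[gender_key] for u in utterances)
    total = sum(counts.values())
    if total == 0:
        return {"M": 0.0, "F": 0.0}
    return {g: 100.0 * counts.get(g, 0) / total for g in ("M", "F")}


def distinct_speaker_ids(utterances, id_key):
    """Collect the set of speaker IDs.

    Since one speaker may appear under several IDs, the size of this set
    is only an upper bound on the number of distinct speakers.
    """
    return {u[id_key] for u in utterances}


# Invented utterances carrying only the fields used here.
utterances = [
    {"en_spkid": "en_spk_001", "en_spk_gender": "M"},
    {"en_spkid": "en_spk_002", "en_spk_gender": "F"},
    {"en_spkid": "en_spk_002", "en_spk_gender": "F"},
    {"en_spkid": "en_spk_003", "en_spk_gender": "F"},
]

print(gender_percentages(utterances, "en_spk_gender"))
print(len(distinct_speaker_ids(utterances, "en_spkid")))
```

The same functions work for the Japanese side by passing `ja_spk_gender` and `ja_spkid` as the keys.
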
Dataset Citation
- Citation format:
@inproceedings{shimizu-etal-2023-towards,
    title = "Towards Speech Dialogue Translation Mediating Speakers of Different Languages",
    author = "Shimizu, Shuichiro and Chu, Chenhui and Li, Sheng and Kurohashi, Sadao",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.findings-acl.72",
    pages = "1122--1134",
    abstract = "We present a new task, speech dialogue translation mediating speakers of different languages. We construct the SpeechBSD dataset for the task and conduct baseline experiments. Furthermore, we consider context to be an important aspect that needs to be addressed in this task and propose two ways of utilizing context, namely monolingual context and bilingual context. We conduct cascaded speech translation experiments using Whisper and mBART, and show that bilingual context performs better in our settings.",
}
Dataset License
- This dataset is released under the CC-BY-NC-SA 4.0 license.




