dinner-party-corpus
Source: ModelScope community. Updated 2025-12-05; indexed 2025-03-22.
Download link:
https://modelscope.cn/datasets/benjamin-paine/dinner-party-corpus
Resource description:
This repository contains a reorganized, utterance-focused version of the Dinner Party Corpus, released by Amazon, the Center for Language and Speech Processing (CLSP) and Johns Hopkins University in September 2019.
# Description
The following description is provided in [arXiv 1909.13447](https://arxiv.org/abs/1909.13447):
*We present a speech data corpus that simulates a "dinner party" scenario taking place in an everyday home environment. The corpus was created by recording multiple groups of four Amazon employee volunteers having a natural conversation in English around a dining table. The participants were recorded by a single-channel close-talk microphone and by five far-field 7-microphone array devices positioned at different locations in the recording room. The dataset contains the audio recordings and human labeled transcripts of a total of 10 sessions with a duration between 15 and 45 minutes. The corpus was created to advance in the field of noise robust and distant speech processing and is intended to serve as a public research and benchmarking data set.*
## License
As stated in the paper linked above, section 4, the dataset is released under the [CDLA-Permissive](https://cdla.io/permissive-1-0) license.
## Authors
Van Segbroeck, Maarten; Zaid, Ahmed; Kutsenko, Ksenia; Huerta, Cirenia; Nguyen, Tinh; Luo, Xuewen; Hoffmeister, Björn; Trmal, Jan; Omologo, Maurizio; Maas, Roland
### Contact Persons
Maas, Roland; Hoffmeister, Björn
## Comparison to Base Dataset
- The base dataset was downloaded from [Zenodo](https://zenodo.org/records/8122551). It has a **compressed size** of 12.4GB and an uncompressed size of 23GB. It is organized to minimize file size and data repetition, with uncut audio and separate label files.
- This dataset has an uncompressed size of **27GB**, making it about 15% larger than the uncompressed base dataset. In exchange for the larger size, you gain ease of use: all audio is pre-cut to the start and end of each utterance and stored with the appropriate labels directly in Parquet.
# How to Use
This repository is made to be used with [🤗Datasets](https://huggingface.co/docs/datasets/v2.21.0/index).
```py
from datasets import load_dataset

dataset = load_dataset(
    "benjamin-paine/dinner-party-corpus",
    name="split-channel",  # configuration: 'split-channel' or 'mixed-channel'
    split="train",  # 'train' or 'test'
)

for datum in dataset:
    # Do something with the audio.
    # datum["audio"]["array"] is the waveform, sampled at 16 kHz
    # (see datum["audio"]["sampling_rate"]).
    pass
```
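As a small illustration of working with the `audio` field, the sketch below computes utterance durations from the waveform length and sampling rate. The helper names `utterance_duration_seconds` and `total_duration_seconds` are hypothetical and not part of this repository, and the rows used here are synthetic stand-ins that only mimic the `datum["audio"]` layout described above; no other column names are assumed.

```python
from typing import Any, Iterable, Mapping


def utterance_duration_seconds(datum: Mapping[str, Any]) -> float:
    """Return the length of one utterance in seconds (hypothetical helper)."""
    audio = datum["audio"]
    # Number of samples divided by samples per second gives seconds.
    return len(audio["array"]) / audio["sampling_rate"]


def total_duration_seconds(data: Iterable[Mapping[str, Any]]) -> float:
    """Sum the durations of all utterances in an iterable of rows."""
    return sum(utterance_duration_seconds(datum) for datum in data)


# Synthetic stand-in rows with the same "audio" layout as the dataset:
# one second and half a second of silence at 16 kHz.
fake_rows = [
    {"audio": {"array": [0.0] * 16000, "sampling_rate": 16000}},
    {"audio": {"array": [0.0] * 8000, "sampling_rate": 16000}},
]

print(total_duration_seconds(fake_rows))  # 1.5
```

The same functions should work on rows yielded by the loaded dataset, since `len()` applies to the decoded waveform array as well as to plain lists.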
## Conversion Script
The script used to convert the data is available in this repository as [convert.py](https://huggingface.co/datasets/benjamin-paine/dinner-party-corpus/blob/main/convert.py).
# Citation
```
@misc{vansegbroeck2019dipcodinnerparty,
title={DiPCo -- Dinner Party Corpus},
author={Maarten Van Segbroeck and Ahmed Zaid and Ksenia Kutsenko and Cirenia Huerta and Tinh Nguyen and Xuewen Luo and Björn Hoffmeister and Jan Trmal and Maurizio Omologo and Roland Maas},
year={2019},
eprint={1909.13447},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/1909.13447},
}
```
Provider:
maas
Created:
2025-03-18



