MERLIon CCS Challenge dataset

Name: MERLIon CCS Challenge dataset
Creator: Nanyang Technological University
Published: 2023-05-30 17:26:20
License: Not specified

Source: arXiv; updated 2023-05-30; indexed 2024-06-21

Download link:

https://github.com/MERLIon-Challenge/merlion-ccs-2023

Description:

The MERLIon CCS Challenge dataset was created jointly by Nanyang Technological University and Johns Hopkins University. It is a child-directed speech dataset focused on English-Mandarin code-switching. It contains 305 Zoom video call recordings totaling over 30 hours, capturing spontaneous, in-the-wild English-Mandarin code-switching in home environments. To collect the data, parents narrated wordless picture books to their children over Zoom, and multilingual transcribers annotated the recordings using a high-fidelity linguistic transcription protocol. The dataset is intended primarily for research on language identification and language segmentation, and aims to address the technical challenges posed by diverse language environments and speech patterns.

Provider:

Nanyang Technological University

Created:

2023-05-30
