ASCEND

Name: ASCEND
Creator: The Hong Kong University of Science and Technology
Published: 2022-05-03 12:39:22
License: No description available

Source: arXiv. Updated: 2022-05-03. Indexed: 2024-06-21.

Download link:

https://huggingface.co/datasets/CAiRE/ASCEND
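The link above points to a Hugging Face dataset repository, so the corpus can probably be fetched programmatically. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository loads with its default loader. The helper name `load_ascend` is hypothetical, and the available splits and columns are not stated on this page, so inspect the returned object rather than relying on any particular layout.

```python
DATASET_ID = "CAiRE/ASCEND"  # repository id taken from the download link


def load_ascend(split=None):
    """Load ASCEND from the Hugging Face Hub (requires network access).

    Hypothetical helper: with split=None, load_dataset returns every
    split that the repository defines.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_ascend()` and printing the result shows which splits and columns are actually present.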


Description:

ASCEND is a high-quality Mandarin Chinese-English code-switching corpus built from spontaneous multi-turn conversations collected in Hong Kong. It contains 10.62 hours of clean speech from 23 Chinese-English bilingual speakers. Building the corpus involved collecting speech from the conversations and annotating it. Most existing code-switching corpora are based on read speech rather than spontaneous speech; ASCEND addresses this gap so that models trained on it can capture code-switching as it occurs in real-world settings more accurately.
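To illustrate what code-switching looks like in a transcript, here is a rough sketch that labels an utterance by the scripts it contains. This is an assumption made for illustration only, not the annotation procedure used for ASCEND: the function name `script_label` and the labels `zh`, `en`, `mixed`, and `unknown` are hypothetical, and the check covers only the basic CJK Unified Ideographs block and ASCII letters.

```python
import re

# Basic CJK Unified Ideographs block and ASCII letters only (a simplification).
HAN = re.compile(r"[\u4e00-\u9fff]")
LATIN = re.compile(r"[A-Za-z]")


def script_label(text):
    """Return a coarse label based on which scripts appear in the text."""
    has_han = bool(HAN.search(text))
    has_latin = bool(LATIN.search(text))
    if has_han and has_latin:
        return "mixed"  # both scripts present: a code-switched utterance
    if has_han:
        return "zh"
    if has_latin:
        return "en"
    return "unknown"  # digits, punctuation, or empty input
```

An utterance such as "我觉得这个 idea 很好" contains both scripts and would be labeled as mixed, which is the kind of spontaneous switching the corpus is meant to capture.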

Provider:

The Hong Kong University of Science and Technology

Created:

2021-12-12
