Datatang: 1,535 Hours of Mixed Chinese-English Speech Data Collected via Mobile Phone

Name: Datatang: 1,535 Hours of Mixed Chinese-English Speech Data Collected via Mobile Phone
Creator: maas
Published: 2025-10-24 10:31:52
License: No description available

ModelScope Community · Updated 2025-10-24 · Indexed 2024-05-15

Download link:

https://modelscope.cn/datasets/DatatangBeijing/1535Hours-MixedSpeechWithChineseAndEnglishDataByMobilePhone

Resource overview:

This dataset contains 1,535 hours of Mandarin-English code-switching speech collected on mobile phones and recorded by 3,972 native Chinese speakers, whose accents cover China's seven major dialect regions. All transcripts are mixed Mandarin-English sentences spanning both general everyday scenarios and human-computer interaction (HCI) scenarios, with varied content and accurate transcriptions. The dataset can be used to improve the recognition accuracy of speech recognition systems on Mandarin-English code-switched speech.

Provider:

maas

Created:

2024-05-06

Dataset Introduction