
China Mobile Chinese Dialect Translation Audio Dataset

National Dataset Management Service Platform | Updated: 2026-05-28 | Indexed: 2026-04-29
Download link:
https://www.ndsms.cn/dataRetrieval/datasetDetail/?id=454eb2cd8b9600cd5ca9eb0b36149e19
Resource description:
This dataset is a human-recorded speech corpus built for end-to-end translation of Chinese dialects into Standard Mandarin. It contains conversational and read speech across a broad range of domains: leisure and entertainment; daily necessities (clothing, food, housing, and transportation); education and healthcare; interpersonal relationships; art; family life; career development; competitive sports; politics and law; science and technology; the humanities; digital products; business and economics; personal traits; climate and environment; and military affairs and warfare. All audio is original speech recorded from human speakers. Each dialect was recorded in the region where it is spoken, and every speaker is a native speaker of that dialect. The content follows local conventions for characters and word choice and covers high-frequency everyday vocabulary with standard dialect pronunciation. Audio is sampled at 16 kHz or higher with a bit depth of 16 bits, and the corpus spans multiple speakers and scenarios. For each item, the dataset provides the dialect audio, an annotated transcript in the dialect, the corresponding Mandarin translation text, and related information about the audio and the speaker, such as gender and age.
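The audio requirements stated above (sampling rate of at least 16 kHz, 16-bit depth) can be checked locally before training. The following is a minimal sketch, assuming the audio is delivered as uncompressed WAV files, which the listing does not confirm. The `DialectSample` field names, `check_wav`, and `make_silent_wav` are hypothetical names chosen for illustration and are not the dataset's actual schema or tooling.

```python
import io
import wave
from dataclasses import dataclass


@dataclass
class DialectSample:
    """One record of the corpus. Field names are assumptions for illustration,
    mirroring the items the description lists, not the dataset's real schema."""
    audio_path: str       # dialect audio file
    dialect_text: str     # annotated transcript in the dialect
    mandarin_text: str    # corresponding Mandarin translation
    speaker_gender: str   # speaker metadata
    speaker_age: int


def check_wav(source, min_rate=16000, sample_width_bytes=2):
    """Return (ok, sample_rate, bit_depth) for a WAV path or binary file object.

    ok is True when the sample rate is at least min_rate and the sample
    width equals sample_width_bytes (2 bytes = 16 bits).
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    ok = rate >= min_rate and width == sample_width_bytes
    return ok, rate, width * 8


def make_silent_wav(rate, width=2, seconds=0.1):
    """Build a short silent mono WAV in memory, used only to exercise check_wav."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (int(rate * seconds) * width))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    # Synthetic inputs only; replace with real file paths from the download.
    print(check_wav(make_silent_wav(16000)))
    print(check_wav(make_silent_wav(8000)))
```

If the files turn out to use a different container or codec, the same check would need a different reader; only the two thresholds come from the listing.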
Provider:
中移九天人工智能科技(北京)有限公司
Created:
2026-04-25
Collected Summary
Dataset Introduction
Background and Challenges
Background Overview
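This is a human-recorded speech dataset for end-to-end translation of Chinese dialects, covering conversational and read speech in multiple domains such as leisure and entertainment and education and healthcare. All audio is original speech from human speakers, sampled at 16 kHz or higher with a bit depth of 16 bits, and is accompanied by annotated dialect transcripts, Mandarin translations, and speaker information.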
The content above was collected and summarized by 遇见数据集.