DataoceanAI/Ten_Thousand_People_Corpus

Name: DataoceanAI/Ten_Thousand_People_Corpus
Creator: DataoceanAI
Published: 2024-07-17 15:32:56
License: No description available

Hugging Face, updated 2024-07-17, indexed 2024-07-22

Download link:

https://hf-mirror.com/datasets/DataoceanAI/Ten_Thousand_People_Corpus

Resource description:

This dataset contains two types of Chinese speech data, read speech and free conversation, covering common everyday topics such as news, text messages, car control, number sequences, music, daily life, maps, everyday colloquial speech, family, health, travel, work, socializing, celebrities, and weather. The read-speech portion was recorded by 10,051 speakers and totals 3,953 hours; each speaker recorded at least 1 minute, and each sentence contains at least 4 characters. The free-conversation portion was recorded by 3,844 speakers and totals 1,914 hours. The dataset as a whole is listed at 5,876 hours from 13,895 speakers.
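
As a quick sanity check on the figures in the description, the per-part counts can be summed and compared with the stated totals. The following is a minimal sketch: the variable names are illustrative, the numbers are copied from the description, and the per-speaker averages assume a simple mean. The speaker counts add up to the stated 13,895, while the hours add up to 5,867, which is 9 hours less than the stated 5,876; the listing does not explain the difference.

```python
# Figures copied from the dataset description (names are illustrative).
read_speakers, read_hours = 10_051, 3_953   # read-speech portion
conv_speakers, conv_hours = 3_844, 1_914    # free-conversation portion
stated_speakers, stated_hours = 13_895, 5_876

# Sum the two portions and compare with the stated totals.
speaker_sum = read_speakers + conv_speakers
hour_sum = read_hours + conv_hours
hour_gap = stated_hours - hour_sum

print(f"speakers: {speaker_sum} (matches stated total: {speaker_sum == stated_speakers})")
print(f"hours: {hour_sum} (stated total minus sum: {hour_gap})")

# Simple mean recording time per speaker, in minutes, for each portion.
read_avg_min = read_hours * 60 / read_speakers
conv_avg_min = conv_hours * 60 / conv_speakers
print(f"average minutes per speaker: read {read_avg_min:.1f}, conversation {conv_avg_min:.1f}")
```

The averages are only rough indicators, since the listing gives totals and a 1-minute minimum for read speech but no per-speaker distribution.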

Provider:

DataoceanAI

Summary of original information