DataoceanAI/Ten_Thousand_People_Corpus
Hugging Face · Updated 2024-07-17 · Indexed 2024-07-22
Download link:
https://hf-mirror.com/datasets/DataoceanAI/Ten_Thousand_People_Corpus
Overview:
This dataset contains two types of Chinese speech data, read speech and free conversation, covering common everyday topics such as news, text messages, car control, number sequences, music, daily life, maps, colloquial speech, family, health, travel, work, socializing, celebrities, and weather. The read-speech part was recorded by 10,051 speakers and totals 3,953 hours; each speaker recorded at least 1 minute, and each sentence contains at least 4 characters. The free-conversation part was recorded by 3,844 speakers and totals 1,914 hours. The listing gives the overall total as 5,876 hours and 13,895 speakers; the speaker count matches the sum of the two parts, while the two durations sum to 5,867 hours.
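The totals quoted in the description can be cross-checked against the per-part figures with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys and function names are hypothetical and are not part of the dataset or of any library, and the numbers are copied from the description as stated.

```python
# Figures copied from the listing; key names are illustrative only.
PARTS = {
    "read_speech": {"speakers": 10_051, "hours": 3_953},
    "free_conversation": {"speakers": 3_844, "hours": 1_914},
}
STATED_TOTAL = {"speakers": 13_895, "hours": 5_876}


def summed_totals(parts):
    """Add up speakers and hours across all parts."""
    return {
        "speakers": sum(p["speakers"] for p in parts.values()),
        "hours": sum(p["hours"] for p in parts.values()),
    }


def mean_minutes_per_speaker(part):
    """Average recorded minutes per speaker for one part."""
    return part["hours"] * 60 / part["speakers"]


computed = summed_totals(PARTS)
for key in ("speakers", "hours"):
    diff = STATED_TOTAL[key] - computed[key]
    print(f"{key}: stated={STATED_TOTAL[key]} computed={computed[key]} diff={diff}")

for name, part in PARTS.items():
    print(f"{name}: {mean_minutes_per_speaker(part):.1f} min per speaker on average")
```

Under these figures the speaker counts agree, while the stated duration differs from the summed duration by 9 hours. The summed speaker count also assumes that no speaker took part in both parts, which the listing does not state explicitly.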
Provider:
DataoceanAI
Original information summary
Dataset overview
Dataset name
Ten_Thousand_People_Corpus
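Dataset description
The dataset contains read-speech and conversational data covering common everyday topics such as news, text messages, vehicle control, number sequences, music, general topics, maps, colloquial speech, family, health, travel, work, socializing, celebrities, and weather. Specifically:
- Read speech: 10,051 speakers, 3,953 hours (at least 1 minute per speaker, at least 4 characters per sentence)
- Free conversation: 3,844 speakers, 1,914 hours (long-form audio)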
Creator
Dataocean AI
Keywords
🇺🇸 Region: US



