Nexdata/accented_english

Name: Nexdata/accented_english
Creator: Nexdata
Published: 2023-11-22 09:51:18
License: No description available

Hugging Face, updated 2023-11-22, indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/Nexdata/accented_english


Resource description:

---
task_categories:
- automatic-speech-recognition
language:
- en
---
# Dataset Card for accented-english

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)

## Dataset Description
- **Homepage:** https://nexdata.ai/?source=Huggingface
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary
The dataset contains 20,000 hours of accented English speech. It was collected from local English speakers in more than 20 countries and regions, such as the USA, China, the UK, Germany, Japan, India, France, Spain, Russia, and Latin America, and covers a variety of pronunciation habits and characteristics, degrees of accent severity, and speaker distributions. The audio is 16 kHz, 16-bit, uncompressed WAV, mono channel. Sentence accuracy is over 95%.

For more details, please refer to: https://nexdata.ai/speechRecognition?source=Huggingface

### Supported Tasks and Leaderboards
automatic-speech-recognition, audio-speaker-identification: The dataset can be used to train a model for Automatic Speech Recognition (ASR).

### Languages
English

## Dataset Structure
### Data Instances
[More Information Needed]

### Data Fields
[More Information Needed]

### Data Splits
[More Information Needed]

## Dataset Creation
### Curation Rationale
[More Information Needed]

### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]

#### Who are the source language producers?
[More Information Needed]

### Annotations
#### Annotation process
[More Information Needed]

#### Who are the annotators?
[More Information Needed]

### Personal and Sensitive Information
[More Information Needed]

## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]

### Discussion of Biases
[More Information Needed]

### Other Known Limitations
[More Information Needed]

## Additional Information
### Dataset Curators
[More Information Needed]

### Licensing Information
Commercial License
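As a rough sense of scale, the format stated in the dataset summary (16 kHz, 16-bit, mono, uncompressed) works out to 16,000 × 2 × 1 = 32,000 bytes of audio per second, or about 115.2 MB per hour. The Python sketch below extends that arithmetic to the full 20,000 hours. It counts raw PCM samples only and ignores WAV headers, transcripts, and whatever packaging Nexdata actually ships, none of which the card describes; `pcm_bytes` is a hypothetical helper name.

```python
# Back-of-the-envelope size of the raw PCM audio, using the format the card states:
# 16 kHz sample rate, 16-bit (2-byte) samples, 1 channel (mono), uncompressed.
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2
CHANNELS = 1
TOTAL_HOURS = 20_000


def pcm_bytes(hours: float,
              sample_rate: int = SAMPLE_RATE_HZ,
              bytes_per_sample: int = BYTES_PER_SAMPLE,
              channels: int = CHANNELS) -> int:
    """Raw PCM payload in bytes for `hours` of audio (WAV headers not included)."""
    seconds = hours * 3600
    return round(seconds * sample_rate * bytes_per_sample * channels)


if __name__ == "__main__":
    per_hour = pcm_bytes(1)
    total = pcm_bytes(TOTAL_HOURS)
    print(f"per hour: {per_hour / 1e6:.1f} MB")    # per hour: 115.2 MB
    print(f"total:    {total / 1e12:.2f} TB")      # total:    2.30 TB
```

So the audio alone is on the order of 2.3 TB (decimal units) before any container overhead, which is worth planning for before requesting the full commercial release.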


Provider:

Nexdata

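Summary of Original Information

Dataset Overview

Name: accented-english
Language: English
Task categories: automatic speech recognition, audio speaker identification
Dataset size: 20,000 hours of accented English speech
Data format: 16 kHz, 16-bit, uncompressed WAV, mono
Sentence accuracy: over 95%
Data source: local English speakers from more than 20 countries and regions
License: commercial license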
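The card states that every recording is 16 kHz, 16-bit, mono, uncompressed WAV. If you obtain the data, that claim is cheap to audit with Python's standard-library `wave` module. This is a minimal sketch assuming the audio arrives as `.wav` files somewhere under a local directory (the card does not document the file layout); `wav_format_problems` and `audit_directory` are hypothetical helper names. Because `wave` only parses uncompressed PCM, a compressed or malformed file is reported as unreadable instead of raising.

```python
from pathlib import Path
import wave

# Target format as stated on the dataset card (assumed to apply to every file).
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH = 2   # bytes per sample, i.e. 16-bit
EXPECTED_CHANNELS = 1       # mono


def wav_format_problems(path) -> list[str]:
    """Return the ways `path` deviates from the stated format (empty list = conforms)."""
    try:
        # wave.open treats non-str arguments as file objects, so pass a str path.
        with wave.open(str(path), "rb") as wf:
            rate, width, channels = wf.getframerate(), wf.getsampwidth(), wf.getnchannels()
    except (wave.Error, EOFError):
        # `wave` only reads uncompressed PCM WAV; anything else lands here.
        return ["not a readable uncompressed PCM WAV file"]

    problems = []
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate {rate} Hz")
    if width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"{8 * width}-bit samples")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"{channels} channels")
    return problems


def audit_directory(root) -> dict[str, list[str]]:
    """Map every non-conforming *.wav file under `root` to its list of problems."""
    report = {}
    for p in sorted(Path(root).rglob("*.wav")):
        issues = wav_format_problems(p)
        if issues:
            report[str(p)] = issues
    return report
```

An empty dictionary from `audit_directory(...)` means every file found matches the stated format; anything else lists the offending files and how they differ.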

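Dataset Details

Dataset Description

Overview: The dataset contains speech from local English speakers in more than 20 countries and regions, including the USA, China, the UK, Germany, Japan, India, France, Spain, Russia, and Latin America, covering a variety of pronunciation habits and characteristics, degrees of accent severity, and speaker distributions.

Supported Tasks and Leaderboards

Task: automatic speech recognition (ASR)
Application: training automatic speech recognition models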
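Both the card and the overview quote a sentence accuracy above 95% without defining the metric or saying what it is measured against. One common reading, assumed here and not confirmed by Nexdata, is the share of utterances whose transcript matches a reference exactly after light normalization. The minimal Python sketch below implements that assumed definition; `normalize` and `sentence_accuracy` are hypothetical names, and the normalization rules (lowercasing, dropping punctuation except apostrophes, collapsing whitespace) are an illustrative choice.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, replace punctuation (except apostrophes) with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s']", " ", text.lower())
    return " ".join(text.split())


def sentence_accuracy(references: list[str], hypotheses: list[str]) -> float:
    """Fraction of sentences whose normalized hypothesis equals the normalized reference."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("need at least one sentence")
    correct = sum(normalize(r) == normalize(h) for r, h in zip(references, hypotheses))
    return correct / len(references)


if __name__ == "__main__":
    refs = ["Hello, world.", "How are you?"]
    hyps = ["hello world", "how old are you"]
    print(sentence_accuracy(refs, hyps))  # 0.5
```

Unlike word error rate, this metric gives no partial credit: a single wrong word makes the whole sentence count as incorrect.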

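Dataset Structure

Data instances: [More information needed]
Data fields: [More information needed]
Data splits: [More information needed]

Dataset Creation

Curation rationale: [More information needed]
Source data: [More information needed]
Annotations: [More information needed]
Personal and sensitive information: [More information needed]

Considerations for Using the Data

Social impact: [More information needed]
Discussion of biases: [More information needed]
Other known limitations: [More information needed]

Additional Information

Dataset curators: [More information needed]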


The content above was collected and summarized by 遇见数据集.
