Nexdata/Shanghai_Dialect_Speech_Data_by_Mobile_Phone

Name: Nexdata/Shanghai_Dialect_Speech_Data_by_Mobile_Phone
Creator: Nexdata
Published: 2024-04-17 06:30:44
License: No description available

Source: Hugging Face. Updated: 2024-04-17. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/Nexdata/Shanghai_Dialect_Speech_Data_by_Mobile_Phone
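
The link above points at a mirror of the Hugging Face Hub rather than at the Hub itself. A minimal sketch of how the two page URLs relate, assuming the mirror keeps the same `/datasets/<repo_id>` path layout as `https://huggingface.co`. The helper names `dataset_page_url` and `download_snapshot` are hypothetical and used only for illustration. Whether the repository holds the full corpus or only samples is not stated on this page.

```python
import os

REPO_ID = "Nexdata/Shanghai_Dialect_Speech_Data_by_Mobile_Phone"
MIRROR = "https://hf-mirror.com"


def dataset_page_url(repo_id: str, endpoint: str = "https://huggingface.co") -> str:
    """Build the dataset page URL for a Hub endpoint (hypothetical helper)."""
    return f"{endpoint.rstrip('/')}/datasets/{repo_id}"


def download_snapshot(repo_id: str, endpoint: str = "") -> str:
    """Download the dataset repository and return the local folder path.

    Not called in this sketch: it needs network access and the
    huggingface_hub package. HF_ENDPOINT is read when huggingface_hub is
    imported, so it has to be set before the import below runs.
    """
    if endpoint:
        os.environ["HF_ENDPOINT"] = endpoint
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type="dataset")


mirror_url = dataset_page_url(REPO_ID, MIRROR)
official_url = dataset_page_url(REPO_ID)
```

Here `mirror_url` reproduces the download link listed above, and `official_url` is the corresponding page on the main Hub.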

Resource description:

---
YAML tags:
- copy-paste the tags obtained with the tagging app: https://github.com/huggingface/datasets-tagging
---

# Dataset Card for Nexdata/Shanghai_Dialect_Speech_Data_by_Mobile_Phone

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** https://www.nexdata.ai/datasets/56?source=Huggingface
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

The dataset contains speech from 2,956 speakers from Shanghai, recorded in a quiet indoor environment. The recorded content covers multi-domain customer consultation, short messages, numbers, Shanghai points of interest (POI), and more. The corpus contains no repeated sentences, and the average sentence length is 12.68 words. The recording devices are mainstream Android phones and iPhones.

For more details, please refer to the link: https://www.nexdata.ai/datasets/56?source=Huggingface

### Supported Tasks and Leaderboards

automatic-speech-recognition, audio-speaker-identification: The dataset can be used to train a model for Automatic Speech Recognition (ASR).

### Languages

Shanghai Dialect

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

Commercial License: https://drive.google.com/file/d/1saDCPm74D4UWfBL17VbkTsZLGfpOQj1J/view?usp=sharing

### Citation Information

[More Information Needed]

### Contributions

Provider:

Nexdata

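Summary of original information

Dataset overview

Dataset name

Nexdata/Shanghai_Dialect_Speech_Data_by_Mobile_Phone

Dataset description

Dataset summary

The dataset contains speech data from 2,956 speakers from Shanghai, recorded in a quiet indoor environment. The recorded content covers multi-domain customer consultation, short messages, numbers, Shanghai locations, and more. The dataset contains no repeated content, and the average sentence length is 12.68 words. The recording devices are mainstream Android phones and iPhones.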
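
The two corpus-level figures in the summary (no repeated content, average sentence length of 12.68 words) are the kind of statistics one could recompute from a transcript list. A minimal sketch follows, assuming whitespace-separated tokens count as words. The card does not say how a "word" is defined, and for Chinese-script transcripts a per-character count or a proper word segmenter may be what was actually used. The function name `corpus_stats` and the sample sentences are hypothetical and not taken from the dataset.

```python
from collections import Counter


def corpus_stats(sentences):
    """Return (average sentence length in tokens, sorted list of repeated sentences).

    Assumption: tokens are whitespace-separated, which may not match the
    word definition behind the 12.68 figure on the dataset card.
    """
    if not sentences:
        raise ValueError("sentences must not be empty")
    lengths = [len(s.split()) for s in sentences]
    counts = Counter(s.strip() for s in sentences)
    duplicates = sorted(s for s, n in counts.items() if n > 1)
    return sum(lengths) / len(lengths), duplicates


# Hypothetical prompts for illustration only.
sample = [
    "please check my account balance",
    "send a message to my brother",
    "one two three four",
]
avg_len, duplicates = corpus_stats(sample)
```

An empty `duplicates` list corresponds to the "no repeated content" claim, and `avg_len` is the counterpart of the reported average sentence length.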

Supported tasks and leaderboards

Automatic speech recognition (ASR)
Audio speaker identification

Languages

Shanghai dialect

Dataset structure

Data instances

[More information needed]

Data fields

[More information needed]

Data splits

[More information needed]

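Dataset creation

Curation rationale

[More information needed]

Source data

Initial data collection and normalization [More information needed]
Source language producers [More information needed]

Annotations

Annotation process [More information needed]
Annotators [More information needed]

Personal and sensitive information

[More information needed]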

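Considerations for using the data

Social impact of the dataset

[More information needed]

Discussion of biases

[More information needed]

Other known limitations

[More information needed]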

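Additional information

Dataset curators

[More information needed]

Licensing information

Commercial license: https://drive.google.com/file/d/1saDCPm74D4UWfBL17VbkTsZLGfpOQj1J/view?usp=sharing

Citation information

[More information needed]

Contributions

[More information needed]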

The content above was collected and summarized by 遇见数据集.
