
cheulyop/ksponspeech

Source: Hugging Face. Updated: 2021-10-02. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/cheulyop/ksponspeech
Resource description:
---
YAML tags:
- copy-paste the tags obtained with the tagging app: https://github.com/huggingface/datasets-tagging
---

# Dataset Card for KsponSpeech

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

KsponSpeech is a large-scale spontaneous speech corpus of Korean conversations. The corpus contains 969 hours of general open-domain dialogue utterances, spoken by about 2,000 native Korean speakers in a clean environment. All data were constructed by recording two people conversing freely on a variety of topics and manually transcribing the utterances. Each transcript provides a dual transcription (orthography and pronunciation) together with disfluency tags that mark spontaneous-speech phenomena such as filler words, repeated words, and word fragments. KsponSpeech is publicly available on an open data hub site of the Korean government (https://aihub.or.kr/aidata/105).

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

Thanks to [@github-username](https://github.com/<github-username>) for adding this dataset.
Provider:
cheulyop
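Summary of original information

Dataset overview

Dataset name

KsponSpeech

Dataset description

KsponSpeech is a large-scale corpus of spontaneous Korean conversational speech. It contains about 969 hours of general open-domain dialogue utterances, recorded by about 2,000 native Korean speakers in a clean environment. The data were collected by recording two people freely discussing a variety of topics and then manually transcribing the recordings. The transcripts include a dual transcription (orthography and pronunciation) and disfluency tags that mark spontaneous-speech phenomena such as filler words, repeated words, and word fragments.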
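
The dual transcription described above can be illustrated with a small sketch. The card does not specify how the two variants are written in the transcript files, so the `(orthographic)/(phonetic)` notation below is an assumption, and the function name `select_transcription` and the sample sentence are hypothetical. Disfluency tags are left untouched because their format is also not given here.

```python
import re

# ASSUMPTION: a dual-transcribed span is written as "(orthographic)/(phonetic)".
# This notation is not stated in the dataset card; adjust the pattern if the
# actual transcript files differ.
DUAL_SPAN = re.compile(r"\(([^()]*)\)/\(([^()]*)\)")


def select_transcription(text: str, variant: str = "orthographic") -> str:
    """Return `text` with each dual span replaced by the chosen variant.

    `variant` must be "orthographic" or "phonetic". Text outside dual
    spans is returned unchanged.
    """
    if variant not in ("orthographic", "phonetic"):
        raise ValueError("variant must be 'orthographic' or 'phonetic'")
    group = 1 if variant == "orthographic" else 2
    return DUAL_SPAN.sub(lambda match: match.group(group), text)


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the corpus.
    sample = "나는 (3)/(세) 시에 갔어"
    print(select_transcription(sample, "orthographic"))
    print(select_transcription(sample, "phonetic"))
```

Keeping both variants in one line and selecting at load time lets the same transcript serve different uses, for example the orthographic form for text-oriented tasks and the phonetic form for pronunciation-oriented ones.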

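Language

Korean

Dataset structure

Data instances

[More Information Needed]

Data fields

[More Information Needed]

Data splits

[More Information Needed]

Dataset creation

Curation rationale

[More Information Needed]

Source data

Initial data collection and normalization

[More Information Needed]

Source language producers

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Annotators

[More Information Needed]

Personal and sensitive information

[More Information Needed]

Considerations for using the data

Social impact of the dataset

[More Information Needed]

Discussion of biases

[More Information Needed]

Other known limitations

[More Information Needed]

Additional information

Dataset curators

[More Information Needed]

Licensing information

[More Information Needed]

Citation information

[More Information Needed]

Contributions

Thanks to @github-username for adding this dataset.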
