cheulyop/ksponspeech

Name: cheulyop/ksponspeech
Creator: cheulyop
Published: 2021-10-02 04:27:13
License: No description available

Source: Hugging Face. Updated 2021-10-02. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/cheulyop/ksponspeech

Resource description:

---
YAML tags:
- copy-paste the tags obtained with the tagging app: https://github.com/huggingface/datasets-tagging
---

# Dataset Card for [KsponSpeech]

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:**
- **Repository:**
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

KsponSpeech is a large-scale spontaneous speech corpus of Korean conversations. The corpus contains 969 hours of general open-domain dialog utterances, spoken by about 2,000 native Korean speakers in a clean environment. All data were constructed by recording two people freely conversing on a variety of topics and manually transcribing the utterances. The transcription is dual, consisting of orthography and pronunciation, and includes disfluency tags that mark the spontaneity of speech, such as filler words, repeated words, and word fragments. KsponSpeech is publicly available on an open data hub site of the Korean government (https://aihub.or.kr/aidata/105).

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

Thanks to [@github-username](https://github.com/<github-username>) for adding this dataset.

Provider:

cheulyop

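Summary of original information

Dataset overview

Dataset name

KsponSpeech

Dataset description

KsponSpeech is a large-scale corpus of spontaneous Korean conversation. It contains about 969 hours of general open-domain dialog utterances, recorded in a clean environment by about 2,000 native Korean speakers. The dataset was built by recording two people freely discussing a variety of topics and manually transcribing the recordings. The transcription is dual, covering both orthography and pronunciation, and includes disfluency tags that mark the spontaneity of speech, such as filler words, repeated words, and word fragments.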
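
The dual transcription mentioned above can be illustrated with a small Python sketch. This listing does not specify the transcript notation, so the `(orthographic)/(phonetic)` pair format, the `split_dual` helper, and the sample utterance below are all assumptions made for illustration only; check the files distributed by the data hub before relying on them.

```python
import re
from typing import Tuple

# ASSUMPTION: a dual-transcribed span is written as "(orthographic)/(phonetic)".
# This notation is not stated in the dataset card; adjust it to the real files.
DUAL_PAIR = re.compile(r"\(([^()]*)\)/\(([^()]*)\)")


def split_dual(text: str) -> Tuple[str, str]:
    """Return the (orthographic, phonetic) renderings of one transcript line."""
    # Keep the first member of each pair for the orthographic view.
    orthographic = DUAL_PAIR.sub(lambda m: m.group(1), text)
    # Keep the second member of each pair for the phonetic view.
    phonetic = DUAL_PAIR.sub(lambda m: m.group(2), text)
    return orthographic, phonetic


if __name__ == "__main__":
    # Hypothetical utterance, not taken from the corpus.
    sample = "(70%)/(칠십 퍼센트) 확률"
    ortho, phone = split_dual(sample)
    print(ortho)
    print(phone)
```

Text outside a pair is copied unchanged into both views, so a line with no dual spans yields two identical strings. Disfluency tags are deliberately left untouched here because their markers are not described in this listing.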

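Language

Korean

Dataset structure

Data instances

[Information missing]

Data fields

[Information missing]

Data splits

[Information missing]

Dataset creation

Curation rationale

[Information missing]

Source data

Initial data collection and normalization

[Information missing]

Source language producers

[Information missing]

Annotations

Annotation process

[Information missing]

Annotators

[Information missing]

Personal and sensitive information

[Information missing]

Considerations for using the data

Social impact of the dataset

[Information missing]

Discussion of biases

[Information missing]

Other known limitations

[Information missing]

Additional information

Dataset curators

[Information missing]

Licensing information

[Information missing]

Citation information

[Information missing]

Contributions

Thanks to @github-username for adding this dataset.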
