infinite-dataset-hub/TTSKoreanLanguage
Hugging Face · Updated 2024-08-29 · Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/infinite-dataset-hub/TTSKoreanLanguage
Resource summary:
---
license: mit
tags:
- infinite-dataset-hub
- synthetic
---
# TTSKoreanLanguage
tags: Text-to-Speech, Natural Language Processing, Korean Script Evolution
_Note: This is an AI-generated dataset so its content may be inaccurate or false_
**Dataset Description:**
The 'TTSKoreanLanguage' dataset is a curated collection of Korean text intended for machine learning tasks in Text-to-Speech (TTS) and Natural Language Processing (NLP). It focuses on the evolution and pronunciation of the Korean script (Hangul) and includes metadata for supervised learning, particularly for TTS systems that must synthesize accurate speech from Korean text. The dataset is described as containing historical and contemporary texts annotated with phonetic and prosodic information. The columns shown in the preview are the original text, a phonetic transcription, a numeric difficulty rating based on linguistic complexity, and a categorical difficulty label. No prosodic annotation appears in the preview.
**CSV Content Preview:**
```csv
text_id, original_text, phonetic_transcription, difficulty_rating, label
001, 안녕하세요, 안녕하세요, 1, basic
002, 저는 한국어를 배우고 있습니다, 저는 한국어를 배우고 있습니다, 2, intermediate
003, 현재 시기에는 한국의 문화가 세계에 큰 영향을 미치고 있습니다, 현재 시기에는 한국의 문화가 세계에 큰 영향을 미치고 있습니다, 3, advanced
004, 윤리적인 대화를 취하는 것은 컴퓨터 인물들에게도 중요하다, 윤리적인 대화를 취하는 것은 컴퓨터 인물들에게도 중요하다, 2, intermediate
005, 한국어의 형태소 분석은 기본적으로 이루어지는 피쳐를 받아들일 수 있습니다, 한국어의 형태소 분석은 기본적으로 이루어지는 피쳐를 받아들일 수 있습니다, 3, advanced
```
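The preview separates fields with ", " (a comma followed by a space), so a naive CSV reader would keep a leading blank on every field and column name. The following is a minimal sketch that uses only the Python standard library and assumes the full dataset follows the same layout as these five rows. `PREVIEW` and `load_rows` are illustrative names, not part of the dataset.

```python
import csv
import io

# The five preview rows from the CSV block above, copied verbatim.
# Every field after the first starts with a space because of ", ".
PREVIEW = """text_id, original_text, phonetic_transcription, difficulty_rating, label
001, 안녕하세요, 안녕하세요, 1, basic
002, 저는 한국어를 배우고 있습니다, 저는 한국어를 배우고 있습니다, 2, intermediate
003, 현재 시기에는 한국의 문화가 세계에 큰 영향을 미치고 있습니다, 현재 시기에는 한국의 문화가 세계에 큰 영향을 미치고 있습니다, 3, advanced
004, 윤리적인 대화를 취하는 것은 컴퓨터 인물들에게도 중요하다, 윤리적인 대화를 취하는 것은 컴퓨터 인물들에게도 중요하다, 2, intermediate
005, 한국어의 형태소 분석은 기본적으로 이루어지는 피쳐를 받아들일 수 있습니다, 한국어의 형태소 분석은 기본적으로 이루어지는 피쳐를 받아들일 수 있습니다, 3, advanced
"""


def load_rows(text: str) -> list[dict]:
    """Parse the preview text into a list of dicts keyed by column name."""
    # skipinitialspace=True drops the blank after each comma, so the keys
    # are "original_text" rather than " original_text".
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for row in reader:
        # Keep text_id as a string so that leading zeros ("001") survive.
        row["difficulty_rating"] = int(row["difficulty_rating"])
        rows.append(row)
    return rows


rows = load_rows(PREVIEW)
```

Keeping `text_id` as a string is a deliberate choice here. Converting it to an integer would turn "001" into 1 and lose the zero-padded form the preview uses.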
In this CSV preview, each row is one entry in the dataset. `text_id` is a unique identifier for the entry. `original_text` contains the Korean text in Hangul. `phonetic_transcription` is intended to hold an IPA (International Phonetic Alphabet) transcription of the text, but in every preview row it simply repeats the Hangul from `original_text`. `difficulty_rating` rates linguistic complexity from 1 (basic) to 3 (advanced). `label` gives the matching category name (basic, intermediate, or advanced), which a TTS system could use to adjust its speech synthesis parameters.
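Because the card is AI-generated, it can be worth checking rows against the column descriptions before training on them. The following is a minimal sketch of such a check, assuming the 1/2/3 to basic/intermediate/advanced mapping stated above. It treats any character from the Unicode Hangul Syllables block (U+AC00 to U+D7A3) in `phonetic_transcription` as a sign that the field is not IPA. `EXPECTED_LABEL`, `contains_hangul`, and `check_row` are hypothetical helpers written for illustration.

```python
import csv
import io

# Assumed rating-to-label mapping, taken from the column description above.
EXPECTED_LABEL = {1: "basic", 2: "intermediate", 3: "advanced"}

# The five preview rows from the CSV block above, copied verbatim.
PREVIEW = """text_id, original_text, phonetic_transcription, difficulty_rating, label
001, 안녕하세요, 안녕하세요, 1, basic
002, 저는 한국어를 배우고 있습니다, 저는 한국어를 배우고 있습니다, 2, intermediate
003, 현재 시기에는 한국의 문화가 세계에 큰 영향을 미치고 있습니다, 현재 시기에는 한국의 문화가 세계에 큰 영향을 미치고 있습니다, 3, advanced
004, 윤리적인 대화를 취하는 것은 컴퓨터 인물들에게도 중요하다, 윤리적인 대화를 취하는 것은 컴퓨터 인물들에게도 중요하다, 2, intermediate
005, 한국어의 형태소 분석은 기본적으로 이루어지는 피쳐를 받아들일 수 있습니다, 한국어의 형태소 분석은 기본적으로 이루어지는 피쳐를 받아들일 수 있습니다, 3, advanced
"""


def contains_hangul(s: str) -> bool:
    """True if s contains a precomposed Hangul syllable (U+AC00 to U+D7A3)."""
    return any(0xAC00 <= ord(ch) <= 0xD7A3 for ch in s)


def check_row(row: dict) -> list[str]:
    """Return a list of problems found in one parsed row (empty if none)."""
    problems = []
    if EXPECTED_LABEL.get(int(row["difficulty_rating"])) != row["label"]:
        problems.append("label does not match difficulty_rating")
    if contains_hangul(row["phonetic_transcription"]):
        problems.append("phonetic_transcription contains Hangul, not IPA")
    return problems


reader = csv.DictReader(io.StringIO(PREVIEW), skipinitialspace=True)
report = {row["text_id"]: check_row(row) for row in reader}
```

On the preview, every row passes the label check and fails the transcription check. This matches the observation that the transcription column repeats the Hangul text.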
**Source of the data:**
The dataset was generated using the [Infinite Dataset Hub](https://huggingface.co/spaces/infinite-dataset-hub/infinite-dataset-hub) and microsoft/Phi-3-mini-4k-instruct using the query 'korea tts script':
- **Dataset Generation Page**: https://huggingface.co/spaces/infinite-dataset-hub/infinite-dataset-hub?q=korea+tts+script&dataset=TTSKoreanLanguage&tags=Text-to-Speech,+Natural+Language+Processing,+Korean+Script+Evolution
- **Model**: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
- **More Datasets**: https://huggingface.co/datasets?other=infinite-dataset-hub
Provided by:
infinite-dataset-hub



