NandemoGHS/Galgame_Gemini_Captions
Source: Hugging Face | Updated: 2025-10-23 | Indexed: 2026-01-03
Download link:
https://hf-mirror.com/datasets/NandemoGHS/Galgame_Gemini_Captions
Resource summary:
---
dataset_info:
- config_name: part1
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: caption
    dtype: string
  - name: profile
    dtype: string
  - name: mood
    dtype: string
  - name: speed
    dtype: string
  - name: prosody
    dtype: string
  - name: pitch_timbre
    dtype: string
  - name: style
    dtype: string
  - name: notes
    dtype: string
  splits:
  - name: train
    num_bytes: 10071850949.2
    num_examples: 141400
  download_size: 9845050918
  dataset_size: 10071850949.2
- config_name: part2
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: caption
    dtype: string
  - name: emotion
    dtype: string
  - name: profile
    dtype: string
  - name: mood
    dtype: string
  - name: speed
    dtype: string
  - name: prosody
    dtype: string
  - name: pitch_timbre
    dtype: string
  - name: style
    dtype: string
  - name: notes
    dtype: string
  - name: refined_text
    dtype: string
  splits:
  - name: train
    num_bytes: 16580894851.05
    num_examples: 235350
  download_size: 16203742833
  dataset_size: 16580894851.05
configs:
- config_name: part1
  data_files:
  - split: train
    path: part1/train-*
- config_name: part2
  data_files:
  - split: train
    path: part2/train-*
license: cc-by-nc-4.0
task_categories:
- text-to-speech
- audio-classification
language:
- ja
tags:
- not-for-all-audiences
---
# Galgame_Gemini_Captions
## Dataset Description
This dataset consists of audio clips, their transcriptions, and detailed audio captions generated by Gemini 2.5 Pro. The data is a subset of the [OOPPEENN/56697375616C4E6F76656C5F4461736574](https://huggingface.co/datasets/OOPPEENN/56697375616C4E6F76656C5F4461736574) dataset.
It is intended for training Text-to-Speech (TTS) models that can be controlled via descriptive metadata tags (e.g., emotion, speaker profile, style).
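The captions are meant for TTS models controlled through descriptive tags. As a minimal sketch, assuming a model that takes one plain-text description alongside the transcript, the tag columns of a single record could be flattened as shown below. The function name `build_control_prompt` and the `key: value` format are hypothetical; neither is defined by this dataset or by any specific TTS model.

```python
from typing import Mapping, Optional

# Tag columns taken from the YAML header; `emotion` exists only in part2.
TAG_FIELDS = ("profile", "emotion", "mood", "speed", "prosody", "pitch_timbre", "style")


def build_control_prompt(record: Mapping[str, Optional[str]]) -> str:
    """Join the non-empty tag fields of one record into a single description string.

    The "key: value" pairs joined by "; " are an illustrative choice (hypothetical),
    not a format prescribed by the dataset.
    """
    parts = []
    for field in TAG_FIELDS:
        value = record.get(field)
        # Skip fields that are missing (e.g. `emotion` in part1), None, or blank.
        if value and value.strip():
            parts.append(f"{field}: {value.strip()}")
    return "; ".join(parts)
```

For example, `build_control_prompt({"mood": "calm", "speed": "slow"})` returns `"mood: calm; speed: slow"` (made-up values, not taken from the dataset). Missing or empty fields are simply skipped, so the same function works for records from either subset.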
## Dataset Structure
The dataset is divided into two subsets:
* **`part1`**
* **`part2`**
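Each subset is exposed as its own config in the YAML header, with a single `train` split. The following is a minimal sketch, assuming the third-party Hugging Face `datasets` package is installed and the Hub is reachable: it estimates each config's download size from the numbers in the YAML header and streams a config instead of downloading it in full. `SUBSET_SIZES`, `download_gb`, and `stream_subset` are hypothetical names; only `load_dataset` comes from the `datasets` library.

```python
# Train-split figures copied from the dataset card's YAML header (sizes in bytes).
SUBSET_SIZES = {
    "part1": {"num_examples": 141_400, "download_size": 9_845_050_918},
    "part2": {"num_examples": 235_350, "download_size": 16_203_742_833},
}
REPO_ID = "NandemoGHS/Galgame_Gemini_Captions"


def download_gb(config: str) -> float:
    """Download size of one config in decimal gigabytes (1 GB = 10**9 bytes)."""
    return SUBSET_SIZES[config]["download_size"] / 1e9


def stream_subset(config: str):
    """Stream the train split of one config without downloading it in full.

    Requires the third-party `datasets` package and network access to the Hub.
    """
    if config not in SUBSET_SIZES:
        raise ValueError(f"unknown config {config!r}; expected one of {sorted(SUBSET_SIZES)}")
    # Imported lazily so the size helpers above work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config, split="train", streaming=True)
```

For example, `next(iter(stream_subset("part2")))` should yield the first record as a dict keyed by the column names listed in the YAML header. Depending on the `datasets` version, decoding the `audio` column may need additional audio dependencies.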
The two subsets were captioned using different methods. `part2` is considered to have higher-quality captions for the following reasons:
1. It includes additional columns, namely `emotion` tags and `refined_text`.
2. When generating the captions, Gemini 2.5 Pro was provided with the original transcription text as context, leading to more accurate and relevant descriptions.
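Because `part2` has columns that `part1` lacks, code that mixes records from both subsets may want a uniform set of keys. The sketch below copies the column lists from the YAML header and pads `part1` records with `None`. The constant and function names are hypothetical, not part of the dataset.

```python
from typing import Any, Dict, List, Mapping

# Column names copied from the dataset card's YAML header.
PART1_COLUMNS: List[str] = [
    "audio", "text", "caption", "profile", "mood",
    "speed", "prosody", "pitch_timbre", "style", "notes",
]
PART2_COLUMNS: List[str] = [
    "audio", "text", "caption", "emotion", "profile", "mood",
    "speed", "prosody", "pitch_timbre", "style", "notes", "refined_text",
]

# Columns present only in part2, kept in part2's column order.
EXTRA_IN_PART2: List[str] = [c for c in PART2_COLUMNS if c not in PART1_COLUMNS]


def to_common_schema(record: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of a part1 or part2 record keyed by every part2 column.

    Columns absent from the input (e.g. `emotion` and `refined_text` for a
    part1 record) are filled with None; keys outside PART2_COLUMNS are dropped.
    """
    return {column: record.get(column) for column in PART2_COLUMNS}
```

Padding with `None` keeps `part1` records usable in the same pipeline while making it explicit that they carry no `emotion` or `refined_text` values.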
## Data Shuffling and Copyright Notice
The data in this dataset has been completely shuffled.
It does not contain any metadata (such as original filenames, speaker IDs, or sequential ordering) that would allow the reconstruction of the original source material. This step was taken to comply with the limitations for educational purposes under Japanese copyright law.
## License
This dataset is licensed under **CC-BY-NC-4.0**.
Additionally, as this dataset contains outputs generated by Gemini 2.5 Pro, **any use that competes with Gemini is prohibited.**
Provided by: NandemoGHS



