NandemoGHS/Galgame_Gemini_Captions

Name: NandemoGHS/Galgame_Gemini_Captions
Creator: NandemoGHS
Published: 2025-10-23 12:45:27
License: not described on this listing (the dataset card states cc-by-nc-4.0)

Source: Hugging Face. Updated 2025-10-23. Indexed 2026-01-03.

Download link:

https://hf-mirror.com/datasets/NandemoGHS/Galgame_Gemini_Captions

Description:

---
dataset_info:
- config_name: part1
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: caption
    dtype: string
  - name: profile
    dtype: string
  - name: mood
    dtype: string
  - name: speed
    dtype: string
  - name: prosody
    dtype: string
  - name: pitch_timbre
    dtype: string
  - name: style
    dtype: string
  - name: notes
    dtype: string
  splits:
  - name: train
    num_bytes: 10071850949.2
    num_examples: 141400
  download_size: 9845050918
  dataset_size: 10071850949.2
- config_name: part2
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: caption
    dtype: string
  - name: emotion
    dtype: string
  - name: profile
    dtype: string
  - name: mood
    dtype: string
  - name: speed
    dtype: string
  - name: prosody
    dtype: string
  - name: pitch_timbre
    dtype: string
  - name: style
    dtype: string
  - name: notes
    dtype: string
  - name: refined_text
    dtype: string
  splits:
  - name: train
    num_bytes: 16580894851.05
    num_examples: 235350
  download_size: 16203742833
  dataset_size: 16580894851.05
configs:
- config_name: part1
  data_files:
  - split: train
    path: part1/train-*
- config_name: part2
  data_files:
  - split: train
    path: part2/train-*
license: cc-by-nc-4.0
task_categories:
- text-to-speech
- audio-classification
language:
- ja
tags:
- not-for-all-audiences
---

# Galgame_Gemini_Captions

## Dataset Description

This dataset consists of audio data, their corresponding transcriptions, and detailed audio captions generated by Gemini 2.5 Pro.

The data is a subset of the [OOPPEENN/56697375616C4E6F76656C5F4461736574](https://huggingface.co/datasets/OOPPEENN/56697375616C4E6F76656C5F4461736574) dataset.

It is intended for training Text-to-Speech (TTS) models that can be controlled via descriptive metadata tags (e.g., emotion, speaker profile, style).

## Dataset Structure

The dataset is divided into two subsets:

* **`part1`**
* **`part2`**

These subsets utilize different methodologies for caption generation. `part2` is considered to have higher quality captions for the following reasons:

1. It includes additional metadata, such as `emotion` tags.
2. When generating the captions, Gemini 2.5 Pro was provided with the original transcription text as context, leading to more accurate and relevant descriptions.

## Data Shuffling and Copyright Notice

The data in this dataset has been completely shuffled. It does not contain any metadata (such as original filenames, speaker IDs, or sequential ordering) that would allow the reconstruction of the original source material.

This step was taken to comply with the limitations for educational purposes under Japanese copyright law.

## License

This dataset is licensed under **CC-BY-NC-4.0**.

Additionally, as this dataset contains outputs generated by Gemini 2.5 Pro, **any use that competes with Gemini is prohibited.**
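The card lists the columns of `part1` and `part2` but does not show how to read them. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository loads with the standard `load_dataset` call. The helper names `build_control_prompt` and `iter_examples`, and the `field: value | field: value` prompt layout, are hypothetical illustrations and not a format defined by the dataset author. Streaming is used because the card reports download sizes of roughly 9.8 GB (`part1`) and 16.2 GB (`part2`).

```python
from typing import Any, Dict, Iterator, List

REPO_ID = "NandemoGHS/Galgame_Gemini_Captions"

# Tag columns present in both configs, in the order the card's YAML header lists them.
SHARED_TAG_FIELDS = ["profile", "mood", "speed", "prosody", "pitch_timbre", "style", "notes"]
# Columns that the YAML header lists only for `part2`.
PART2_ONLY_FIELDS = ["emotion", "refined_text"]


def build_control_prompt(record: Dict[str, Any]) -> str:
    """Join the non-empty tag fields of one record into a single string.

    The output layout is an assumption made for illustration only.
    Missing fields (e.g. `emotion` in part1) are skipped silently.
    """
    parts: List[str] = []
    for field in ["emotion"] + SHARED_TAG_FIELDS:
        value = record.get(field)
        if isinstance(value, str) and value.strip():
            parts.append(f"{field}: {value.strip()}")
    return " | ".join(parts)


def iter_examples(config: str = "part2") -> Iterator[Dict[str, Any]]:
    """Stream records from one config instead of downloading the whole archive."""
    # Imported lazily so the prompt helper works without `datasets` installed.
    from datasets import load_dataset

    dataset = load_dataset(REPO_ID, config, split="train", streaming=True)
    for record in dataset:
        yield record
```

For example, `next(iter_examples("part2"))` would return one record whose `text`, `caption`, and tag fields can be passed to `build_control_prompt`. Decoding the `audio` column may need additional audio dependencies, depending on the installed `datasets` version.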

Provider:

NandemoGHS
