
NandemoGHS/Galgame_Gemini_Captions

Hugging Face | Updated 2025-10-23 | Indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/NandemoGHS/Galgame_Gemini_Captions
Resource description:
---
dataset_info:
- config_name: part1
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: caption
    dtype: string
  - name: profile
    dtype: string
  - name: mood
    dtype: string
  - name: speed
    dtype: string
  - name: prosody
    dtype: string
  - name: pitch_timbre
    dtype: string
  - name: style
    dtype: string
  - name: notes
    dtype: string
  splits:
  - name: train
    num_bytes: 10071850949.2
    num_examples: 141400
  download_size: 9845050918
  dataset_size: 10071850949.2
- config_name: part2
  features:
  - name: audio
    dtype: audio
  - name: text
    dtype: string
  - name: caption
    dtype: string
  - name: emotion
    dtype: string
  - name: profile
    dtype: string
  - name: mood
    dtype: string
  - name: speed
    dtype: string
  - name: prosody
    dtype: string
  - name: pitch_timbre
    dtype: string
  - name: style
    dtype: string
  - name: notes
    dtype: string
  - name: refined_text
    dtype: string
  splits:
  - name: train
    num_bytes: 16580894851.05
    num_examples: 235350
  download_size: 16203742833
  dataset_size: 16580894851.05
configs:
- config_name: part1
  data_files:
  - split: train
    path: part1/train-*
- config_name: part2
  data_files:
  - split: train
    path: part2/train-*
license: cc-by-nc-4.0
task_categories:
- text-to-speech
- audio-classification
language:
- ja
tags:
- not-for-all-audiences
---

# Galgame_Gemini_Captions

## Dataset Description

This dataset consists of audio data, their corresponding transcriptions, and detailed audio captions generated by Gemini 2.5 Pro.

The data is a subset of the [OOPPEENN/56697375616C4E6F76656C5F4461736574](https://huggingface.co/datasets/OOPPEENN/56697375616C4E6F76656C5F4461736574) dataset.

It is intended for training Text-to-Speech (TTS) models that can be controlled via descriptive metadata tags (e.g., emotion, speaker profile, style).

## Dataset Structure

The dataset is divided into two subsets:

* **`part1`**
* **`part2`**

These subsets utilize different methodologies for caption generation. `part2` is considered to have higher quality captions for the following reasons:

1. It includes additional metadata, such as `emotion` tags.
2. When generating the captions, Gemini 2.5 Pro was provided with the original transcription text as context, leading to more accurate and relevant descriptions.

## Data Shuffling and Copyright Notice

The data in this dataset has been completely shuffled. It does not contain any metadata (such as original filenames, speaker IDs, or sequential ordering) that would allow the reconstruction of the original source material.

This step was taken to comply with the limitations for educational purposes under Japanese copyright law.

## License

This dataset is licensed under **CC-BY-NC-4.0**.

Additionally, as this dataset contains outputs generated by Gemini 2.5 Pro, **any use that competes with Gemini is prohibited.**
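
## Example Usage (Illustrative Sketch)

The card does not include loading code, so the following is only a minimal sketch of how the metadata columns might be turned into a single conditioning string for a controllable TTS model. The column names come from the feature list in the card; the helper names `build_control_prompt` and `stream_part`, the `key: value | key: value` layout, and the use of streaming mode are assumptions made for illustration, not part of the dataset. The loader relies on the Hugging Face `datasets` library and has not been run against this repository; access to the repository may also require accepting its terms on the Hub.

```python
from typing import Mapping, Optional, Sequence

# Column names copied from the feature list in the dataset card.
PART1_TAG_FIELDS = (
    "profile", "mood", "speed", "prosody", "pitch_timbre", "style", "notes",
)
# part2 additionally carries an `emotion` column.
PART2_TAG_FIELDS = ("emotion",) + PART1_TAG_FIELDS


def build_control_prompt(
    example: Mapping[str, Optional[str]],
    fields: Sequence[str] = PART2_TAG_FIELDS,
) -> str:
    """Join the non-empty metadata columns into one 'key: value' string.

    The separator and layout are hypothetical; a real TTS pipeline would
    use whatever conditioning format its model expects.
    """
    parts = []
    for name in fields:
        value = example.get(name)
        if value:  # skip missing or empty columns
            parts.append(f"{name}: {value.strip()}")
    return " | ".join(parts)


def stream_part(config_name: str = "part2"):
    """Stream one subset without downloading it in full (assumed usage)."""
    # Imported lazily so build_control_prompt works without `datasets`.
    from datasets import load_dataset

    return load_dataset(
        "NandemoGHS/Galgame_Gemini_Captions",
        config_name,
        split="train",
        streaming=True,
    )
```

With this sketch, iterating over `stream_part("part2")` and passing each record to `build_control_prompt` would yield one conditioning string per utterance, alongside the `text` and `audio` columns. For `part1`, pass `PART1_TAG_FIELDS` as the second argument, since that subset has no `emotion` column.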
Provided by:
NandemoGHS