YGGYY/GTSinger

Source: Hugging Face · Updated 2026-02-11 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/YGGYY/GTSinger
Resource description:
---
license: cc-by-nc-sa-4.0
task_categories:
- text-to-audio
- text-to-speech
language:
- zh
- en
- fr
- ja
- ko
- es
- de
- ru
- it
tags:
- singing
- audio
- croissant
pretty_name: a
size_categories:
- 1B<n<10B
configs:
- config_name: meta
  data_files: processed/All/metadata.json
---

# GTSinger: A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks

#### Yu Zhang*, Changhao Pan*, Wenxiang Guo*, Ruiqi Li, Zhiyuan Zhu, Jialei Wang, Wenhao Xu, Jingyu Lu, Zhiqing Hong, Chuxin Wang, LiChao Zhang, Jinzheng He, Ziyue Jiang, Yuxin Chen, Chen Yang, Jiecheng Zhou, Xinyu Cheng, Zhou Zhao | Zhejiang University

Dataset of [GTSinger (NeurIPS 2024 Spotlight)](https://arxiv.org/abs/2409.13832): A Global Multi-Technique Singing Corpus with Realistic Music Scores for All Singing Tasks.

[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2409.13832)
[![GitHub](https://img.shields.io/badge/GitHub-Repo-black.svg)](https://github.com/AaronZ345/GTSinger)
[![weixin](https://img.shields.io/badge/-WeChat@机器之心-000000?logo=wechat&logoColor=07C160)](https://mp.weixin.qq.com/s/B1Iqr-24l57f0MslzYEslA)
[![weixin](https://img.shields.io/badge/-WeChat@PaperWeekly-000000?logo=wechat&logoColor=07C160)](https://mp.weixin.qq.com/s/6RLdUzJM5PItklKUTTNz2w)
[![zhihu](https://img.shields.io/badge/-知乎-000000?logo=zhihu&logoColor=0084FF)](https://zhuanlan.zhihu.com/p/993933492)
[![Google Drive](https://img.shields.io/badge/Google%20Drive-Link-blue?logo=googledrive&logoColor=white)](https://drive.google.com/drive/folders/1xcdvCxNAEEfJElt7sEP-xT8dMKxn1_Lz?usp=drive_link)

We introduce GTSinger, a large Global, multi-Technique, free-to-use, high-quality singing corpus with realistic music scores, designed for all singing tasks, along with its benchmarks.

We provide the **full corpus for free** in this repository.

And `metadata.json` and `phone_set.json` are also offered for each language in `processed`.
**Note: you should change the `wav_fn` of each segment to your own absolute path! You can use the metadata of multiple languages by concatenating their data! We will provide the metadata for other languages soon!**

Besides, we also provide our dataset on [Google Drive](https://drive.google.com/drive/folders/1xcdvCxNAEEfJElt7sEP-xT8dMKxn1_Lz?usp=drive_link).

Moreover, you can visit our [Demo Page](https://aaronz345.github.io/GTSingerDemo) for audio samples of our dataset as well as the results of our benchmarks.

## Updates

- 2025.02: We released all processed data of GTSinger and refined 7/9 languages!
- 2024.10: We refined the paired speech data of each language!
- 2024.10: We released the processed data of Chinese, English, Spanish, German, Russian!
- 2024.09: We released the full dataset of GTSinger!
- 2024.09: GTSinger was accepted by NeurIPS 2024 (Spotlight)!

## Key Features

- **80.59 hours of singing voices** in GTSinger are recorded in professional studios by skilled singers, ensuring **high quality and clarity**, forming the largest recorded singing dataset.
- Contributed by **20 singers** across **nine widely spoken languages** (Chinese, English, Japanese, Korean, Russian, Spanish, French, German, and Italian) and all four vocal ranges, GTSinger enables zero-shot SVS and style transfer models to learn diverse timbres and styles.
- GTSinger provides **controlled comparison** and **phoneme-level annotations** of **six singing techniques** (mixed voice, falsetto, breathy, pharyngeal, vibrato, and glissando) for songs, thereby facilitating singing technique modeling, recognition, and control.
- Unlike fine-grained music scores, GTSinger features **realistic music scores** with regular note duration, assisting singing models in learning and adapting to real-world musical composition.
- The dataset includes **manual phoneme-to-audio alignments, global style labels** (singing method, emotion, range, and pace), and **16.16 hours of paired speech**, ensuring comprehensive annotations and broad task suitability.

## Dataset

### Where to download

Through this repo you can access our **full dataset** (audio along with TextGrid, json, musicxml) and **processed data** (metadata.json, phone_set.json, spker_set.json) on Hugging Face **for free**! We hope our data is helpful for your research.

Besides, we also provide our dataset on [![Google Drive](https://img.shields.io/badge/Google%20Drive-Link-blue?logo=googledrive&logoColor=white)](https://drive.google.com/drive/folders/1xcdvCxNAEEfJElt7sEP-xT8dMKxn1_Lz?usp=drive_link).

**Please note that, if you are using GTSinger, it means that you have accepted the terms of [license](./dataset_license.md).**

### Data Architecture

Our dataset is organized hierarchically. It presents nine top-level folders, each corresponding to a distinct language. Each language folder contains one folder per singer (for example `EN-Alto-1`), and each singer folder contains five sub-folders, one per singing technique, with mixed voice and falsetto sharing a single folder. These technique folders contain numerous song entries, with each song further divided into several controlled comparison groups: a control group (natural singing without the specific technique), a technique group (densely employing the specific technique), and, as the tree below shows, a paired speech group.

Our singing voices and speech are recorded at a 48kHz sampling rate with 24-bit resolution in WAV format.

Alignments and annotations are provided in TextGrid files, including word boundaries, phoneme boundaries, phoneme-level annotations for six techniques, and global style labels (singing method, emotion, pace, and range).

We also provide realistic music scores in musicxml format.

Notably, we provide an additional JSON file for each singing voice, facilitating data parsing and processing for singing models.

Here is the data structure of our dataset:

```
.
├── Chinese
│   ├── ZH-Alto-1
│   └── ZH-Tenor-1
├── English
│   ├── EN-Alto-1
│   │   ├── Breathy
│   │   ├── Glissando
│   │   │   └── my love
│   │   │       ├── Control_Group
│   │   │       ├── Glissando_Group
│   │   │       └── Paired_Speech_Group
│   │   ├── Mixed_Voice_and_Falsetto
│   │   ├── Pharyngeal
│   │   └── Vibrato
│   ├── EN-Alto-2
│   │   ├── Breathy
│   │   ├── Glissando
│   │   ├── Mixed_Voice_and_Falsetto
│   │   ├── Pharyngeal
│   │   └── Vibrato
│   └── EN-Tenor-1
│       ├── Breathy
│       ├── Glissando
│       ├── Mixed_Voice_and_Falsetto
│       ├── Pharyngeal
│       └── Vibrato
├── French
│   ├── FR-Soprano-1
│   └── FR-Tenor-1
├── German
│   ├── DE-Soprano-1
│   └── DE-Tenor-1
├── Italian
│   ├── IT-Bass-1
│   ├── IT-Bass-2
│   └── IT-Soprano-1
├── Japanese
│   ├── JA-Soprano-1
│   └── JA-Tenor-1
├── Korean
│   ├── KO-Soprano-1
│   ├── KO-Soprano-2
│   └── KO-Tenor-1
├── Russian
│   └── RU-Alto-1
└── Spanish
    ├── ES-Bass-1
    └── ES-Soprano-1
```

## Citations

If you find this dataset useful in your research, please cite our work:

```bib
@article{zhang2024gtsinger,
  title={Gtsinger: A global multi-technique singing corpus with realistic music scores for all singing tasks},
  author={Zhang, Yu and Pan, Changhao and Guo, Wenxiang and Li, Ruiqi and Zhu, Zhiyuan and Wang, Jialei and Xu, Wenhao and Lu, Jingyu and Hong, Zhiqing and Wang, Chuxin and others},
  journal={arXiv preprint arXiv:2409.13832},
  year={2024}
}
```

## Disclaimer

Any organization or individual is prohibited from using any technology mentioned in this paper to generate someone's singing without their consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.
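
## Example: preparing metadata

The note near the top of this README asks you to point each segment's `wav_fn` at your own absolute path and says that metadata of several languages can be concatenated. The sketch below is one possible way to do both, not the authors' tooling. It rests on assumptions the README does not confirm: that each `metadata.json` is a JSON list of segment objects with a string `wav_fn` field, and that the stored path contains one of the nine language folder names as a path component. The helper names `reroot_wav_fn` and `load_metadata` are hypothetical.

```python
import json
from pathlib import Path, PurePosixPath

# Top-level language folders, as listed in the Data Architecture tree.
LANGUAGES = {"Chinese", "English", "French", "German", "Italian",
             "Japanese", "Korean", "Russian", "Spanish"}


def reroot_wav_fn(wav_fn: str, data_root: str) -> str:
    """Re-anchor a stored path under data_root, starting at the language folder.

    Assumption: the stored path has a language folder name as one component.
    """
    parts = PurePosixPath(wav_fn.replace("\\", "/")).parts
    for i, part in enumerate(parts):
        if part in LANGUAGES:
            return str(Path(data_root).joinpath(*parts[i:]))
    raise ValueError(f"no language folder found in {wav_fn!r}")


def load_metadata(metadata_paths, data_root):
    """Concatenate several metadata.json files and rewrite each wav_fn."""
    segments = []
    for path in metadata_paths:
        with open(path, encoding="utf-8") as f:
            items = json.load(f)  # assumed: a JSON list of segment dicts
        for item in items:
            item["wav_fn"] = reroot_wav_fn(item["wav_fn"], data_root)
            segments.append(item)
    return segments
```

A call such as `load_metadata(["processed/English/metadata.json", "processed/Chinese/metadata.json"], "/data/GTSinger")` would return one combined list; the per-language file locations and the local root shown here are placeholders, so check them against your own copy. If the stored paths do not contain a language folder name, the function raises an error rather than guessing.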

Provided by:
YGGYY