
pianistprogrammer/abc2vec-irish-folk-dataset

Source: Hugging Face. Updated 2026-04-16; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/pianistprogrammer/abc2vec-irish-folk-dataset
Resource description:
---
license: cc-by-4.0
task_categories:
- other
language:
- en
tags:
- music
- folk-music
- irish-traditional-music
- abc-notation
- symbolic-music
size_categories:
- 100K<n<1M
---

# ABC2Vec Irish Folk Music Dataset

This dataset contains 211,524 Irish traditional tunes in ABC notation, preprocessed and split for training representation learning models.

## Dataset Description

- **Curated by:** IrishMAN Dataset (The Session + ABCnotation.com)
- **Processed for:** ABC2Vec: Self-Supervised Representation Learning for Irish Folk Music
- **Language:** ABC notation (symbolic music format)
- **License:** CC-BY-4.0

## Dataset Structure

### Data Splits

| Split | Tunes | File Size |
|-------|-------|-----------|
| Train | 198,893 | 70 MB |
| Validation | 10,469 | 3.7 MB |
| Test | 2,162 | 778 KB |
| **Total** | **211,524** | **~74 MB** |

### Data Fields

Each tune contains:

- `tune_id`: Unique identifier
- `title`: Tune name
- `abc_body`: ABC notation of the melody
- `tune_type`: Rhythmic category (jig, reel, polka, waltz, etc.)
- `mode`: Tonal mode (major, minor, dorian, mixolydian)
- `key`: Key signature
- `meter`: Time signature
- `bar_count`: Number of bars in the tune

### Dataset Statistics

- **Tune Types:** 44.9% reels, 21.3% jigs, 14.5% polkas, 12.2% waltzes
- **Modes:** 80.2% major, 11.3% minor, 5.4% Dorian, 3.0% Mixolydian
- **Keys:** 30.5% G, 26.8% D, 13.9% A (sharp keys dominant)
- **Median Length:** 18 bars, 287 characters

## Usage

```python
from datasets import load_dataset

# Load the entire dataset
dataset = load_dataset("pianistprogrammer/abc2vec-irish-folk-dataset")

# Access splits
train = dataset["train"]
val = dataset["validation"]
test = dataset["test"]

# Example tune
print(train[0]["abc_body"])
print(f"Type: {train[0]['tune_type']}, Mode: {train[0]['mode']}")
```

## Citation

If you use this dataset, please cite:

```bibtex
@article{abc2vec2025,
  title={ABC2Vec: Self-Supervised Representation Learning for Irish Folk Music},
  author={[Your Name]},
  journal={[Journal Name]},
  year={2025}
}
```

## Source

This dataset is derived from:

- **The Session** (thesession.org): Community-maintained Irish traditional music archive
- **ABCnotation.com**: Long-standing ABC notation repository

Processed as part of the IrishMAN (Irish Music ABC Notation) corpus.

## License

Creative Commons Attribution 4.0 International (CC-BY-4.0)

The original tunes are traditional folk music in the public domain. This processed dataset is released under CC-BY-4.0.

## Additional Files

- `vocab.json`: Character vocabulary for tokenization (98 tokens)
- `metadata.csv`: Complete metadata for all 211,524 tunes

## Contact

For questions or issues with this dataset, please open an issue on the [ABC2Vec GitHub repository](https://github.com/pianistprogrammer/ABC2VEC).
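## Example: Checking Splits and Field Distributions

The split table and the statistics above can be sanity-checked with a few lines of Python. The following is a minimal sketch, not part of the original dataset card: the helper names `split_fractions` and `field_distribution` are hypothetical, and the `sample` rows are made up for illustration rather than taken from the dataset. Only the split sizes and the field names `tune_type` and `mode` come from the card itself.

```python
from collections import Counter

# Split sizes as listed in the Data Splits table.
SPLIT_SIZES = {"train": 198_893, "validation": 10_469, "test": 2_162}


def split_fractions(sizes):
    """Return each split's share of the total number of tunes."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def field_distribution(records, field):
    """Return {value: percentage} for one metadata field, most common first."""
    counts = Counter(record[field] for record in records)
    total = sum(counts.values())
    return {value: round(100 * n / total, 1) for value, n in counts.most_common()}


# Hypothetical rows for illustration only; they are not taken from the dataset.
sample = [
    {"tune_type": "reel", "mode": "major"},
    {"tune_type": "reel", "mode": "dorian"},
    {"tune_type": "jig", "mode": "major"},
    {"tune_type": "polka", "mode": "major"},
]

# The three splits should add up to the stated total of 211,524 tunes.
print(sum(SPLIT_SIZES.values()))
print(split_fractions(SPLIT_SIZES))
print(field_distribution(sample, "tune_type"))
```

With the real data, passing `dataset["train"]` in place of `sample` should work the same way, assuming each row is a dictionary keyed by the field names listed under Data Fields.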

Provider:
pianistprogrammer