pianistprogrammer/abc2vec-irish-folk-dataset
Source: Hugging Face. Updated 2026-04-16; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/pianistprogrammer/abc2vec-irish-folk-dataset
---
license: cc-by-4.0
task_categories:
- other
language:
- en
tags:
- music
- folk-music
- irish-traditional-music
- abc-notation
- symbolic-music
size_categories:
- 100K<n<1M
---
# ABC2Vec Irish Folk Music Dataset
This dataset contains 211,524 Irish traditional tunes in ABC notation, preprocessed and split for training representation learning models.
## Dataset Description
- **Curated by:** IrishMAN Dataset (The Session + ABCnotation.com)
- **Processed for:** ABC2Vec: Self-Supervised Representation Learning for Irish Folk Music
- **Language:** ABC notation (symbolic music format)
- **License:** CC-BY-4.0
## Dataset Structure
### Data Splits
| Split | Tunes | File Size |
|-------|-------|-----------|
| Train | 198,893 | 70 MB |
| Validation | 10,469 | 3.7 MB |
| Test | 2,162 | 778 KB |
| **Total** | **211,524** | **~74 MB** |
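The counts in the table correspond to roughly 94%, 5% and 1% of the corpus. A quick check of that arithmetic, using only the numbers from the table (the variable names are illustrative):

```python
# Tune counts copied from the table above.
split_sizes = {"train": 198_893, "validation": 10_469, "test": 2_162}

total = sum(split_sizes.values())
shares = {name: 100.0 * n / total for name, n in split_sizes.items()}

for name, pct in shares.items():
    print(f"{name}: {split_sizes[name]:,} tunes ({pct:.2f}%)")
print(f"total: {total:,}")
```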
### Data Fields
Each tune contains:
- `tune_id`: Unique identifier
- `title`: Tune name
- `abc_body`: ABC notation of the melody
- `tune_type`: Rhythmic category (jig, reel, polka, waltz, etc.)
- `mode`: Tonal mode (major, minor, dorian, mixolydian)
- `key`: Key signature
- `meter`: Time signature
- `bar_count`: Number of bars in the tune
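As a rough illustration of how these fields might be used together, the sketch below filters records by `tune_type` and `mode`. The two sample records, their values, and the helper `select_tunes` are hypothetical, and the field types (for example `bar_count` as an integer) are assumptions that this card does not confirm.

```python
from typing import Iterable, TypedDict


class Tune(TypedDict):
    # Field names follow the list above; the types are assumptions.
    tune_id: str
    title: str
    abc_body: str
    tune_type: str
    mode: str
    key: str
    meter: str
    bar_count: int


# Hypothetical records, made up for illustration only.
SAMPLE_TUNES: list[Tune] = [
    {"tune_id": "demo-1", "title": "Demo Reel", "abc_body": "|:G2BG dGBG:|",
     "tune_type": "reel", "mode": "major", "key": "G", "meter": "4/4", "bar_count": 1},
    {"tune_id": "demo-2", "title": "Demo Jig", "abc_body": "|:EDE GAB|AGE D3:|",
     "tune_type": "jig", "mode": "dorian", "key": "E", "meter": "6/8", "bar_count": 2},
]


def select_tunes(tunes: Iterable[Tune], tune_type: str, mode: str) -> list[Tune]:
    """Return the tunes whose tune_type and mode both match exactly."""
    return [t for t in tunes if t["tune_type"] == tune_type and t["mode"] == mode]


dorian_jigs = select_tunes(SAMPLE_TUNES, tune_type="jig", mode="dorian")
print([t["title"] for t in dorian_jigs])
```

On a loaded split, the same condition could be passed to `Dataset.filter`, for example `train.filter(lambda t: t["tune_type"] == "jig")`.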
### Dataset Statistics
- **Tune Types:** 44.9% reels, 21.3% jigs, 14.5% polkas, 12.2% waltzes
- **Modes:** 80.2% major, 11.3% minor, 5.4% Dorian, 3.0% Mixolydian
- **Keys:** 30.5% G, 26.8% D, 13.9% A (sharp keys dominate)
- **Median Length:** 18 bars, 287 characters
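Shares and medians like those above could be recomputed from the metadata columns. The following is a minimal sketch on toy values; the helper `category_shares` and the sample lists are hypothetical and are not taken from the dataset.

```python
from collections import Counter
from statistics import median


def category_shares(values):
    """Map each distinct value to its share of the total, in percent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.most_common()}


# Toy columns, made up for illustration; not taken from the dataset.
tune_types = ["reel", "reel", "jig", "polka"]
bar_counts = [16, 18, 32]

shares = category_shares(tune_types)
median_bars = median(bar_counts)
print(shares, median_bars)
```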
## Usage
```python
from datasets import load_dataset
# Load the entire dataset
dataset = load_dataset("pianistprogrammer/abc2vec-irish-folk-dataset")
# Access splits
train = dataset["train"]
val = dataset["validation"]
test = dataset["test"]
# Example tune
print(train[0]["abc_body"])
print(f"Type: {train[0]['tune_type']}, Mode: {train[0]['mode']}")
```
## Citation
If you use this dataset, please cite:
```bibtex
@article{abc2vec2025,
title={ABC2Vec: Self-Supervised Representation Learning for Irish Folk Music},
author={[Your Name]},
journal={[Journal Name]},
year={2025}
}
```
## Source
This dataset is derived from:
- **The Session** (thesession.org): Community-maintained Irish traditional music archive
- **ABCnotation.com**: Long-standing ABC notation repository

Both sources were processed as part of the IrishMAN (Irish Massive ABC Notation) corpus.
## License
Creative Commons Attribution 4.0 International (CC-BY-4.0)
The original tunes are traditional folk music in the public domain. This processed dataset is released under CC-BY-4.0.
## Additional Files
- `vocab.json`: Character vocabulary for tokenization (98 tokens)
- `metadata.csv`: Complete metadata for all 211,524 tunes
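The layout of `vocab.json` is not documented here. Assuming it is a JSON object that maps each token to an integer id, character-level encoding might look like the sketch below. The toy vocabulary, the `<unk>` token, and the helpers `encode` and `decode` are all assumptions made for illustration.

```python
import json

# Assumption: vocab.json is a JSON object mapping each token (a single
# character or a special token) to an integer id. This toy vocabulary is
# made up and is much smaller than the 98-token file.
toy_vocab_json = '{"<unk>": 0, "|": 1, ":": 2, "G": 3, "A": 4, "B": 5, "2": 6}'
vocab = json.loads(toy_vocab_json)
id_to_token = {idx: tok for tok, idx in vocab.items()}


def encode(abc_body, vocab, unk_token="<unk>"):
    """Convert an ABC string to a list of ids, one per character."""
    unk_id = vocab[unk_token]
    return [vocab.get(ch, unk_id) for ch in abc_body]


def decode(ids, id_to_token):
    """Convert ids back to a string; unknown ids become the <unk> token."""
    return "".join(id_to_token.get(i, "<unk>") for i in ids)


ids = encode("|:G2AB:|", vocab)
print(ids)
print(decode(ids, id_to_token))
```

To try this against the real file, the toy string would be replaced by the parsed contents of `vocab.json`, after checking that its layout matches the assumption above.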
## Contact
For questions or issues with this dataset, please open an issue on the [ABC2Vec GitHub repository](https://github.com/pianistprogrammer/ABC2VEC).
Provider: pianistprogrammer



