malekghanmi/tunisian-malouf-dataset
Source: Hugging Face (updated 2026-04-19, indexed 2026-04-26)
Download link:
https://hf-mirror.com/datasets/malekghanmi/tunisian-malouf-dataset
# 🎵 Tunisian Malouf Music Dataset
> **First open-source labeled Tunisian music dataset**, designed for fine-tuning generative music models — particularly MusicGen by Meta AI.
---
## 📖 Project Context
This dataset was built as part of a final year project (PFA) focused on **adapting the MusicGen model to Tunisian music**.
The goal is to enable AI models to generate authentic Tunisian music from text descriptions, while respecting:
- **Maqamat** (Arabic musical modes: Sika, Hsin, Raml, Iraq, Dhil, Asbahan...)
- **Traditional rhythms** (Btayhi, Barwal, Draj, Khafif, Msaddar...)
- **Local instruments** (Oud, Kanun, Ney, Violin, Darbouka, Tar, Bendir...)
---
## 📊 Dataset Statistics
| Statistic | Value |
|-----------|-------|
| 💾 Total size | **20.8 GB** |
| 🎵 Number of segments | **11,008 .wav files** |
| ⏱️ Duration per segment | **~30 seconds** |
| 🎚️ Sample rate | **32,000 Hz** |
| 📁 Number of classes | **23 labels** |
| 🗂️ Metadata file | **metadatahaggingface.jsonl** |
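As a rough sanity check on the table, the totals can be derived from the segment count and sample rate. This sketch rests on an assumption the card does not state: 16-bit mono PCM. Under that assumption, 11,008 full 30-second segments come to about 21.1 GB of raw samples, which is in the same range as the reported 20.8 GB. The total audio duration is at most about 91.7 hours.

```python
from __future__ import annotations

# Back-of-envelope check of the statistics table.
# ASSUMPTION: 16-bit mono PCM; the card does not state bit depth or channel count.
SEGMENTS = 11_008
SEGMENT_SECONDS = 30      # upper bound: some final segments are shorter
SAMPLE_RATE = 32_000      # Hz, from the table
BYTES_PER_SAMPLE = 2      # assumed 16-bit
CHANNELS = 1              # assumed mono


def total_hours(segments: int = SEGMENTS, seconds: float = SEGMENT_SECONDS) -> float:
    """Upper bound on total audio duration, in hours."""
    return segments * seconds / 3600


def pcm_bytes(segments: int = SEGMENTS, seconds: float = SEGMENT_SECONDS) -> int:
    """Raw PCM payload size in bytes (WAV headers ignored)."""
    per_segment = int(SAMPLE_RATE * seconds) * BYTES_PER_SAMPLE * CHANNELS
    return segments * per_segment


if __name__ == "__main__":
    print(f"~{total_hours():.1f} h of audio")        # ~91.7 h
    print(f"~{pcm_bytes() / 1e9:.1f} GB of PCM")     # ~21.1 GB
```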
---
## 🗂️ Dataset Structure
```
tunisian-malouf-dataset/
├── metadatahaggingface.jsonl ← complete metadata
├── Nuba_Asbahan-RTT/ ← Nuba Asbahan (RTT)
├── Nuba_Dhil-RTT/ ← Nuba Dhil (RTT)
├── Nuba_Iraq-RTT/ ← Nuba Iraq (RTT)
├── Nuba_Raml-RTT/ ← Nuba Raml (RTT)
├── Nuba_Sikah_1993-CMAM/ ← Nuba Sikah CMAM archives 1993
├── Nouba_Hsin-wav-CMAM/ ← Nouba Hsin CMAM archives
├── Awamriyya-CMAM/ ← Awamriyya CMAM
├── Tahar_Gharsa_CMAM/ ← Tahar Gharsa (CMAM)
├── Taher_Gharsa/ ← Taher Gharsa
├── Zied_Gharsa_malouf/ ← Zied Gharsa Malouf
├── zied_gharsa/ ← Zied Gharsa general
├── malouf/ ← Malouf (various artists)
├── malouf_tunisien/ ← Tunisian Malouf general
├── malouf_Tunisien/ ← Tunisian Malouf (variant)
├── malouf_vocal_instrumental/ ← Vocal + instrumental Malouf
├── malouf_instrumental/ ← Pure instrumental Malouf
├── malouf_nouba_classical/ ← Classical Nouba
├── malouf_andalusian/ ← Andalusian Malouf
├── malouf_traditional/ ← Traditional Malouf
├── Oud_solo/ ← Tunisian Oud solo
├── kanun_solo/ ← Qanun solo
├── Ney_solo/ ← Ney solo
└── violan_solo/ ← Violin solo
```
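Each label has its own folder, so per-label segment counts can be read directly from a local copy of the dataset. The following is a minimal sketch. The local path in the usage line is hypothetical. The pattern matching assumes the lowercase `.wav` extension used in the metadata example, and on case-sensitive file systems it would miss files ending in `.WAV`.

```python
from __future__ import annotations

from collections import Counter
from pathlib import Path


def count_wavs_per_folder(root: str | Path) -> Counter:
    """Count .wav files in each top-level label folder under `root`."""
    root = Path(root)
    counts: Counter = Counter()
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        # rglob also descends into nested subfolders, if any exist
        counts[folder.name] = sum(1 for _ in folder.rglob("*.wav"))
    return counts
```

Usage (hypothetical local path): `for label, n in count_wavs_per_folder("tunisian-malouf-dataset").most_common(): print(label, n)`. Top-level files such as the JSONL metadata are skipped because only directories are visited.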
---
## 🏷️ Labels & Descriptions
| Label | Description | Example Artists |
|-------|-------------|-----------------|
| `Nuba_Asbahan-RTT` | Nuba Asbahan recorded by RTT | RTT Ensemble |
| `Nuba_Dhil-RTT` | Nuba Dhil recorded by RTT | RTT Ensemble |
| `Nuba_Iraq-RTT` | Nuba Iraq recorded by RTT | RTT Ensemble |
| `Nuba_Raml-RTT` | Nuba Raml recorded by RTT | RTT Ensemble |
| `Nuba_Sikah_1993-CMAM` | Nuba Sikah — CMAM archives 1993 | CMAM |
| `Nouba_Hsin-wav-CMAM` | Nouba Hsin — CMAM archives | CMAM |
| `Awamriyya-CMAM` | Awamriyya — CMAM archives | CMAM |
| `Tahar_Gharsa_CMAM` | Tahar Gharsa — CMAM recordings | Tahar Gharsa |
| `Taher_Gharsa` | Taher Gharsa — various recordings | Taher Gharsa |
| `Zied_Gharsa_malouf` | Zied Gharsa — Malouf | Zied Gharsa |
| `zied_gharsa` | Zied Gharsa — general | Zied Gharsa |
| `malouf_vocal_instrumental` | Malouf with voice and instruments | Sonia Mbarek, Dorsaf Hamdani, Lotfi Bouchnak |
| `malouf_nouba_classical` | Classical Tunisian Nouba | Classical ensemble |
| `malouf_instrumental` | Pure instrumental Malouf | Various |
| `malouf_tunisien` | General Tunisian Malouf | Various |
| `malouf_Tunisien` | Tunisian Malouf (variant) | Not specified |
| `malouf_andalusian` | Andalusian Malouf (IMA, Monastir, Paris) | Andalusian ensemble |
| `malouf_traditional` | Traditional Malouf | Various |
| `malouf` | Malouf — various artists | Hassan Araibi, Hamdi Benani... |
| `Oud_solo` | Tunisian Oud solo | Zied Gharsa, Anouar Brahem |
| `kanun_solo` | Tunisian Qanun solo | Various |
| `Ney_solo` | Tunisian Ney solo | Various |
| `violan_solo` | Tunisian Violin solo | Anis Klibi, Mehdi Zekri... |
---
## 📋 Metadata Format (JSONL)
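Each line of `metadatahaggingface.jsonl` is a JSON object describing one audio segment. The example below is pretty-printed; in the file, each record occupies a single line: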
```json
{
"audio": "Nuba_Sikah_1993-CMAM/Nuba_Sikah_1993-CMAM_9_Piste_audio_002.wav",
"text": "Rachidia Khatm final de la Nuba Sikah. Pièce vocale collective rapide et énergique marquant la clôture de la performance. Rythme Barwal vif avec percussions accentuées et chœur à l'unisson.",
"label": "Nuba_Sikah_1993-CMAM",
"mode": "Sika",
"genre": "Malouf Tunisien",
"rythme_iqaa": ["barwal"],
"instruments": ["oud", "violon", "kanun", "ney", "darbouka", "tar", "bendir"],
"date": "1993",
"pays": "tunisie",
"sample_rate": 32000,
"duration": 30
}
```
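A minimal way to read the metadata is line by line with the standard library, as sketched below. The file should be opened as UTF-8 because the French `text` descriptions contain accented characters. Skipping blank lines and reporting the line number of a malformed record are illustrative design choices, not behavior the card specifies.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Iterator


def iter_metadata(path: str | Path) -> Iterator[dict]:
    """Yield one metadata record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as e:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from e
```

Usage (hypothetical local path): `records = list(iter_metadata("tunisian-malouf-dataset/metadatahaggingface.jsonl"))`. Each record is then a plain `dict` with the fields described below.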
---
## 🔍 Field-by-Field Explanation
### 🎵 `audio`
- **Type:** `string`
- **Content:** Relative path to the `.wav` audio file within the dataset.
- **Example:** `"Nuba_Sikah_1993-CMAM/Nuba_Sikah_1993-CMAM_9_Piste_audio_002.wav"`
- **Details:** Each file is a **~30-second segment** extracted from a longer original recording, and files are organized into one folder per label. During fine-tuning, MusicGen does not take the waveform directly: the EnCodec tokenizer first converts the raw audio into discrete tokens, which serve as the model's input.
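The metadata stores `sample_rate` and `duration` separately from the audio, so it can be worth confirming that they match the files. The sketch below uses Python's standard-library `wave` module, which reads plain PCM WAV files. If the files use another encoding, a library such as soundfile or torchaudio would be needed instead. The 0.05 s tolerance and the function names are illustrative choices.

```python
from __future__ import annotations

import wave
from pathlib import Path

EXPECTED_RATE = 32_000  # Hz, per the card's `sample_rate` field


def wav_info(path: str | Path) -> tuple[int, int, float]:
    """Return (sample_rate, channels, duration_seconds) read from a PCM WAV header."""
    with wave.open(str(path), "rb") as w:
        rate = w.getframerate()
        return rate, w.getnchannels(), w.getnframes() / rate


def check_segment(root: str | Path, record: dict, tol: float = 0.05) -> list[str]:
    """Compare one metadata record with its audio file and list any mismatches."""
    rate, _, duration = wav_info(Path(root) / record["audio"])
    expected_rate = record.get("sample_rate", EXPECTED_RATE)
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate {rate} != {expected_rate}")
    if abs(duration - float(record["duration"])) > tol:
        problems.append(f"duration {duration:.2f}s != {record['duration']}s")
    return problems
```

For each record, `check_segment("tunisian-malouf-dataset", record)` (hypothetical path) returns an empty list when the file agrees with its metadata.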
---
### 📝 `text`
- **Type:** `string`
- **Content:** A **rich French textual description** of the audio segment, serving as the text prompt for MusicGen fine-tuning.
- **Example:** `"Rachidia Khatm final de la Nuba Sikah. Pièce vocale collective rapide et énergique..."`
- **Details:** Each description covers the musical context (piece name, performance type), the mood and energy (slow/fast, calm/energetic), the rhythmic pattern, the instrumentation, and the vocal style. This is the **core training signal** — MusicGen learns to associate this text with the corresponding audio.
---
### 🏷️ `label`
- **Type:** `string`
- **Content:** The **class/category** of the audio segment — corresponds to the folder name.
- **All 23 possible values:** `Nuba_Asbahan-RTT`, `Nuba_Dhil-RTT`, `Nuba_Iraq-RTT`, `Nuba_Raml-RTT`, `Nuba_Sikah_1993-CMAM`, `Nouba_Hsin-wav-CMAM`, `Awamriyya-CMAM`, `Tahar_Gharsa_CMAM`, `Taher_Gharsa`, `Zied_Gharsa_malouf`, `zied_gharsa`, `malouf`, `malouf_tunisien`, `malouf_Tunisien`, `malouf_vocal_instrumental`, `malouf_instrumental`, `malouf_nouba_classical`, `malouf_andalusian`, `malouf_traditional`, `Oud_solo`, `kanun_solo`, `Ney_solo`, `violan_solo`
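For classification or label-conditioned training, the label strings can be mapped to integer ids. The sketch below sorts the labels so that ids stay stable across runs. This ordering is an arbitrary choice, not something the dataset defines.

The sketch also flags labels that differ only in letter case, such as `malouf_tunisien` and `malouf_Tunisien`. Those two folders would collide on case-insensitive file systems, which are the default on Windows and macOS.

```python
from __future__ import annotations

from collections import defaultdict

# The 23 label strings exactly as listed above (case-sensitive).
LABELS = [
    "Nuba_Asbahan-RTT", "Nuba_Dhil-RTT", "Nuba_Iraq-RTT", "Nuba_Raml-RTT",
    "Nuba_Sikah_1993-CMAM", "Nouba_Hsin-wav-CMAM", "Awamriyya-CMAM",
    "Tahar_Gharsa_CMAM", "Taher_Gharsa", "Zied_Gharsa_malouf", "zied_gharsa",
    "malouf", "malouf_tunisien", "malouf_Tunisien", "malouf_vocal_instrumental",
    "malouf_instrumental", "malouf_nouba_classical", "malouf_andalusian",
    "malouf_traditional", "Oud_solo", "kanun_solo", "Ney_solo", "violan_solo",
]


def build_label_maps(labels: list[str]) -> tuple[dict[str, int], dict[int, str]]:
    """Assign each label a stable integer id (sorted order) and return both directions."""
    if len(set(labels)) != len(labels):
        raise ValueError("duplicate label strings")
    label2id = {name: i for i, name in enumerate(sorted(labels))}
    id2label = {i: name for name, i in label2id.items()}
    return label2id, id2label


def case_collisions(labels: list[str]) -> list[list[str]]:
    """Groups of labels that differ only in letter case."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in labels:
        groups[name.lower()].append(name)
    return [g for g in groups.values() if len(g) > 1]


label2id, id2label = build_label_maps(LABELS)
```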
---
### 🎼 `mode`
- **Type:** `string`
- **Content:** The **musical mode (maqam)**, the tonal system of Arab music. A maqam is roughly analogous to a Western mode or key, but it also carries specific melodic rules and a characteristic emotional color.
| Value | Arabic | Character |
|-------|--------|-----------|
| `Sika` | سيكا | Soft, expressive, intimate |
| `Hsin` | حسين | Nostalgic, melancholic |
| `Raml` | رمل | Light, fluid, flowing |
| `Raml Maya` | رمل الماية | Gentle, lyrical |
| `Iraq` | عراق | Solemn, ceremonial |
| `Dhil` | ذيل | Deep, contemplative |
| `Asbahan` | أصبهان | Joyful, bright |
| `Asbaayn` | أصبعين | Variant of Asbahan |
| `Muhayyar` | محير | Suspended, wondering |
| `Maya` | ماية | Ancient, sacred |
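Before training, it can help to check how evenly the modes are represented, since an underrepresented maqam gives the model fewer examples of its character. The following minimal sketch works on records already parsed from `metadatahaggingface.jsonl` as dictionaries. It counts each element of list-valued fields, so the same helper also applies to `rythme_iqaa` and `instruments`. The `"<missing>"` placeholder key is an illustrative choice.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def value_counts(records: Iterable[dict], field: str) -> Counter:
    """Count values of `field` across records; list-valued fields count each element."""
    counts: Counter = Counter()
    for r in records:
        value = r.get(field)
        if value is None:
            counts["<missing>"] += 1
        elif isinstance(value, list):
            counts.update(value)
        else:
            counts[value] += 1
    return counts
```

For example, `for mode, n in value_counts(records, "mode").most_common(): print(f"{mode:<10} {n}")` prints the modes from most to least frequent.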
---
### 🎭 `genre`
- **Type:** `string`
- **Content:** The **broad musical genre** of the segment.
| Value | Description |
|-------|-------------|
| `Malouf Tunisien` | Classical Tunisian Malouf — main genre of this dataset |
| `Malouf Andalou` | Andalusian-origin Malouf brought to Tunisia after 1492 |
| `Oud Solo Tunisien` | Solo Oud in Tunisian tradition |
| `Kanun Solo Tunisien` | Solo Qanun performance |
| `Ney Solo Tunisien` | Solo Ney (flute) performance |
| `Violon Solo Tunisien` | Solo Violin in Tunisian style |
---
### 🥁 `rythme_iqaa`
- **Type:** `list of strings`
- **Content:** List of **rhythmic patterns (Iqa'at)** present in the segment. Iqa'at are cyclic rhythmic structures defining the metric organization of Arab music — similar to time signatures but with specific drum stroke patterns.
| Value | Description | Feel |
|-------|-------------|------|
| `btayhi` | Slow, majestic — 8 beats | Slow & solemn |
| `msaddar` | Opening movement rhythm | Moderate |
| `barwal` | Fast closing rhythm | Fast & festive |
| `draj` | Medium tempo, flowing | Moderate & fluid |
| `khafif` | Light, quick rhythm | Light & rapid |
| `tushiya` | Instrumental prelude rhythm | Introductory |
| `istiftah` | Opening/introductory rhythm | Ceremonial |
| `abyat` | Poetic verse rhythm | Declamatory |
| `khatm` | Closing/finale rhythm | Final & conclusive |
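Because `rythme_iqaa` is a list, a single segment can carry several rhythms. One way to feed this field to a classifier or an auxiliary loss is a multi-hot vector over the nine values in the table, as in the illustrative sketch below. The vocabulary order is arbitrary. Raising an error on an unknown value is a design choice; you might prefer to skip such values instead.

```python
from __future__ import annotations

# The nine rhythm values from the table above, in table order.
RHYTHMS = ["btayhi", "msaddar", "barwal", "draj", "khafif",
           "tushiya", "istiftah", "abyat", "khatm"]


def multi_hot(values: list[str], vocab: list[str]) -> list[int]:
    """Encode a list-valued field as a 0/1 vector over a fixed vocabulary."""
    index = {v: i for i, v in enumerate(vocab)}
    vec = [0] * len(vocab)
    for v in values:
        key = v.strip().lower()  # normalize spacing and case before lookup
        if key not in index:
            raise KeyError(f"unknown value: {v!r}")
        vec[index[key]] = 1
    return vec
```

For the example record, `multi_hot(["barwal"], RHYTHMS)` sets only the third position.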
---
### 🎻 `instruments`
- **Type:** `list of strings`
- **Content:** List of **instruments identified** in the audio segment.
| Value | Instrument | Role |
|-------|------------|------|
| `oud` | Oud (عود) | Plucked lute — melodic backbone |
| `violon` | Violin | Bowed string — melodic lead |
| `kanun` | Qanun (قانون) | Plucked zither — harmonic/melodic |
| `ney` | Ney (ناي) | End-blown flute — melodic |
| `darbouka` | Darbouka (دربوكة) | Goblet drum — rhythmic |
| `tar` | Tar (طار) | Frame drum — rhythmic |
| `bendir` | Bendir (بندير) | Frame drum with snare — rhythmic |
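The instrument tags can also be used to select subsets, for instance solo melodic passages without percussion. The following minimal sketch relies only on the metadata tags, so its results are only as accurate as those tags. The `PERCUSSION` grouping is taken from the "rhythmic" roles in the table above.

```python
from __future__ import annotations

from typing import Iterable, Iterator

PERCUSSION = {"darbouka", "tar", "bendir"}  # the rhythmic instruments in the table


def with_instruments(records: Iterable[dict], required: Iterable[str],
                     exclude: Iterable[str] = ()) -> Iterator[dict]:
    """Yield records whose `instruments` include all of `required` and none of `exclude`."""
    need = {i.lower() for i in required}
    avoid = {i.lower() for i in exclude}
    for r in records:
        present = {i.lower() for i in r.get("instruments", [])}
        if need <= present and not (avoid & present):
            yield r
```

For example, `with_instruments(records, ["ney"], exclude=PERCUSSION)` keeps segments tagged with ney and none of the three drums.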
---
### 📅 `date`
- **Type:** `string`
- **Content:** The **year of the original recording** (not the segmentation date).
- **Range in dataset:** `"1984"` to `"2025"` — covering 40+ years of Tunisian musical heritage.
- **Examples:** `"1984"`, `"1993"`, `"2004"`, `"2017"`, `"2019"`, `"2021"`, `"2022"`, `"2024"`, `"2025"`
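Since `date` is stored as a string, it needs parsing before any chronological split or filtering. The sketch below groups segments by decade. The card does not say whether missing or non-numeric values occur, so the `"unknown"` bucket is a defensive assumption.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def year_of(record: dict) -> int | None:
    """Parse the string `date` field into an int year, or None if it is not a 4-digit year."""
    date = str(record.get("date", "")).strip()
    return int(date) if len(date) == 4 and date.isdigit() else None


def decade_counts(records: Iterable[dict]) -> Counter:
    """Count segments per decade of the original recording (1980, 1990, ...)."""
    counts: Counter = Counter()
    for r in records:
        year = year_of(r)
        counts[year // 10 * 10 if year is not None else "unknown"] += 1
    return counts
```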
---
### 🌍 `pays`
- **Type:** `string`
- **Content:** The **country of origin** of the recording.
- **Main value:** `"tunisie"`. Some recordings of performances at international events (e.g., in Paris or at festivals in Monastir) are tagged with the corresponding location.
---
### 🎚️ `sample_rate`
- **Type:** `integer`
- **Content:** The **audio sampling rate** in Hz.
- **Value:** Always `32000` (32 kHz) — the standard rate required by MusicGen's EnCodec tokenizer. All original recordings were resampled to this rate during preprocessing using `torchaudio` / `scipy`.
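The card says recordings were resampled with `torchaudio` / `scipy` but does not show how. The sketch below covers one option, SciPy's polyphase resampler `scipy.signal.resample_poly`. That function takes integer upsampling and downsampling factors, and the helper computes the smallest such pair. The 44.1 kHz and 48 kHz source rates in the example are common rates chosen for illustration; the card does not say which rates the original recordings used.

```python
from __future__ import annotations

from math import gcd

TARGET_RATE = 32_000  # Hz, the rate every segment is stored at


def polyphase_factors(orig_rate: int, target_rate: int = TARGET_RATE) -> tuple[int, int]:
    """Smallest integers (up, down) such that orig_rate * up / down == target_rate."""
    g = gcd(orig_rate, target_rate)
    return target_rate // g, orig_rate // g


if __name__ == "__main__":
    for rate in (44_100, 48_000, 32_000):
        print(rate, polyphase_factors(rate))  # (320, 441), (2, 3), (1, 1)
```

With SciPy, the pair would be passed as `scipy.signal.resample_poly(x, up, down)`. `torchaudio.functional.resample(waveform, orig_freq, new_freq)` takes the two rates directly instead. The card does not state which of these the author used, or with what filter settings.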
---
### ⏱️ `duration`
- **Type:** `float` (whole-number durations may appear as integers, e.g. `30` in the example above)
- **Content:** The **duration of the audio segment** in seconds.
- **Values:** Mostly `30.0`. Some final segments may be shorter (e.g., `0.39`, `1.30`, `15.0`).
- **Details:** Original recordings were split into fixed 30-second segments using a sliding window. This corresponds to `T_MAX = 1500 frames` at 50 frames/second in MusicGen's internal representation. Shorter final segments were kept to preserve all musical content.
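The arithmetic behind `T_MAX` can be checked directly: 30 s × 50 frames/s = 1500 frames, and at 32 kHz each frame spans 32,000 / 50 = 640 samples. The sketch below splits a recording into 30-second windows, keeps the shorter final window, and counts frames per window. It assumes the window advances by its full length with no overlap, which the card does not state explicitly. This is an illustration, not the author's preprocessing script.

```python
from __future__ import annotations

SAMPLE_RATE = 32_000               # Hz, from `sample_rate`
SEGMENT_SECONDS = 30               # window length stated on the card
FRAME_RATE = 50                    # frames per second, as stated on the card
HOP = SAMPLE_RATE // FRAME_RATE    # 640 audio samples per frame


def segment_bounds(n_samples: int, seconds: int = SEGMENT_SECONDS) -> list[tuple[int, int]]:
    """(start, end) sample indices of back-to-back windows; the last one may be shorter."""
    win = seconds * SAMPLE_RATE
    # Assumption: the window advances by its full length (no overlap).
    return [(start, min(start + win, n_samples)) for start in range(0, n_samples, win)]


def n_frames(n_samples: int) -> int:
    """Frames needed to cover n_samples at FRAME_RATE (ceiling division)."""
    return -(-n_samples // HOP)


if __name__ == "__main__":
    bounds = segment_bounds(65 * SAMPLE_RATE)          # a hypothetical 65-second recording
    print([(e - s) / SAMPLE_RATE for s, e in bounds])  # [30.0, 30.0, 5.0]
    print(n_frames(SEGMENT_SECONDS * SAMPLE_RATE))     # 1500, i.e. T_MAX
```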
---
## ✍️ Citation
```bibtex
@dataset{ghanmi2025tunisian,
author = {Malek Ghanmi},
title = {Tunisian Malouf Music Dataset},
year = {2025},
publisher = {Hugging Face},
url = {https://huggingface.co/datasets/malekghanmi/tunisian-malouf-dataset}
}
```
Provided by:
malekghanmi



