Audio-FLAN-Dataset
ModelScope community. Updated 2026-04-29; indexed 2026-05-03.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/Audio-FLAN-Dataset
Resource description:
# Audio-FLAN Dataset ([Paper](https://arxiv.org/abs/2502.16584))
(The full audio files and JSONL files are still being updated.)
An Instruction-Tuning Dataset for Unified Audio Understanding and Generation Across Speech, Music, and Sound.
## Table of Contents
1. [Dataset Structure](#1-dataset-structure)
2. [Directories and Files](#2-directories-and-files)
3. [Metadata Example (in JSONL)](#3-metadata-example-in-jsonl)
4. [Accessing Audio Files](#4-accessing-audio-files)
5. [An Example to Accessing Metadata and Audio Together](#5-an-example-to-accessing-metadata-and-audio-together)
6. [Licensing and Redistribution](#6-licensing-and-redistribution)
## 1. Dataset Structure
The **Audio-FLAN-Dataset** has the following directory structure:
```
Audio-FLAN-Dataset/
├── audio_files/
│   ├── audio/
│   │   ├── 177_TAU_Urban_Acoustic_Scenes_2022/
│   │   ├── 179_Audioset_for_Audio_Inpainting/
│   │   └── ...
│   ├── music/
│   │   ├── 08_m4singer/
│   │   ├── 12_Opensinger/
│   │   └── ...
│   └── speech/
│       └── ...
├── metadata/
│   ├── audio/
│   │   ├── generation/
│   │   │   ├── test/
│   │   │   └── train_dev/
│   │   └── understanding/
│   │       ├── test/
│   │       │   ├── xxx/jsonl/xxx.jsonl
│   │       │   └── xxx/jsonl/xxx.jsonl
│   │       └── train_dev/
│   │           ├── 177_TAU_Urban_Acoustic_Scenes_2022/jsonl/xxx.jsonl
│   │           └── ...
│   ├── music/
│   │   ├── generation/
│   │   │   ├── test/
│   │   │   └── train_dev/
│   │   └── understanding/
│   │       ├── test/
│   │       └── train_dev/
│   └── speech/
│       ├── generation/
│       │   ├── test/
│       │   └── train_dev/
│       └── understanding/
│           ├── test/
│           └── train_dev/
└── scp_files/
    ├── audio/
    │   ├── 177_TAU_Urban_Acoustic_Scenes_2022.scp
    │   ├── 179_Audioset_for_Audio_Inpainting.scp
    │   └── ...
    ├── music/
    │   ├── 08_m4singer.scp
    │   ├── 12_Opensinger.scp
    │   └── ...
    └── speech/
        └── ...
```
## 2. Directories and Files:
- **audio_files**: Contains the actual audio files organized in subfolders (`audio/`, `music/`, `speech/`) for each dataset.
- **metadata**: Contains task-specific JSON Lines files (`.jsonl`) for each dataset, organized in subfolders (`audio/`, `music/`, `speech/`).
- **scp_files**: Contains `.scp` files that map `Audio_ID` (in JSON Lines) to `audio_path`, making it easy to locate corresponding audio files.
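As a minimal sketch of how this layout can be traversed, the snippet below lists the metadata files and reads the domain, task direction, and split folder from each path. It assumes the layout `metadata/<domain>/<generation|understanding>/<test|train_dev>/<dataset>/jsonl/<name>.jsonl` shown in Section 1. The names `MetadataFile` and `iter_metadata_files` are hypothetical and not part of the dataset's tooling.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class MetadataFile(NamedTuple):
    domain: str      # "audio", "music", or "speech"
    direction: str   # "generation" or "understanding"
    split_dir: str   # "test" or "train_dev"
    dataset: str     # e.g. "177_TAU_Urban_Acoustic_Scenes_2022"
    path: Path


def iter_metadata_files(root: Path) -> Iterator[MetadataFile]:
    """Yield every .jsonl file under <root>/metadata with its path components."""
    metadata_dir = root / "metadata"
    # Assumed layout: <domain>/<direction>/<split_dir>/<dataset>/jsonl/<name>.jsonl
    for path in sorted(metadata_dir.glob("*/*/*/*/jsonl/*.jsonl")):
        parts = path.relative_to(metadata_dir).parts
        domain, direction, split_dir, dataset = parts[:4]
        yield MetadataFile(domain, direction, split_dir, dataset, path)
```

Calling `list(iter_metadata_files(Path("Audio-FLAN-Dataset")))` on a local copy would give one entry per JSONL file. Files stored at a different depth would be skipped by the glob pattern.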
## 3. Metadata Example (in JSONL):
Each line in a metadata file is one JSON record with the following fields:
```json
{
  "instruction": "Could you identify the possible location where this audio was recorded?",
  "input": "audio data: <|SOA|>177_TAU_Urban_Acoustic_Scenes_2022_airport-lisbon-1000-40000-1-a<|EOA|>",
  "output": "recording location: airport",
  "uuid": "177_TAU_Urban_Acoustic_Scenes_2022_90bc313021c880d2",
  "split": ["train"],
  "task_type": {
    "major": ["Audio Event Recognition"],
    "minor": ["Acoustic Scene Classification"],
    "U/G": ["understanding"],
    "unseen": false
  },
  "domain": "audio",
  "source": "unknown",
  "other": null
}
```
### 3.1 Audio_ID in JSONL line
Note: the text between `<|SOA|>` and `<|EOA|>` is the `Audio_ID`. For example:
```
<|SOA|>177_TAU_Urban_Acoustic_Scenes_2022_airport-lisbon-1000-40000-1-a<|EOA|>
```
* Audio_ID: `177_TAU_Urban_Acoustic_Scenes_2022_airport-lisbon-1000-40000-1-a`
* Dataset ID: `177`
* Dataset Name: `TAU_Urban_Acoustic_Scenes_2022`
* Audio File Name: `airport-lisbon-1000-40000-1-a`
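The extraction described above can be sketched with a regular expression. This is an illustrative snippet, not the dataset's own tooling: the helper names are hypothetical, and treating the Dataset ID as the leading run of digits before the first underscore is an assumption based on the example above.

```python
import re
from typing import List

# Matches the text between the <|SOA|> and <|EOA|> markers (non-greedy).
AUDIO_ID_PATTERN = re.compile(r"<\|SOA\|>(.*?)<\|EOA\|>")
# Assumption: the Dataset ID is the leading run of digits before the first underscore.
DATASET_ID_PATTERN = re.compile(r"^(\d+)_")


def extract_audio_ids(text: str) -> List[str]:
    """Return every Audio_ID found in a field such as "input" or "output"."""
    return AUDIO_ID_PATTERN.findall(text)


def dataset_id(audio_id: str) -> str:
    """Return the numeric Dataset ID prefix, or an empty string if absent."""
    match = DATASET_ID_PATTERN.match(audio_id)
    return match.group(1) if match else ""


field = "audio data: <|SOA|>177_TAU_Urban_Acoustic_Scenes_2022_airport-lisbon-1000-40000-1-a<|EOA|>"
ids = extract_audio_ids(field)
print(ids[0])              # 177_TAU_Urban_Acoustic_Scenes_2022_airport-lisbon-1000-40000-1-a
print(dataset_id(ids[0]))  # 177
```

Separating the dataset name from the audio file name is not attempted here, because both may contain underscores. One option is to strip a known dataset directory name (such as `177_TAU_Urban_Acoustic_Scenes_2022`) as a prefix.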
### 3.2 Description of JSON line:
```json
{
  "instruction": "This field provides the instructions for the task, outlining the specific operation to be performed.",
  "input": "This field contains the input data for the task, which represents the raw information to be processed.",
  "output": "This field represents the expected result or outcome after processing the input data.",
  "uuid": "This field assigns a unique identifier to each task instance, enabling the system to track and manage individual tasks.",
  "split": ["This field specifies the dataset partition for the task, such as 'train', 'test', or 'dev', which correspond to the training, testing, and development datasets, respectively."],
  "task_type": {
    "major": ["This field indicates the primary category of the task."],
    "minor": ["This field specifies the secondary or more specific task."],
    "U/G": ["This field distinguishes whether the task focuses on generation or understanding."],
    "unseen": "A boolean value that indicates whether the task involves data that has not been encountered before."
  },
  "domain": "This field defines the domain in which the task is situated, such as 'speech', 'music', or 'audio'.",
  "source": "This field identifies the origin of the audio, such as 'audiobook', 'youtube', or 'studio', signifying where the audio signal is sourced from.",
  "other": "This field can store any additional metadata relevant to the task, if applicable."
}
```
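To illustrate how these fields might be used, here is a minimal sketch that reads a JSONL file and filters records by split and minor task. The function names `read_jsonl` and `is_match` are hypothetical. The sketch assumes `split` and `task_type.minor` are lists of strings, as in the example record in Section 3.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def read_jsonl(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def is_match(record: Dict[str, Any], split: str, minor_task: str) -> bool:
    """True if the record is in `split` and lists `minor_task` as a minor task."""
    task_type = record.get("task_type") or {}
    in_split = split in (record.get("split") or [])
    has_task = minor_task in (task_type.get("minor") or [])
    return in_split and has_task
```

For instance, `[r for r in read_jsonl(path) if is_match(r, "train", "Acoustic Scene Classification")]` would keep only training records for acoustic scene classification.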
## 4. Accessing Audio Files
Each audio file's path is stored in the `.scp` files located in the `scp_files` directory. You can use the `Audio_ID` to locate the corresponding audio file in the `audio_files` directory.
For example, the file `scp_files/audio/177_TAU_Urban_Acoustic_Scenes_2022.scp` contains:
```
177_TAU_Urban_Acoustic_Scenes_2022_airport-lisbon-1000-40000-0-a audio_files/audio/177_TAU_Urban_Acoustic_Scenes_2022/TAU-urban-acoustic-scenes-2022-mobile-development/audio/airport-lisbon-1000-40000-0-a.wav
```
- **Audio_ID**: `177_TAU_Urban_Acoustic_Scenes_2022_airport-lisbon-1000-40000-0-a`
- **audio_path**: `audio_files/audio/177_TAU_Urban_Acoustic_Scenes_2022/TAU-urban-acoustic-scenes-2022-mobile-development/audio/airport-lisbon-1000-40000-0-a.wav`
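A `.scp` file of this form can be loaded into a dictionary for lookup. The sketch below assumes each line holds an `Audio_ID`, then whitespace, then the `audio_path`, as in the example above. `load_scp` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def load_scp(scp_path: Path) -> Dict[str, str]:
    """Map each Audio_ID to its audio_path from one .scp file."""
    mapping: Dict[str, str] = {}
    with scp_path.open(encoding="utf-8") as handle:
        for line in handle:
            # Assumption: the Audio_ID is the first whitespace-delimited token
            # and the rest of the line is the path.
            parts = line.strip().split(maxsplit=1)
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            audio_id, audio_path = parts
            mapping[audio_id] = audio_path
    return mapping
```

With the example line above, `load_scp(...)["177_TAU_Urban_Acoustic_Scenes_2022_airport-lisbon-1000-40000-0-a"]` would return the `.wav` path under `audio_files/`.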
## 5. An Example to Accessing Metadata and Audio Together:
To make this easier, we provide `example.py`, which maps each `Audio_ID` to its `audio_path`.
<!-- links the `metadata` and `audio_files` through `scp_files`. -->
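The following is an independent sketch of the idea and does not reproduce the repository's `example.py`. It resolves the `Audio_ID`s referenced by one metadata record to files on disk, given an ID-to-path mapping built from the `.scp` files. It assumes that `audio_path` values are relative to the dataset root and that IDs may appear in the `input` or `output` field. `resolve_audio_paths` is a hypothetical name.

```python
import re
from pathlib import Path
from typing import Any, Dict, Optional

AUDIO_ID_PATTERN = re.compile(r"<\|SOA\|>(.*?)<\|EOA\|>")


def resolve_audio_paths(
    record: Dict[str, Any], scp_map: Dict[str, str], root: Path
) -> Dict[str, Optional[Path]]:
    """Map each Audio_ID referenced by a record to a file under `root`.

    The value is None when the ID is missing from `scp_map` or the file is
    not present on disk.
    """
    resolved: Dict[str, Optional[Path]] = {}
    # Assumption: Audio_IDs may appear in "input" and/or "output".
    for key in ("input", "output"):
        value = record.get(key)
        if not isinstance(value, str):
            continue
        for audio_id in AUDIO_ID_PATTERN.findall(value):
            relative = scp_map.get(audio_id)
            candidate = root / relative if relative else None
            exists = candidate is not None and candidate.is_file()
            resolved[audio_id] = candidate if exists else None
    return resolved
```

Returning `None` instead of raising lets callers skip records whose audio is not available locally, for example metadata-only datasets.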
## 6. Licensing and Redistribution
The Audio-FLAN dataset integrates more than 50 datasets across speech, music, and audio domains. We strictly comply with the license terms of all source datasets.
- ✅ **Audio is included** only for datasets with permissive licenses such as CC-BY, Apache-2.0, or MIT.
- 🚫 **Audio is excluded** from this release for datasets under restrictive or unclear licenses (e.g., CC-BY-NC, research-only, YouTube-based). Only metadata and task templates are retained for reproducibility.
Please refer to [`LICENSES.md`](./LICENSES.md) for a detailed breakdown of included and excluded datasets, as well as their redistribution policies.
> **Important:** Users are responsible for ensuring their use of the dataset complies with the original licenses of each component. We do not claim ownership of the original data and provide redistribution only where explicitly permitted.
### Disclaimer
The Audio-FLAN dataset is released under the same licenses as the original datasets used in its creation. Please refer to the respective licenses of the original datasets for usage and redistribution terms. We do not claim ownership of the original data and encourage users to comply with the licensing terms of the source materials.
Provider:
maas
Created:
2025-02-24



