NSynth_fscil
Source: ModelScope community (updated 2026-01-04, indexed 2024-05-15)
Download link: https://modelscope.cn/datasets/pp199124903/NSynth-100
Resource description:
### Clone with HTTP
```bash
git clone https://www.modelscope.cn/datasets/pp199124903/NSynth-100.git
```
% - 2022.05.25 by Chester.W.Xie - ASVP@SCUT
- [The NSynth Dataset](https://magenta.tensorflow.org/datasets/nsynth) is a dataset for musical instrument identification.
- The dataset contains 305,979 samples (musical notes), each with a unique pitch, timbre, and envelope.
- The samples were generated from 1,006 musical instruments. Every sample is 4 seconds long and sampled at 16 kHz.
➢ The dataset has been divided into three subsets:
- The training set has 953 classes with a total of 289,205 samples. The number of samples per class ranges from 30 to 440, with an average of 303;
- The validation set has 53 classes with a total of 12,678 samples. The number of samples per class ranges from 83 to 348, with an average of 239;
- The test set has 53 classes with a total of 4,096 samples. The number of samples per class ranges from 22 to 125, with an average of 77 (4,096 / 53);
- The validation set and test set have the same categories, while the training set does not overlap with the validation/test set categories.
➢ Number of samples per class after the validation and test sets are merged:
- Number of classes with >100 samples: 53
- Number of classes with >150 samples: 52
- Number of classes with >200 samples: 45
- Number of classes with >250 samples: 32
- Number of classes with >300 samples: 31
➢ Number of samples per class in the training set:
- Number of categories with >450 samples: 0
- Number of categories with ≥440 samples: 377
- Number of categories with >400 samples: 382
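Threshold statistics of this kind can be obtained by counting the samples per instrument label and then counting how many classes exceed each threshold. The following is a minimal sketch, not the authors' program; the function name `classes_above` and the toy labels are hypothetical and only illustrate the counting step.

```python
from collections import Counter

def classes_above(labels, thresholds):
    """Return {threshold: number of classes with strictly more samples}."""
    per_class = Counter(labels)  # number of samples per instrument label
    return {t: sum(1 for n in per_class.values() if n > t) for t in thresholds}

# Toy labels (hypothetical): three instruments with 5, 3, and 1 samples.
toy_labels = ["bass_000"] * 5 + ["guitar_001"] * 3 + ["flute_002"] * 1
stats = classes_above(toy_labels, thresholds=[0, 2, 4])
print(stats)
```

On the real data, `labels` would be the instrument column of the original meta information, and `thresholds` would be values such as 100, 150, 200, 250, and 300.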
➢ Based on the above statistics, the dataset can be reorganized in the following four ways:
#### Setup 1 (NSynth-100-FS)
After merging the test and validation sets, keep the 45 classes that have more than 200 samples as novel classes, and retain 200 samples for each of them. \
Then split the 200 samples of each novel class evenly into a training set and a test set (100 samples each). \
From the 382 training-set classes with more than 400 samples, take 55 classes as base classes, retain 400 samples per class, and randomly split them into training, validation, and test sets at a ratio of 2:1:1 (200/100/100 samples per class).
#### Setup 2 (NSynth-200-FS)
The training and test sets of the novel classes are the same as in setup 1. \
From the 382 training-set classes with more than 400 samples, take 155 classes as base classes, retain 400 samples per class, and randomly split them into training, validation, and test sets at a ratio of 2:1:1.
#### Setup 3 (NSynth-300-FS)
The training and test sets of the novel classes are the same as in setup 1. \
From the 382 training-set classes with more than 400 samples, take 255 classes as base classes, retain 400 samples per class, and randomly split them into training, validation, and test sets at a ratio of 2:1:1.
#### Setup 4 (NSynth-400-FS)
The training and test sets of the novel classes are the same as in setup 1. \
From the 382 training-set classes with more than 400 samples, take 355 classes as base classes, retain 400 samples per class, and randomly split them into training, validation, and test sets at a ratio of 2:1:1.
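The four setups imply fixed sample counts per split. The arithmetic can be sketched as follows, assuming that every retained sample appears in exactly one split; `split_sizes` is a hypothetical helper, and the per-class counts follow from the 2:1:1 and half/half divisions described for the setups.

```python
NOVEL_CLASSES = 45  # classes with >200 samples after merging validation and test

def split_sizes(base_classes):
    """Return the number of classes and samples per split for one setup."""
    base_train, base_val, base_test = 200, 100, 100  # 400 samples split 2:1:1
    novel_train, novel_test = 100, 100               # 200 samples split in half
    return {
        "total_classes": base_classes + NOVEL_CLASSES,
        "train": base_classes * base_train + NOVEL_CLASSES * novel_train,
        "val": base_classes * base_val,  # validation holds base classes only
        "test": base_classes * base_test + NOVEL_CLASSES * novel_test,
    }

for base in (55, 155, 255, 355):
    print(base, split_sizes(base))
```

The total number of classes (base plus 45 novel) is what gives each setup its name: 100, 200, 300, and 400.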
Our reorganization strategy:
- Keep the directory structure of the original audio samples unchanged;
- Filter the original meta files to obtain the meta files for the four setups;
- Read the corresponding data from the original sample directories according to the new meta files.
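The filtering step can be illustrated with a short sketch. This is not the authors' filtering program; it is a minimal reconstruction under stated assumptions: meta rows are given as `(filename, instrument)` pairs, the names `group_by_class` and `build_setup` are hypothetical, and a fixed seed is used so that repeated runs select the same files.

```python
import random
from collections import defaultdict

def group_by_class(rows):
    """Group (filename, instrument) rows into {instrument: [filename, ...]}."""
    groups = defaultdict(list)
    for filename, instrument in rows:
        groups[instrument].append(filename)
    return groups

def build_setup(train_rows, valtest_rows, n_base, n_base_keep=400,
                n_novel_keep=200, seed=0):
    """Filter meta rows into train/val/test lists for one setup."""
    rng = random.Random(seed)  # fixed seed makes the random filtering repeatable
    splits = {"train": [], "val": [], "test": []}

    # Novel classes: more than n_novel_keep samples in the merged val+test data;
    # keep n_novel_keep samples and split them in half into train and test.
    for label, files in sorted(group_by_class(valtest_rows).items()):
        if len(files) > n_novel_keep:
            kept = rng.sample(files, n_novel_keep)
            half = n_novel_keep // 2
            splits["train"] += [(f, label) for f in kept[:half]]
            splits["test"] += [(f, label) for f in kept[half:]]

    # Base classes: pick n_base classes with more than n_base_keep samples,
    # keep n_base_keep samples each, and split them 2:1:1.
    groups = group_by_class(train_rows)
    candidates = sorted(label for label, files in groups.items()
                        if len(files) > n_base_keep)
    for label in rng.sample(candidates, n_base):
        kept = rng.sample(groups[label], n_base_keep)
        a, b = n_base_keep // 2, 3 * n_base_keep // 4
        splits["train"] += [(f, label) for f in kept[:a]]
        splits["val"] += [(f, label) for f in kept[a:b]]
        splits["test"] += [(f, label) for f in kept[b:]]
    return splits

# Toy data with scaled-down thresholds (hypothetical names and sizes).
train_rows = [(f"{c}-{i}", c)
              for c, n in [("bass_000", 5), ("organ_001", 5), ("flute_002", 5), ("reed_003", 3)]
              for i in range(n)]
valtest_rows = [(f"{c}-{i}", c)
                for c, n in [("guitar_017", 3), ("keys_018", 3), ("brass_019", 2)]
                for i in range(n)]
toy = build_setup(train_rows, valtest_rows, n_base=2, n_base_keep=4, n_novel_keep=2)
print({name: len(rows) for name, rows in toy.items()})
```

With the default arguments and `n_base=55`, the same function would follow the numbers of setup 1; only the meta rows are filtered, and the audio files stay where they are.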
Please cite the following papers when you use the datasets in your work.
[1] Y. Li, W. Cao, W. Xie, J. Li, and E. Benetos, "Few-Shot Class-Incremental Audio Classification Using Dynamically Expanded Classifier With Self-Attention Modified Prototypes," IEEE Transactions on Multimedia, vol. 26, pp. 1346-1360, 2024, doi: 10.1109/TMM.2023.3280011.
[2] W. Xie, Y. Li, Q. He, and W. Cao, "Few-shot class-incremental audio classification via discriminative prototype learning," Expert Systems With Applications, vol. 225, 120044, pp. 1-13, 2023.
[3] W. Xie, Y. Li, Q. He, W. Cao, and T. Virtanen, "Few-shot class-incremental audio classification using adaptively-refined prototypes," in INTERSPEECH, 2023, pp. 301-305. Online: https://www.isca-speech.org/archive/interspeech_2023/xie23b_interspeech.html
[4] Y. Li, W. Cao, J. Li, W. Xie, and Q. He, "Few-shot class-incremental audio classification using stochastic classifier," in INTERSPEECH, 2023, pp. 4174-4178. Online: https://www.isca-speech.org/archive/interspeech_2023/li23w_interspeech.html
[5] Y. Li, J. Li, Y. Si, J. Tan, and Q. He, "Few-Shot Class-Incremental Audio Classification With Adaptive Mitigation of Forgetting and Overfitting," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2297-2311, 2024, doi: 10.1109/TASLP.2024.3385287.
The following ASCII diagram depicts the directory structure:
<pre>
dataset_root
├── nsynth-100-fs-meta
│ ├── nsynth-100-fs_train.csv # information on all training samples from the base and novel classes
│ ├── nsynth-100-fs_val.csv # information on all validation samples from the base classes
│ ├── nsynth-100-fs_test.csv # information on all test samples from the base and novel classes
│ └── nsynth-100-fs_vocab.json # label vocabulary of the dataset
│
├── nsynth-200-fs-meta
│ ├── nsynth-200-fs_train.csv
│ ├── nsynth-200-fs_val.csv
│ ├── nsynth-200-fs_test.csv
│ └── nsynth-200-fs_vocab.json
│
├── nsynth-300-fs-meta
│ ├── nsynth-300-fs_train.csv
│ ├── nsynth-300-fs_val.csv
│ ├── nsynth-300-fs_test.csv
│ └── nsynth-300-fs_vocab.json
│
├── nsynth-400-fs-meta
│ ├── nsynth-400-fs_train.csv
│ ├── nsynth-400-fs_val.csv
│ ├── nsynth-400-fs_test.csv
│ └── nsynth-400-fs_vocab.json
│
├── nsynth-train # The original training set of the NSynth dataset
│ ├── audio
│ | ├── bass_acoustic_000-024-025.wav
│ | └── ....
│ └── examples.json # The original meta information file
│
├── nsynth-val # The original validation set of the NSynth dataset
│ ├── audio
│ | ├── bass_electronic_018-022-025.wav
│ | └── ....
│ └── examples.json
│
└── nsynth-test # The original test set of the NSynth dataset
├── audio
| ├── bass_electronic_018-022-100.wav
| └── ....
└── examples.json
</pre>
Each of the above CSV files has the same format:
```
   filename                       instrument             instrument_family  instrument_source  audio_source
0  guitar_electronic_017-088-075  guitar_electronic_017  guitar             electronic         nsynth-train
1  guitar_electronic_017-088-127  guitar_electronic_017  guitar             electronic         nsynth-train
```
The `filename` value plus the `.wav` extension is the complete sample file name, and `instrument` is the label of the sample; `instrument_family` and `instrument_source` are not needed. \
`audio_source` is the folder in which the sample is stored, so an audio sample can be read from a path of the following form:
```python
import os
import pandas as pd

meta_info = pd.read_csv(...)  # one of the meta CSV files; i below is a row index
path = os.path.join('/data/datasets/The_NSynth_Dataset', meta_info['audio_source'][i], meta_info['filename'][i] + '.wav')
```
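To build the paths and labels for every row at once, a small helper can be sketched as follows. It uses only the standard library and assumes the meta file is comma-separated with the five column names shown above; `sample_paths` and its `audio_subdir` parameter are hypothetical. The directory tree places the `.wav` files in an `audio` subfolder, so `audio_subdir="audio"` may be needed depending on the local layout; the default reproduces the path format given above.

```python
import csv
import io
import os

def sample_paths(meta_file, audio_root, audio_subdir=""):
    """Yield (wav_path, label) for every row of an opened meta CSV file."""
    for row in csv.DictReader(meta_file):
        wav_path = os.path.join(audio_root, row["audio_source"], audio_subdir,
                                row["filename"] + ".wav")
        yield wav_path, row["instrument"]

# Toy meta file (assumed comma-separated layout) with one row from the example above.
toy_csv = io.StringIO(
    "filename,instrument,instrument_family,instrument_source,audio_source\n"
    "guitar_electronic_017-088-075,guitar_electronic_017,guitar,electronic,nsynth-train\n"
)
pairs = list(sample_paths(toy_csv, "/data/datasets/The_NSynth_Dataset"))
print(pairs)
```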
Note that all of the reorganized meta files above were produced by random filtering, so the selected files will generally differ each time the filtering program is run. \
Therefore, to be safe, copy the meta folders of the four setups into your own project directory. The audio data does not need to be copied, because it is identical whenever it is downloaded again.
Below is a rudimentary script that shows how to read the data for the different setups. \
First copy the script file /data/datasets/The_NSynth_Dataset/load_nsynth_data.py to your own project directory, then run
```bash
python load_nsynth_data.py --metapath /data/datasets/The_NSynth_Dataset --audiopath /data/datasets/The_NSynth_Dataset --num_class 100 --base_class 55
```
If you want to download this dataset, first download all the meta folders. The audio data of the NSynth_fscil dataset is packed as a split-volume archive, created with the following command:
```
tar cvzf - DATASET_PATH | split -b 3000m -d - DATASET_NAME.tar.gz
```
To extract the split volumes, first concatenate them into a single archive and then unpack it:
```
cat DATASET_NAME.tar.gz* > DATASET_NAME.tar.gz
tar xvzf DATASET_NAME.tar.gz
```
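Where a shell with `cat` and `tar` is not available, the same concatenate-then-extract steps can be sketched in Python with the standard library. `join_and_extract` is a hypothetical helper; it assumes the volumes carry the numeric suffixes produced by `split -d` (`00`, `01`, ...) and writes the joined archive under a separate name so that it never matches the pattern used to find the parts.

```python
import glob
import os
import shutil
import tarfile

def join_and_extract(prefix, out_dir):
    """Concatenate volumes named prefix00, prefix01, ... and extract the archive."""
    parts = sorted(glob.glob(glob.escape(prefix) + "[0-9]*"))
    joined = prefix + ".joined"  # does not match the "[0-9]*" parts pattern
    with open(joined, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # append this volume to the archive
    os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(joined, "r:gz") as tar:
        tar.extractall(out_dir)
    return parts
```

A call such as `join_and_extract("DATASET_NAME.tar.gz", "extracted")` would then correspond to the two shell commands above.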
The arguments of the `python load_nsynth_data.py` command above are as follows. \
`--metapath` is the path where you saved the four meta folders (nsynth-100-fs-meta, nsynth-200-fs-meta, ...). \
`--audiopath` is the path where the original NSynth dataset is kept. This path is public, so you usually do not need to change it. \
To switch between setups, change `--num_class` to 100, 200, 300, or 400; the program automatically reads the corresponding meta files and loads the corresponding data. \
You also need to set `--base_class` to 55, 155, 255, or 355, the number of base classes in the chosen setup.
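How these arguments could map to meta file paths is sketched below. The internals of `load_nsynth_data.py` are not shown here, so this is a hypothetical stand-in: `parse_args` and `meta_files` are illustrative names, and the consistency check relies only on the fact that every setup has 45 novel classes.

```python
import argparse
import os

def parse_args(argv=None):
    """Parse the four documented command-line arguments."""
    parser = argparse.ArgumentParser(description="Hypothetical stand-in for the loader CLI")
    parser.add_argument("--metapath", required=True, help="folder holding the four meta folders")
    parser.add_argument("--audiopath", required=True, help="folder holding the original NSynth data")
    parser.add_argument("--num_class", type=int, choices=[100, 200, 300, 400], default=100)
    parser.add_argument("--base_class", type=int, choices=[55, 155, 255, 355], default=55)
    args = parser.parse_args(argv)
    # Every setup has 45 novel classes, so base_class must equal num_class - 45.
    if args.base_class != args.num_class - 45:
        parser.error("--base_class must equal --num_class minus 45")
    return args

def meta_files(args):
    """Build the meta CSV paths of the chosen setup from the folder naming scheme."""
    name = f"nsynth-{args.num_class}-fs"
    folder = os.path.join(args.metapath, name + "-meta")
    return {split: os.path.join(folder, f"{name}_{split}.csv")
            for split in ("train", "val", "test")}

args = parse_args(["--metapath", "meta", "--audiopath", "audio",
                   "--num_class", "200", "--base_class", "155"])
files = meta_files(args)
print(files["train"])
```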
By default, the program reads the audio samples on the fly and converts them directly to fbank (filter bank) features. To use other time-frequency features, modify the `wave_to_tfr` function. \
You can also add feature normalization, spectral enhancement, and other operations in this function.
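For orientation, a log mel filter bank transform of the kind mentioned above can be sketched with NumPy. This is not the `wave_to_tfr` shipped with the script, whose signature and settings are not shown here; the parameters below (512-point FFT, hop of 160 samples, 40 mel bands) and the helper names are assumptions chosen for illustration.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):    # rising slope of the triangle
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):   # falling slope of the triangle
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb

def wave_to_tfr(wave, sr=16000, n_fft=512, hop=160, n_mels=40):
    """Convert a 1-D waveform to log mel filter bank features, shape (frames, n_mels)."""
    wave = np.asarray(wave, dtype=np.float64)
    n_frames = 1 + (len(wave) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = wave[idx] * np.hanning(n_fft)            # windowed frames
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # power spectrum per frame
    fbank = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(fbank + 1e-10)                      # small offset avoids log(0)

# A 4-second, 16 kHz test tone standing in for one NSynth sample.
tone = np.sin(2 * np.pi * 440.0 * np.arange(64000) / 16000)
features = wave_to_tfr(tone)
print(features.shape)
```

Feature normalization (for example, subtracting the per-band mean) would be applied to the returned array before it is passed to a model.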
# Enjoy the data and code!
Provider: maas
Created: 2023-12-30



