FMC_fscil
Source: ModelScope community (魔搭社区). Updated 2025-12-29; indexed 2024-05-15.
Download link: https://modelscope.cn/datasets/pp199124903/FSC-89
Description:
### Clone with HTTP
```bash
git clone https://www.modelscope.cn/datasets/pp199124903/FSC-89.git
```
% - 2022.05.25 by Chester.W.Xie - ASVP@SCUT 👍👍👍👍🤙🤙🤙🤙🤙
- [FSD-MIX-CLIPS](https://zenodo.org/record/5574135#.YWyINEbMIWo) is a synthesized audio dataset open-sourced by Yu Wang, mainly used for research on few-shot learning.
- See the original paper for a detailed description of the original dataset.
- To use this data for research on few-shot class-incremental learning (FSCIL), we reorganized FSD-MIX-CLIPS. To make it easy to remember, we call the reorganized dataset Free Sound Clips 89 (FSC-89).
### Local synthesis of FSD-MIX-CLIPS
We have omitted the original FSD-MIX-CLIPS annotations for the time being, but we have removed 1485 duplicated samples, a correction that Yu Wang has also acknowledged (https://github.com/wangyu/rethink-audio-fsl/pull/19).
### Sample statistics of the dataset
- The locally synthesized FSD-MIX-CLIPS dataset has the following number of samples in each split:
| Base-train | Base-val | Base-test | Novel-val | Novel-test |
|:--------------------:|:------------------:|:-------------------:|:-------------------:|:--------------------:|
| 448,123 | 65,520 | 65,422 | 17,347 | 16,636 |
- Based on the information in the original meta files, the samples are counted by whether they carry a single label or multiple labels:

| | Base-train | Base-val | Base-test | Novel-val | Novel-test |
|:--------------------:|:--------------------:|:------------------:|:-------------------:|:-------------------:|:--------------------:|
| single label | 351,781 | 51,889 | 50,550 | 13,358 | 12,605 |
| multi label | 96,342 | 13,631 | 14,872 | 3,989 | 4,031 |
| total | 448,123 | 65,520 | 65,422 | 17,347 | 16,636 |
- Per-class sample counts for the single-label samples are given below:

| | Base-train | Base-val | Base-test | Novel-val | Novel-test |
|:------------------------------:|:--------------------:|:------------------:|:-------------------:|:-------------------:|:--------------------:|
| Avg. samples per class | 5,962 | 879 | 856 | 890 | 840 |
| [min, max] samples per class | [5774, 6160] | [810, 931] | [801, 908] | [834, 937] | [791, 871] |
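
Statistics of this kind can be recomputed from the `label` column of any single-label meta file. The following is a minimal sketch, not the authors' script: the `per_class_stats` helper and the toy label list are illustrative assumptions.

```python
from collections import Counter

def per_class_stats(labels):
    """Summarize how many samples each class has (hypothetical helper)."""
    counts = Counter(labels)          # label -> number of samples
    sizes = list(counts.values())
    return {
        "classes": len(sizes),
        "mean": sum(sizes) / len(sizes),
        "min": min(sizes),
        "max": max(sizes),
    }

# Toy labels standing in for the `label` column of a single-label meta file.
labels = [0] * 5 + [1] * 7 + [2] * 6
stats = per_class_stats(labels)
print(stats)
```

With real data, `labels` would be read from the `label` column of the corresponding CSV file.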
- The next step is to reorganize the data according to the FSCIL tasks, and in this paper we consider 2 reorganization scenarios:
### Setup 1 (large scale, keeping as many samples of the original dataset as possible)
- Classes 0~58 are Base classes; classes 59~88 are Novel classes.
- For each Base class: 5000 samples for training (sampled from Base-train), 800 samples for validation (sampled from Base-val), and 200 samples for testing (sampled from Base-test).
- Novel-val and Novel-test are combined into one Novel set.
- For each Novel class: 500 samples for training and 200 samples for testing, both sampled from the combined Novel set.
- Total training samples for old and new classes: 59 * 5000 + 30 * 500 = 310,000.
- Total validation samples (old classes only): 59 * 800 = 47,200.
- Total test samples: 200 * 30 = 6,000 for the new classes plus 200 * 59 = 11,800 for the base classes, i.e. 11,800 + 6,000 = 17,800.
### Setup 2 (small scale, to save program debugging time)
- Classes 0~58 are Base classes; classes 59~88 are Novel classes.
- For each Base class: 800 samples for training (sampled from Base-train), 200 samples for validation (sampled from Base-val), and 200 samples for testing (sampled from Base-test).
- Novel-val and Novel-test are combined into one Novel set.
- For each Novel class: 500 samples for training and 200 samples for testing, both sampled from the combined Novel set.
- Total training samples for old and new classes: 59 * 800 + 30 * 500 = 62,200.
- Total validation samples (old classes only): 59 * 200 = 11,800.
- Total test samples: 200 * 30 = 6,000 for the new classes plus 200 * 59 = 11,800 for the base classes, i.e. 11,800 + 6,000 = 17,800.
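
The totals of both setups follow directly from the per-class quotas. As a quick sanity check, they can be recomputed as sketched below; the `split_totals` helper is only an illustration and is not part of the dataset tooling.

```python
# Recompute the FSC-89 split sizes from the per-class quotas of the two setups.
NUM_BASE, NUM_NOVEL = 59, 30  # classes 0~58 and 59~88

def split_totals(base_train, base_val, base_test, novel_train=500, novel_test=200):
    """Return the total number of train/val/test samples for one setup."""
    return {
        "train": NUM_BASE * base_train + NUM_NOVEL * novel_train,
        "val": NUM_BASE * base_val,  # only base classes have validation data
        "test": NUM_BASE * base_test + NUM_NOVEL * novel_test,
    }

setup1 = split_totals(base_train=5000, base_val=800, base_test=200)
setup2 = split_totals(base_train=800, base_val=200, base_test=200)
print("setup 1:", setup1)
print("setup 2:", setup2)
```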
### Our reorganization strategy is:
- Keep the original data structure and synthesize the complete FSD-MIX-CLIPS dataset, including the openl3 version and the audio sample version;
- Filter the original meta-file information accordingly and generate new meta-files;
- Use the newly generated meta files to read the corresponding data.
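
The filtering step amounts to drawing a fixed number of rows per class from the single-label meta information. The screening program itself is not shown here, so the following is only a sketch under assumptions: the `sample_per_class` helper, the toy rows, and the fixed seed are all illustrative choices, not the authors' implementation.

```python
import random
from collections import defaultdict

def sample_per_class(rows, quota, seed=0):
    """Pick `quota` rows per label at random; fail if a class is too small."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)
    rng = random.Random(seed)  # assumption: a fixed seed, so reruns pick the same rows
    picked = []
    for label in sorted(by_label):
        candidates = by_label[label]
        if len(candidates) < quota:
            raise ValueError(f"class {label} has only {len(candidates)} samples")
        picked.extend(rng.sample(candidates, quota))
    return picked

# Toy meta rows standing in for one single-label meta CSV file.
rows = [{"FSD_MIX_SED_filename": f"soundscape_{i}.wav", "label": i % 3} for i in range(30)]
subset = sample_per_class(rows, quota=4, seed=0)
print(len(subset))
```

Fixing the seed is a design choice of this sketch: it makes the selection repeatable, whereas a program that samples without a seed returns a different subset on every run.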
Please cite the following papers when you use the datasets in your work.
[1] Y. Li, W. Cao, W. Xie, J. Li and E. Benetos, "Few-Shot Class-Incremental Audio Classification Using Dynamically Expanded Classifier With Self-Attention Modified Prototypes," in IEEE Transactions on Multimedia, vol. 26, pp. 1346-1360, 2024, doi: 10.1109/TMM.2023.3280011.
[2] W. Xie, Y. Li, Q. He, W. Cao, Few-shot class-incremental audio classification via discriminative prototype learning, Expert Systems With Applications, 2023, vol. 225, 120044, pp. 1-13.
[3] W. Xie, Y. Li, Q. He, W. Cao, T. Virtanen, Few-shot class-incremental audio classification using adaptively-refined prototypes, INTERSPEECH, 2023, pp. 301-305. online: https://www.isca-speech.org/archive/interspeech_2023/xie23b_interspeech.html
[4] Y. Li, W. Cao, J. Li, W. Xie, Q. He, Few-shot class-incremental audio classification using stochastic classifier, INTERSPEECH, 2023, pp. 4174-4178. online: https://www.isca-speech.org/archive/interspeech_2023/li23w_interspeech.html
[5] Y. Li, J. Li, Y. Si, J. Tan and Q. He, "Few-Shot Class-Incremental Audio Classification With Adaptive Mitigation of Forgetting and Overfitting," in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2297-2311, 2024, doi: 10.1109/TASLP.2024.3385287.
The full data after synthesis and reorganization is stored in:
```bash
cd /data/datasets/FSD-MIX-CLIPS-for_FSCIL
```
<pre>
dataset_root
├── vocab.json                          # - Label names of the 89 classes
│
├── FSD_MIX_SED.annotations             # - Annotation information for the original SED dataset
│
├── FSD_MIX_SED.source                  # - Source material used to synthesize the original SED dataset
│
├── FSD_MIX_SED.audio                   # - Locally synthesized SED dataset
│
├── FSD_MIX_CLIPS.annotations           # - Original clips annotation information
│
├── FSD_MIX_CLIPS_data                  # - Synthesized clips dataset with corrected annotations
│   ├── openl3                          # - openl3 features of the dataset
│   │   ├── base
│   │   │   ├── train
│   │   │   │   └── soundscape_205038_327222_1642.pkl
│   │   │   ├── val
│   │   │   └── test
│   │   ├── val
│   │   ├── test
│   │   └── full_filelist               # - Read paths of the complete dataset and the label dictionary file
│   └── audio                           # - Audio samples of the dataset
│       ├── base
│       │   ├── train
│       │   │   └── soundscape_205038_327222_1642.wav
│       │   ├── val
│       │   └── test
│       ├── val
│       ├── test
│       └── full_filelist               # - Read paths of the complete dataset and the label dictionary file
│
└── FSD_MIX_CLIPS.annotations_revised   # - Corrected clips annotation information, i.e., 1485 duplicates removed
    ├── base_train.csv
    ├── base_val.csv
    ├── base_test.csv
    ├── novel_val.csv
    ├── novel_test.csv                  # - We use these 6 corrected files to synthesize the clips dataset
    ├── single_label_meta               # - The 6 csv files with the multi-label sample information removed
    └── FSC-89-meta                     # - Samples drawn from single_label_meta and merged into files according to the FSCIL settings
        ├── huge                        # - Meta files of the reorganized dataset used in the experiments, corresponding to setup 1
        │   ├── Fsc89-huge-fsci_train.csv   # - Training samples of the old and new classes; the two differ in the number of samples per class
        │   ├── Fsc89-huge-fsci_val.csv     # - Validation samples; only the base classes have validation data
        │   └── Fsc89-huge-fsci_test.csv    # - Test samples of the old and new classes; the number of samples per class is balanced
        └── mini                        # - Meta files of the reorganized dataset used in the experiments, corresponding to setup 2
            ├── Fsc89-mini-fsci_train.csv   # - Training samples of the old and new classes; the two differ in the number of samples per class
            ├── Fsc89-mini-fsci_val.csv     # - Validation samples; only the base classes have validation data
            └── Fsc89-mini-fsci_test.csv    # - Test samples of the old and new classes; the number of samples per class is balanced
</pre>
- The format of each reorganization csv file is standardized as follows:
```
data_folder FSD_MIX_SED_filename start_time label
0 base/train soundscape_195781.wav 7.75 0
1 base/train soundscape_30977.wav 1.06 0
2 base/train soundscape_10404.wav 0.09 0
```
- data_folder is the subfolder where the samples are located.
The samples can be read as follows:
```python
import os

import pandas as pd

meta_info = pd.read_csv(...)  # one of the reorganized meta CSV files
i = 0  # row index of the sample to read
data_type = 'audio'  # or 'openl3'
data_dir = '/data/datasets/FSD-MIX-CLIPS-for_FSCIL/FSD-MIX-CLIPS_data'
start_sample = int(meta_info['start_time'][i] * 44100)
filename = meta_info['FSD_MIX_SED_filename'][i].replace('.wav', '_' + str(start_sample) + '.wav')
path = os.path.join(data_dir, data_type, filename)
```
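
The directory tree places each clip under `<data_type>/<data_folder>/`, and shows `.pkl` files for the openl3 features. A helper that also takes `data_folder` into account might look like the sketch below. It rests on assumptions: `clip_path` is a hypothetical name, the inclusion of `data_folder` and the `.pkl` extension are inferred from the tree rather than confirmed by the loading script, and any further filename suffix used by the real files is not modeled.

```python
import os

SAMPLE_RATE = 44100  # Hz, the factor used to convert start_time to a sample index

def clip_path(data_dir, data_type, data_folder, sed_filename, start_time):
    """Build a clip path from one meta row (hypothetical helper)."""
    start_sample = int(start_time * SAMPLE_RATE)  # truncates, like int() in the README snippet
    stem = sed_filename[:-len(".wav")] if sed_filename.endswith(".wav") else sed_filename
    # Assumption: audio clips are .wav files and openl3 features are .pkl files.
    ext = ".wav" if data_type == "audio" else ".pkl"
    return os.path.join(data_dir, data_type, data_folder, f"{stem}_{start_sample}{ext}")

path = clip_path(
    "/data/datasets/FSD-MIX-CLIPS-for_FSCIL/FSD-MIX-CLIPS_data",
    "audio", "base/train", "soundscape_195781.wav", 7.75,
)
print(path)
```

Before relying on such a helper, compare a few generated paths against the files actually present on disk.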
Note that all of the reorganized meta files above (6 files in total) are produced by random sampling, so the file contents will generally differ each time the screening program is run.
To be on the safe side, copy the FSC-89-meta folder to your own project directory. The audio data does not need to be copied, because re-synthesis reproduces the same audio.
If you want to download this dataset, first download all of the metadata. The data files of the FMC_fscil dataset were compressed into split volumes with the following command:
```
tar cvzf - DATASET_PATH | split -b 3000m -d - DATASET_NAME.tar.gz
```
The steps to unzip a split volume are as follows:
```
cat DATASET_NAME.tar.gz* > DATASET_NAME.tar.gz
tar xvzf DATASET_NAME.tar.gz
```
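
Where `cat` is not available (for example on Windows), the same two steps can be done in Python. This is a minimal sketch using only the standard library; the `join_and_extract` helper is an illustrative assumption and not a script shipped with the dataset.

```python
import glob
import os
import shutil
import tarfile

def join_and_extract(prefix, out_dir="."):
    """Concatenate the `prefix*` split volumes in name order, then extract the archive."""
    # Exclude the joined file itself in case it exists from an earlier run.
    parts = sorted(p for p in glob.glob(glob.escape(prefix) + "*") if p != prefix)
    if not parts:
        raise FileNotFoundError(f"no split volumes found for {prefix}")
    with open(prefix, "wb") as joined:
        for part in parts:
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, joined)  # streams, so large volumes are fine
    os.makedirs(out_dir, exist_ok=True)
    with tarfile.open(prefix, "r:gz") as archive:
        archive.extractall(out_dir)
    return parts

# Usage (the file name is the placeholder from the commands above):
# join_and_extract("DATASET_NAME.tar.gz", out_dir="FSC-89")
```

Sorting by name relies on the numeric suffixes that `split -d` generates (`00`, `01`, ...), which order correctly as strings.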
Below is a rudimentary script showing how to read the data under the different setups.
First copy the script file /data/datasets/FSD-MIX-CLIPS-for_FSCIL/load_fsc_89_data.py to your own project directory, then run:
```bash
python load_fsc_89_data.py --metapath /data/datasets/FSD-MIX-CLIPS-for_FSCIL/FSD_MIX_CLIPS.annotations_revised/FSC-89-meta --datapath /data/datasets/FSD-MIX-CLIPS-for_FSCIL/FSD-MIX-CLIPS_data --data_type audio --setup mini
```
- By default the script loads the small dataset to save time when debugging; to use the large-scale version, set `--setup huge`.
- Run the following script to check that the meta information and the data samples all exist:
```
python check_meta_data.py
```
Provider: maas
Created: 2023-12-30



