
FMC_fscil

ModelScope community, updated 2025-12-29, indexed 2024-05-15
Download link:
https://modelscope.cn/datasets/pp199124903/FSC-89
Resource description:
### Clone with HTTP

```bash
git clone https://www.modelscope.cn/datasets/pp199124903/FSC-89.git
```

2022.05.25 by Chester.W.Xie, ASVP@SCUT 👍👍👍👍🤙🤙🤙🤙🤙

- [FSD-MIX-CLIPS](https://zenodo.org/record/5574135#.YWyINEbMIWo) is a synthesized audio dataset open-sourced by Yu Wang, mainly used for few-shot learning research.
- See the original paper for a detailed description of the original dataset.
- To use this data for research on few-shot class-incremental learning (FSCIL), we have reorganized FSD-MIX-CLIPS. To make it easy to remember, we call the reorganized dataset Free Sound Clips 89 (FSC-89).

### Local synthesis of FSD-MIX-CLIPS

We have omitted the original FSD-MIX-CLIPS annotations for the time being, but we have eliminated 1485 duplicated samples, which is also recognized by Yu Wang (https://github.com/wangyu/rethink-audio-fsl/pull/19).

### Sample statistics of the dataset

- The FSD-MIX-CLIPS dataset obtained from the generation has the following sample size distribution:

| Base-train | Base-val | Base-test | Novel-val | Novel-test |
|:----------:|:--------:|:---------:|:---------:|:----------:|
| 448,123 | 65,520 | 65,422 | 17,347 | 16,636 |

- Counting single-label and multi-label samples from the information in the original meta files gives the following statistics:

| | Base-train | Base-val | Base-test | Novel-val | Novel-test |
|:------------:|:----------:|:--------:|:---------:|:---------:|:----------:|
| Single label | 351,781 | 51,889 | 50,550 | 13,358 | 12,605 |
| Multi label | 96,342 | 13,631 | 14,872 | 3,989 | 4,031 |
| Total | 448,123 | 65,520 | 65,422 | 17,347 | 16,636 |

- Further statistics on the number of single-label samples per class are given below:

| | Base-train | Base-val | Base-test | Novel-val | Novel-test |
|:-------------------------:|:------------:|:----------:|:----------:|:----------:|:----------:|
| Avg. num. per class | 5,962 | 879 | 856 | 890 | 840 |
| [min, max] num. per class | [5774, 6160] | [810, 931] | [801, 908] | [834, 937] | [791, 871] |

- The next step is to reorganize the data according to the FSCIL task. We consider 2 reorganization scenarios:

### Setup 1 (large scale, try to keep all samples from the original dataset)

- Classes 0~58 are Base classes, classes 59~88 are Novel classes.
- For each Base class: 5000 samples for train (sampled from Base-train), 800 samples for validation (sampled from Base-val), and 200 samples for test (sampled from Base-test).
- Novel-val and Novel-test are combined into one Novel set.
- For each Novel class: 500 samples for train and 200 samples for test (sampled from the combined Novel set).
- The number of training samples for the old and new classes is: 59 * 5000 + 30 * 500 = 310,000
- The number of validation samples for the old classes is: 59 * 800 = 47,200
- The number of test samples for the new classes is 200 * 30 = 6,000 and for the base classes is 200 * 59 = 11,800, so the total number of test samples for the old and new classes is 11,800 + 6,000 = 17,800.

### Setup 2 (small scale, to save program debugging time)

- Classes 0~58 are Base classes, classes 59~88 are Novel classes.
- For each Base class: 800 samples for train (sampled from Base-train), 200 samples for validation (sampled from Base-val), and 200 samples for test (sampled from Base-test).
- Novel-val and Novel-test are combined into one Novel set.
- For each Novel class: 500 samples for train and 200 samples for test (sampled from the combined Novel set).
- The number of training samples for the old and new classes is: 59 * 800 + 30 * 500 = 62,200
- The number of validation samples for the old classes is: 59 * 200 = 11,800
- The number of test samples for the new classes is 200 * 30 = 6,000 and for the base classes is 200 * 59 = 11,800, so the total number of test samples for the old and new classes is 11,800 + 6,000 = 17,800.

### Our reorganization strategy is:

- Keep the original data structure and synthesize the complete FSD-MIX-CLIPS dataset, including the openl3 version and the audio sample version;
- Filter the original meta-file information accordingly and generate new meta files;
- Use the newly generated meta files to read the corresponding data.

Please cite the following papers when you use the datasets in your work.

[1] Y. Li, W. Cao, W. Xie, J. Li, and E. Benetos, "Few-Shot Class-Incremental Audio Classification Using Dynamically Expanded Classifier With Self-Attention Modified Prototypes," IEEE Transactions on Multimedia, vol. 26, pp. 1346-1360, 2024, doi: 10.1109/TMM.2023.3280011.

[2] W. Xie, Y. Li, Q. He, and W. Cao, "Few-shot class-incremental audio classification via discriminative prototype learning," Expert Systems With Applications, vol. 225, 120044, pp. 1-13, 2023.

[3] W. Xie, Y. Li, Q. He, W. Cao, and T. Virtanen, "Few-shot class-incremental audio classification using adaptively-refined prototypes," INTERSPEECH, pp. 301-305, 2023. Online: https://www.isca-speech.org/archive/interspeech_2023/xie23b_interspeech.html

[4] Y. Li, W. Cao, J. Li, W. Xie, and Q. He, "Few-shot class-incremental audio classification using stochastic classifier," INTERSPEECH, pp. 4174-4178, 2023. Online: https://www.isca-speech.org/archive/interspeech_2023/li23w_interspeech.html

[5] Y. Li, J. Li, Y. Si, J. Tan, and Q. He, "Few-Shot Class-Incremental Audio Classification With Adaptive Mitigation of Forgetting and Overfitting," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 2297-2311, 2024, doi: 10.1109/TASLP.2024.3385287.

The full data after synthesis and reorganization is stored in:

```
cd /data/datasets/FSD-MIX-CLIPS-for_FSCIL
```

<pre>
dataset_root
├── vocab.json                         # label names of the 89 classes
│
├── FSD_MIX_SED.annotations            # annotation information of the original SED dataset
│
├── FSD_MIX_SED.source                 # source material used to synthesize the original SED dataset
│
├── FSD_MIX_SED.audio                  # locally synthesized SED dataset
│
├── FSD_MIX_CLIPS.annotations          # original clips annotation information
│
├── FSD_MIX_CLIPS_data                 # synthesized clips dataset with corrected annotations
│   ├── openl3                         # openl3 features of the dataset
│   │   ├── base
│   │   │   ├── train
│   │   │   │   └── soundscape_205038_327222_1642.pkl
│   │   │   ├── val
│   │   │   └── test
│   │   ├── val
│   │   ├── test
│   │   └── full_filelist              # read paths of the complete dataset and the label dictionary file
│   └── audio                          # audio samples of the dataset
│       ├── base
│       │   ├── train
│       │   │   └── soundscape_205038_327222_1642.wav
│       │   ├── val
│       │   └── test
│       ├── val
│       ├── test
│       └── full_filelist              # read paths of the complete dataset and the label dictionary file
│
└── FSD_MIX_CLIPS.annotations_revised  # corrected clips annotation information, i.e., 1485 duplicates removed
    ├── base_train.csv
    ├── base_val.csv
    ├── base_test.csv
    ├── novel_val.csv
    ├── novel_test.csv                 # we use these 6 corrected files to synthesize the clips dataset
    ├── single_label_meta              # the 6 csv files with the multi-label sample information removed
    └── FSC-89-meta                    # samples drawn from single_label_meta and merged into files according to the FSCIL settings
        ├── huge                       # meta files of the reorganized dataset used in the experiments, corresponding to Setup 1
        │   ├── Fsc89-huge-fsci_train.csv  # training samples of the old and new classes; the two differ in the number of samples
        │   ├── Fsc89-huge-fsci_val.csv    # validation samples; only the base classes have validation data
        │   └── Fsc89-huge-fsci_test.csv   # test samples of the old and new classes; sample sizes are balanced
        └── mini                       # meta files of the reorganized dataset used in the experiments, corresponding to Setup 2
            ├── Fsc89-mini-fsci_train.csv  # training samples of the old and new classes; the two differ in the number of samples
            ├── Fsc89-mini-fsci_val.csv    # validation samples; only the base classes have validation data
            └── Fsc89-mini-fsci_test.csv   # test samples of the old and new classes; sample sizes are balanced
</pre>

- The format of each reorganized csv file is standardized as follows:

```
   data_folder  FSD_MIX_SED_filename   start_time  label
0  base/train   soundscape_195781.wav        7.75      0
1  base/train   soundscape_30977.wav         1.06      0
2  base/train   soundscape_10404.wav         0.09      0
```

- data_folder is the subfolder where the samples are located. The samples can be read in the following way:

```
import os
import pandas as pd

meta_info = pd.read_csv(...)
i = 0  # row index of the sample to read
data_type = 'audio'  # or 'openl3'
data_dir = '/data/datasets/FSD-MIX-CLIPS-for_FSCIL/FSD-MIX-CLIPS_data'
# Clip files are named <SED file stem>_<start sample>.wav; start_time is in seconds at 44.1 kHz
start_sample = int(meta_info['start_time'][i] * 44100)
filename = meta_info['FSD_MIX_SED_filename'][i].replace('.wav', '_' + str(start_sample) + '.wav')
# Assumed from the directory tree above: data_folder (e.g. base/train) sits under data_type
path = os.path.join(data_dir, data_type, meta_info['data_folder'][i], filename)
```

Note that all of the reorganized meta files (6 files in total) are produced by random sampling, so in general each run of the screening program yields different file contents. Therefore, to be on the safe side, please copy the FSC-89-meta folder to your own project directory. The audio data need not be copied, because re-synthesis reproduces the same audio.

If you want to download this dataset, first download all the metadata sections. For the data files, the FMC_fscil dataset is packed as a split-volume archive using the following command:

```
tar cvzf - DATASET_PATH | split -b 3000m -d - DATASET_NAME.tar.gz
```

The steps to unpack a split-volume archive are as follows:

```
cat DATASET_NAME.tar.gz* > DATASET_NAME.tar.gz
tar xvzf DATASET_NAME.tar.gz
```

Below we provide a rudimentary script that shows how to read the data according to the different settings. First copy the script file /data/datasets/FSD-MIX-CLIPS-for_FSCIL/load_fsc_89_data.py to your own project directory, then run:

```
python load_fsc_89_data.py --metapath /data/datasets/FSD-MIX-CLIPS-for_FSCIL/FSD_MIX_CLIPS.annotations_revised/FSC-89-meta --datapath /data/datasets/FSD-MIX-CLIPS-for_FSCIL/FSD-MIX-CLIPS_data --data_type audio --setup mini
```

- By default the small dataset is loaded to save training time when debugging. To use the large-scale version, set --setup huge.
- Run the following script to check that the meta information and the data samples all exist:

```
python check_meta_data.py
```
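The clip naming rule used in the reading snippet above can be wrapped in a small helper. This is only a sketch, not the code of load_fsc_89_data.py: the function name `clip_path` is hypothetical, the `.pkl` extension for openl3 features and the placement of `data_folder` under the data type are assumptions read off the directory tree, and `/path/to/FSD-MIX-CLIPS_data` is a placeholder.

```python
import posixpath

SAMPLE_RATE = 44100  # sampling rate used in the README's reading snippet


def clip_path(data_dir, data_type, data_folder, sed_filename, start_time):
    """Build the path of one clip from a meta-file row (hypothetical helper)."""
    # Same truncation as the README snippet: seconds -> sample index
    start_sample = int(start_time * SAMPLE_RATE)
    # Strip the .wav suffix of the SED file name, if present
    stem = sed_filename[:-len('.wav')] if sed_filename.endswith('.wav') else sed_filename
    # Assumption: openl3 features are stored as .pkl, audio clips as .wav
    ext = '.pkl' if data_type == 'openl3' else '.wav'
    filename = f"{stem}_{start_sample}{ext}"
    # Assumption: data_folder (e.g. base/train) is a subfolder of data_type
    return posixpath.join(data_dir, data_type, data_folder, filename)


print(clip_path('/path/to/FSD-MIX-CLIPS_data', 'audio',
                'base/train', 'soundscape_195781.wav', 7.75))
```

The helper keeps the `int()` truncation of the original snippet so that the generated names match what that snippet would produce.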

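The sample totals quoted for Setup 1 and Setup 2 follow directly from the per-class quotas and the 59 Base / 30 Novel class split. The sketch below re-derives them; the names `SETUPS` and `split_sizes` are illustrative only and do not come from the dataset's scripts.

```python
N_BASE, N_NOVEL = 59, 30  # classes 0~58 (Base) and 59~88 (Novel)

# Per-class sample quotas as stated for each setup
SETUPS = {
    "huge": {"base_train": 5000, "base_val": 800, "base_test": 200,
             "novel_train": 500, "novel_test": 200},
    "mini": {"base_train": 800, "base_val": 200, "base_test": 200,
             "novel_train": 500, "novel_test": 200},
}


def split_sizes(quota):
    """Total number of samples in the train, val, and test meta files."""
    return {
        "train": N_BASE * quota["base_train"] + N_NOVEL * quota["novel_train"],
        "val": N_BASE * quota["base_val"],  # only Base classes have validation data
        "test": N_BASE * quota["base_test"] + N_NOVEL * quota["novel_test"],
    }


for name, quota in SETUPS.items():
    print(name, split_sizes(quota))
```

Comparing these totals with the row counts of the csv files under FSC-89-meta is a quick sanity check after copying the meta folder.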
Provider: maas
Created: 2023-12-30