ARCA23K
Source: Mendeley Data. Updated 2024-05-10. Indexed 2024-06-27.
Download link:
https://zenodo.org/records/5117901
Resource description:
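ARCA23K is a dataset of labelled sound events created to investigate real-world label noise. It contains 23,727 audio clips originating from Freesound, and each clip belongs to one of 70 classes taken from the AudioSet ontology. The dataset was created using an entirely automated process with no manual verification of the data. For this reason, many clips are expected to be labelled incorrectly.
In addition to ARCA23K, this release includes a companion dataset called ARCA23K-FSD, which is a single-label subset of the FSD50K dataset. ARCA23K-FSD contains the same sound classes as ARCA23K and the same number of audio clips per class. As it is a subset of FSD50K, each clip and its label have been manually verified. Note that only the ground truth data of ARCA23K-FSD is distributed in this release. To download the audio clips, please visit the Zenodo page for FSD50K.
A paper has been published detailing how the dataset was constructed; see the Citing section below. The source code used to create the datasets is available at https://github.com/tqbl/arca23k-dataset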
### Characteristics
ARCA23K(-FSD) is divided into:
- A training set containing 17,979 clips (39.6 hours for ARCA23K)
- A validation set containing 2,264 clips (5.0 hours)
- A test set containing 3,484 clips (7.3 hours)
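As a quick sanity check on the split sizes listed above, the sketch below recomputes the total clip count, each split's share of the dataset, and the mean clip duration implied by the stated hours. The figures are copied from this description, the durations are assumed to apply to ARCA23K only, and the names `SPLITS` and `summarize_splits` are illustrative rather than part of the release.

```python
# Clip counts and total durations (hours, ARCA23K only) copied from the list above.
SPLITS = {
    "training": (17979, 39.6),
    "validation": (2264, 5.0),
    "test": (3484, 7.3),
}


def summarize_splits(splits):
    """Return the total clip count plus per-split share and mean clip length."""
    total = sum(count for count, _ in splits.values())
    summary = {}
    for name, (count, hours) in splits.items():
        summary[name] = {
            "share": count / total,                # fraction of all clips
            "mean_seconds": hours * 3600 / count,  # implied average duration
        }
    return total, summary


if __name__ == "__main__":
    total, summary = summarize_splits(SPLITS)
    print(f"total clips: {total}")
    for name, stats in summary.items():
        print(f"{name}: {stats['share']:.1%} of clips, "
              f"about {stats['mean_seconds']:.1f} s per clip")
```

The three counts sum to 23,727, which matches the dataset size stated in the introduction.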
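There are 70 sound classes in total, and each class belongs to the AudioSet ontology. Each audio clip was sourced from the Freesound database. Other than format conversions (e.g. resampling), the audio clips have not been modified. The duration of the audio clips varies from 0.3 seconds to 30 seconds. All audio clips are mono 16-bit WAV files sampled at 44.1 kHz.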
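The format stated above (mono, 16-bit, 44.1 kHz, 0.3 to 30 seconds) can be checked on a downloaded clip with Python's standard `wave` module. This is a minimal sketch, not a tool shipped with the dataset: the function name `check_clip` and the constants are illustrative, and the duration bounds are taken at face value from this description, so a small tolerance may be needed if the published figures are rounded.

```python
import wave

# Format constraints taken from the description above.
EXPECTED_CHANNELS = 1          # mono
EXPECTED_SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit
EXPECTED_SAMPLE_RATE = 44100   # Hz
MIN_SECONDS, MAX_SECONDS = 0.3, 30.0


def check_clip(path):
    """Return a list of human-readable problems found in one WAV file."""
    with wave.open(str(path), "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate

    problems = []
    if channels != EXPECTED_CHANNELS:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != EXPECTED_SAMPLE_WIDTH:
        problems.append(f"expected 16-bit samples, found {8 * sample_width}-bit")
    if sample_rate != EXPECTED_SAMPLE_RATE:
        problems.append(f"expected 44100 Hz, found {sample_rate} Hz")
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        problems.append(f"duration {duration:.2f} s outside the stated range")
    return problems
```

Calling `check_clip` on a conforming file returns an empty list; any mismatch is reported as one message per violated property.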
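Based on listening tests (details in the paper), 46.4% of the training examples are estimated to be labelled incorrectly. Among the incorrectly labelled examples, 75.9% are estimated to be out-of-vocabulary.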
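The two percentages above are nested: the second applies only to the mislabelled examples. The sketch below works out what they imply as fractions of the whole training set. It reads "out-of-vocabulary" as meaning that the true class is not one of the 70 dataset classes, which is an interpretation rather than a definition given in this description, and the function name `noise_breakdown` is illustrative. Because the inputs are estimates from listening tests, the resulting counts are rough projections.

```python
TRAIN_CLIPS = 17979            # training set size from this description
NOISE_RATE = 0.464             # estimated fraction labelled incorrectly
OOV_RATE_GIVEN_NOISE = 0.759   # estimated out-of-vocabulary fraction of those


def noise_breakdown(n_clips, noise_rate, oov_given_noise):
    """Split the training set into clean, in-vocabulary and OOV noise shares."""
    return {
        "clean": 1 - noise_rate,
        "out_of_vocabulary": noise_rate * oov_given_noise,
        "in_vocabulary": noise_rate * (1 - oov_given_noise),
        "approx_mislabelled_clips": round(n_clips * noise_rate),
    }


if __name__ == "__main__":
    for key, value in noise_breakdown(
        TRAIN_CLIPS, NOISE_RATE, OOV_RATE_GIVEN_NOISE
    ).items():
        print(key, value)
```

Under these figures, roughly 35% of all training clips would carry out-of-vocabulary noise and roughly 11% in-vocabulary noise.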
### Sound Classes
The list of sound classes is given below. They are grouped based on the top-level superclasses of the AudioSet ontology. Items are separated by semicolons because some class names contain commas.
1. **Music**: Acoustic guitar; Bass guitar; Bowed string instrument; Crash cymbal; Electric guitar; Gong; Harp; Organ; Piano; Rattle (instrument); Scratching (performance technique); Snare drum; Trumpet; Wind chime; Wind instrument, woodwind instrument
2. **Sounds of things**: Boom; Camera; Coin (dropping); Computer keyboard; Crack; Dishes, pots, and pans; Drawer open or close; Drill; Gunshot, gunfire; Hammer; Keys jangling; Knock; Microwave oven; Printer; Sawing; Scissors; Skateboard; Slam; Splash, splatter; Squeak; Tap; Thump, thud; Toilet flush; Train; Water tap, faucet; Whoosh, swoosh, swish; Writing; Zipper (clothing)
3. **Natural sounds**: Crackle; Stream; Waves, surf; Wind
4. **Human sounds**: Burping, eructation; Chewing, mastication; Child speech, kid speaking; Clapping; Cough; Crying, sobbing; Fart; Female singing; Female speech, woman speaking; Finger snapping; Giggle; Male speech, man speaking; Run; Screaming; Walk, footsteps
5. **Animal**: Bark; Cricket; Livestock, farm animals, working animals; Meow; Rattle
6. **Source-ambiguous sounds**: Crumpling, crinkling; Crushing; Tearing
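### License and Attribution
This release is licensed under the Creative Commons Attribution 4.0 International License. The audio clips distributed as part of ARCA23K were sourced from Freesound and have their own Creative Commons licenses. The license information and attribution for each audio clip can be found in `ARCA23K.metadata/train.json`, which also includes the original Freesound URLs. The files under `ARCA23K-FSD.ground_truth/` are an adaptation of the ground truth data provided as part of FSD50K, which is licensed under the Creative Commons Attribution 4.0 International License. The curators of FSD50K are Eduardo Fonseca, Xavier Favory, Jordi Pons, Mercedes Collado, Ceren Can, Rachit Gupta, Javier Arredondo, Gary Avendano, and Sara Fernandez.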
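Since each clip carries its own license, a per-license tally of `ARCA23K.metadata/train.json` is a natural first step before redistributing audio. This description does not document the layout of that file, so the sketch below rests on two loudly stated assumptions: the JSON is either a list of per-clip objects or an object keyed by clip ID, and the license is stored under a field whose name is passed in as `license_key` (the default `"license"` is hypothetical). Inspect the real file and adjust before relying on it.

```python
import json
from collections import Counter


def load_records(path):
    """Load the metadata file and return a list of per-clip records.

    ASSUMPTION: the file holds either a list of objects or an object
    keyed by clip ID; the real layout is not documented in this description.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return list(data.values())
    return list(data)


def count_licenses(records, license_key="license"):
    """Count clips per license value.

    `license_key` is a hypothetical field name; records that lack it
    (or are not objects) are counted under "<missing>".
    """
    return Counter(
        rec.get(license_key, "<missing>") if isinstance(rec, dict) else "<missing>"
        for rec in records
    )
```

A call such as `count_licenses(load_records("ARCA23K.metadata/train.json"))` would then return a `Counter` mapping each license value to its number of clips, provided the assumed layout holds.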
### Citing
If you wish to cite this work, please cite the following paper:
T. Iqbal, Y. Cao, A. Bailey, M. D. Plumbley, and W. Wang, “ARCA23K: An audio dataset for investigating open-set label noise”, in Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021), 2021, Barcelona, Spain, pp. 201–205.
BibTeX:

    @inproceedings{Iqbal2021,
      author = {Iqbal, T. and Cao, Y. and Bailey, A. and Plumbley, M. D. and Wang, W.},
      title = {{ARCA23K}: An audio dataset for investigating open-set label noise},
      booktitle = {Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE2021)},
      pages = {201--205},
      year = {2021},
      address = {Barcelona, Spain},
    }

Created:
2023-06-28
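### Background overview
ARCA23K is a dataset of 23,727 audio clips for studying label noise. It covers 70 AudioSet classes and was generated by a fully automated process with no manual verification; an estimated 46.4% of the training examples are labelled incorrectly. The companion dataset ARCA23K-FSD, a manually verified single-label subset of FSD50K, provides a basis for comparison.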



