
DCASE 2020 Challenge Task 2 Development Dataset

NIAID Data Ecosystem, indexed 2026-03-13
Download link:
https://zenodo.org/record/3678170
Resource overview:
Description

This dataset is the "development dataset" for the DCASE 2020 Challenge Task 2 "Unsupervised Detection of Anomalous Sounds for Machine Condition Monitoring" [task description].

The data comprises parts of ToyADMOS and the MIMII Dataset, consisting of the normal/anomalous operating sounds of six types of toy/real machines. Each recording is a single-channel audio clip of approximately 10 seconds that includes both a target machine's operating sound and environmental noise. The following six types of toy/real machines are used in this task:

    Toy-car (ToyADMOS)
    Toy-conveyor (ToyADMOS)
    Valve (MIMII Dataset)
    Pump (MIMII Dataset)
    Fan (MIMII Dataset)
    Slide rail (MIMII Dataset)

Recording procedure

ToyADMOS consists of normal/anomalous operating sounds of miniature machines (toys) collected with four microphones, and the MIMII Dataset consists of those of real machines collected with eight microphones. Anomalous sounds in these datasets were collected by deliberately damaging the target machines. To simplify the task, we used only the first channel of the multi-channel recordings; all recordings are regarded as single-channel recordings of a fixed microphone. All signals have been downsampled to a sampling rate of 16 kHz. From ToyADMOS, we used only IND-type data, in which each recording contains the operating sound of the entire operation (i.e., from start to stop). We mixed each target machine sound with environmental noise, and only the noisy recordings are provided as training/test data. For details of the recording procedure, please refer to the ToyADMOS and MIMII Dataset papers.

Data

We first define two important terms in this task: Machine Type and Machine ID. Machine Type means the kind of machine, which in this task can be one of six: toy-car, toy-conveyor, valve, pump, fan, and slide rail. Machine ID is the identifier of each individual machine of the same type; in the training dataset there are three or four Machine IDs per Machine Type.
Each Machine ID's dataset consists of (i) around 1,000 samples of normal sounds for training and (ii) 100-200 samples each of normal and anomalous sounds for testing. The labels given for each training/test sample are Machine Type, Machine ID, and condition (normal/anomaly). Machine Type information is given by the directory name, and Machine ID and condition information are given by the file name.

Directory structure

When you unzip the files downloaded from Zenodo, you can see the following directory structure. As described in the previous section, Machine Type information is given by the directory name, and Machine ID and condition information are given by the file name, as:

    /dev_data
        /ToyCar
            /train (Only normal data for all Machine IDs are included.)
                /normal_id_01_00000000.wav
                ...
                /normal_id_01_00000999.wav
                /normal_id_02_00000000.wav
                ...
                /normal_id_04_00000999.wav
            /test (Normal and anomaly data for all Machine IDs are included.)
                /normal_id_01_00000000.wav
                ...
                /normal_id_01_00000349.wav
                /anomaly_id_01_00000000.wav
                ...
                /anomaly_id_01_00000263.wav
                /normal_id_02_00000000.wav
                ...
                /anomaly_id_04_00000264.wav
        /ToyConveyor (The other Machine Types have the same directory structure as ToyCar.)
        /fan
        /pump
        /slider
        /valve

The paths of audio files are:

    "/dev_data//train/normal_id__[0-9]+.wav"
    "/dev_data//test/normal_id__[0-9]+.wav"
    "/dev_data//test/anomaly_id__[0-9]+.wav"

For example, the Machine Type and Machine ID of "/ToyCar/train/normal_id_01_00000000.wav" are "ToyCar" and "01", respectively, and its condition is normal. The Machine Type and Machine ID of "/fan/test/anomaly_id_00_00000000.wav" are "fan" and "00", respectively, and its condition is anomalous.

Baseline system

A simple baseline system is available on the GitHub repository [URL]. The baseline system provides a simple entry-level approach that gives reasonable performance on the Task 2 dataset.
It is a good starting point, especially for entry-level researchers who want to get familiar with the anomalous-sound-detection task.

Conditions of use

This dataset was created jointly by NTT Corporation and Hitachi, Ltd. and is available under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license.

Publication

If you use this dataset, please cite all the following three papers:

Yuma Koizumi, Shoichiro Saito, Noboru Harada, Hisashi Uematsu, and Keisuke Imoto, "ToyADMOS: A Dataset of Miniature-Machine Operating Sounds for Anomalous Sound Detection," in Proc. of Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019.

Harsh Purohit, Ryo Tanabe, Kenji Ichige, Takashi Endo, Yuki Nikaido, Kaori Suefusa, and Yohei Kawaguchi, "MIMII Dataset: Sound Dataset for Malfunctioning Industrial Machine Investigation and Inspection," in Proc. 4th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2019.

Yuma Koizumi, Yohei Kawaguchi, Keisuke Imoto, Toshiki Nakamura, Yuki Nikaido, Ryo Tanabe, Harsh Purohit, Kaori Suefusa, Takashi Endo, Masahiro Yasuda, and Noboru Harada, "Description and Discussion on DCASE2020 Challenge Task2: Unsupervised Anomalous Sound Detection for Machine Condition Monitoring," in Proc. 5th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2020.

Feedback

If there is any problem, please contact us:

    Yuma Koizumi, koizumi.yuma@ieee.org
    Yohei Kawaguchi, yohei.kawaguchi.xk@hitachi.com
    Keisuke Imoto, keisuke.imoto@ieee.org
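
Example: reading labels from a file path

The "Directory structure" section states that Machine Type comes from the directory name, and that Machine ID and condition come from the file name. The following is a minimal sketch of how those labels might be recovered in Python using only the standard library. The regular expression is an assumption inferred from the example file names in the description (such as "normal_id_01_00000000.wav"), and the names ClipLabel and parse_clip_path are hypothetical helpers introduced here for illustration; they are not part of the dataset or of the official baseline system.

```python
import re
from pathlib import PurePosixPath
from typing import NamedTuple

# Assumed file-name pattern, inferred from the examples in the description
# (e.g. "normal_id_01_00000000.wav"); not an official specification.
FILE_NAME_PATTERN = re.compile(r"^(normal|anomaly)_id_(\d+)_(\d+)\.wav$")


class ClipLabel(NamedTuple):
    """Hypothetical container for the labels of one recording."""
    machine_type: str  # directory name, e.g. "ToyCar" or "fan"
    split: str         # "train" or "test"
    machine_id: str    # kept as a string to preserve leading zeros, e.g. "01"
    condition: str     # file-name prefix: "normal" or "anomaly"


def parse_clip_path(path: str) -> ClipLabel:
    """Derive Machine Type, split, Machine ID, and condition from a path."""
    parts = PurePosixPath(path).parts
    if len(parts) < 3:
        raise ValueError(f"expected <type>/<split>/<file>, got: {path!r}")
    machine_type, split, file_name = parts[-3], parts[-2], parts[-1]
    if split not in ("train", "test"):
        raise ValueError(f"unexpected split directory: {split!r}")
    match = FILE_NAME_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name!r}")
    condition, machine_id, _clip_index = match.groups()
    return ClipLabel(machine_type, split, machine_id, condition)


if __name__ == "__main__":
    # The two example paths given in the description.
    print(parse_clip_path("/ToyCar/train/normal_id_01_00000000.wav"))
    print(parse_clip_path("/fan/test/anomaly_id_00_00000000.wav"))
```

Keeping the Machine ID as a string rather than an integer preserves the zero-padded form ("00", "01") used in the file names, which makes it easier to group recordings by the identifier exactly as it appears on disk.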
Created: 2022-05-24