Name: jiangab/RMIS
Creator: jiangab
Published: 2026-04-19 16:30:13
License: MIT

Download link:

https://hf-mirror.com/datasets/jiangab/RMIS

Resource description:

---
license: mit
dataset_info:
- config_name: dcase20
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: mt
    dtype: string
  - name: sec
    dtype: int64
  - name: domain
    dtype: int64
  - name: status
    dtype: int64
  - name: attri
    dtype: string
  - name: num_frames
    dtype: int64
  - name: sample_rate
    dtype: int64
  - name: dur
    dtype: float64
  splits:
  - name: ad
    num_bytes: 19107440765
    num_examples: 54254
  download_size: 17719359506
  dataset_size: 19107440765
- config_name: dcase21
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: mt
    dtype: string
  - name: sec
    dtype: int64
  - name: domain
    dtype: int64
  - name: status
    dtype: int64
  - name: attri
    dtype: string
  - name: num_frames
    dtype: int64
  - name: sample_rate
    dtype: int64
  - name: dur
    dtype: float64
  splits:
  - name: ad
    num_bytes: 19054676901
    num_examples: 59504
  download_size: 19049344828
  dataset_size: 19054676901
- config_name: dcase22
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: mt
    dtype: string
  - name: sec
    dtype: int64
  - name: domain
    dtype: int64
  - name: status
    dtype: int64
  - name: attri
    dtype: string
  - name: num_frames
    dtype: int64
  - name: sample_rate
    dtype: int64
  - name: dur
    dtype: float64
  splits:
  - name: ad
    num_bytes: 16137662528
    num_examples: 50399
  download_size: 16134503676
  dataset_size: 16137662528
- config_name: dcase23
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: mt
    dtype: string
  - name: sec
    dtype: int64
  - name: domain
    dtype: int64
  - name: status
    dtype: int64
  - name: attri
    dtype: string
  - name: num_frames
    dtype: int64
  - name: sample_rate
    dtype: int64
  - name: dur
    dtype: float64
  splits:
  - name: ad
    num_bytes: 5809715627
    num_examples: 16800
  download_size: 5800741956
  dataset_size: 5809715627
- config_name: dcase24
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: mt
    dtype: string
  - name: sec
    dtype: int64
  - name: domain
    dtype: int64
  - name: status
    dtype: int64
  - name: attri
    dtype: string
  - name: num_frames
    dtype: int64
  - name: sample_rate
    dtype: int64
  - name: dur
    dtype: float64
  splits:
  - name: ad
    num_bytes: 6639734770
    num_examples: 19200
  download_size: 5822888381
  dataset_size: 6639734770
- config_name: dcase25
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: type
    dtype: string
  - name: mt
    dtype: string
  - name: sec
    dtype: int64
  - name: attri
    dtype: string
  - name: num_frames
    dtype: int64
  - name: sample_rate
    dtype: int64
  - name: dur
    dtype: float64
  - name: domain
    dtype: float64
  - name: status
    dtype: float64
  splits:
  - name: ad
    num_bytes: 6868439661
    num_examples: 19500
  download_size: 5967431295
  dataset_size: 6868439661
- config_name: iica
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: leak
    dtype: string
  - name: noise
    dtype: string
  - name: session
    dtype: int64
  - name: knob
    dtype: string
  - name: mic
    dtype: string
  - name: status
    dtype: string
  - name: scene
    dtype: string
  - name: ori
    dtype: string
  - name: num_frames
    dtype: int64
  - name: sample_rate
    dtype: int64
  - name: dur
    dtype: float64
  splits:
  - name: fd
    num_bytes: 16124113059
    num_examples: 16792
  download_size: 16122711233
  dataset_size: 16124113059
- config_name: iiee
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene
    dtype: string
  splits:
  - name: fd
    num_bytes: 629484750
    num_examples: 2378
  download_size: 629434101
  dataset_size: 629484750
- config_name: mafaulda_sound
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene
    dtype: string
  - name: ori
    dtype: string
  splits:
  - name: fd
    num_bytes: 975836503
    num_examples: 1951
  download_size: 975776677
  dataset_size: 975836503
- config_name: mafaulda_vib
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene
    dtype: string
  - name: ori
    dtype: string
  splits:
  - name: fd
    num_bytes: 5854911164
    num_examples: 11706
  download_size: 5854600849
  dataset_size: 5854911164
- config_name: pu_cur
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene
    dtype: string
  - name: ori
    dtype: string
  splits:
  - name: fd
    num_bytes: 2630029949
    num_examples: 5120
  download_size: 2629819315
  dataset_size: 2630029949
- config_name: pu_vib
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene
    dtype: string
  - name: ori
    dtype: string
  splits:
  - name: fd
    num_bytes: 1315434594
    num_examples: 2560
  download_size: 1314912195
  dataset_size: 1315434594
- config_name: sdust_bearing
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene
    dtype: string
  - name: ori
    dtype: string
  splits:
  - name: fd
    num_bytes: 4682987322
    num_examples: 9144
  download_size: 4682878349
  dataset_size: 4682987322
- config_name: sdust_gear
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene
    dtype: string
  - name: ori
    dtype: string
  splits:
  - name: fd
    num_bytes: 3073137846
    num_examples: 6000
  download_size: 3072881319
  dataset_size: 3073137846
- config_name: umged_cur
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene_G
    dtype: string
  - name: scene_E
    dtype: string
  - name: status
    dtype: string
  - name: ori
    dtype: string
  - name: num_frames
    dtype: int64
  - name: sample_rate
    dtype: int64
  - name: dur
    dtype: float64
  splits:
  - name: fd
    num_bytes: 43262276883
    num_examples: 42240
  download_size: 43260872646
  dataset_size: 43262276883
- config_name: umged_sound
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene_G
    dtype: string
  - name: scene_E
    dtype: string
  - name: status
    dtype: string
  - name: ori
    dtype: string
  - name: num_frames
    dtype: int64
  - name: sample_rate
    dtype: int64
  - name: dur
    dtype: float64
  splits:
  - name: fd
    num_bytes: 21631135907
    num_examples: 21120
  download_size: 21630391086
  dataset_size: 21631135907
- config_name: umged_vib
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene_G
    dtype: string
  - name: scene_E
    dtype: string
  - name: status
    dtype: string
  - name: ori
    dtype: string
  - name: num_frames
    dtype: int64
  - name: sample_rate
    dtype: int64
  - name: dur
    dtype: float64
  splits:
  - name: fd
    num_bytes: 64893392515
    num_examples: 63360
  download_size: 64891458435
  dataset_size: 64893392515
- config_name: umged_vol
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene_G
    dtype: string
  - name: scene_E
    dtype: string
  - name: status
    dtype: string
  - name: ori
    dtype: string
  - name: num_frames
    dtype: int64
  - name: sample_rate
    dtype: int64
  - name: dur
    dtype: float64
  splits:
  - name: fd
    num_bytes: 43262276883
    num_examples: 42240
  download_size: 43260872593
  dataset_size: 43262276883
- config_name: wtpg
  features:
  - name: audio
    dtype:
      audio:
        decode: false
  - name: file_name
    dtype: string
  - name: scene
    dtype: string
  - name: ori
    dtype: string
  splits:
  - name: fd
    num_bytes: 2546319308
    num_examples: 4874
  download_size: 2573285083
  dataset_size: 2546319308
configs:
- config_name: dcase20
  data_files:
  - split: ad
    path: dcase20/ad-*
- config_name: dcase21
  data_files:
  - split: ad
    path: dcase21/ad-*
- config_name: dcase22
  data_files:
  - split: ad
    path: dcase22/ad-*
- config_name: dcase23
  data_files:
  - split: ad
    path: dcase23/ad-*
- config_name: dcase24
  data_files:
  - split: ad
    path: dcase24/ad-*
- config_name: dcase25
  data_files:
  - split: ad
    path: dcase25/ad-*
- config_name: iica
  data_files:
  - split: fd
    path: iica/fd-*
- config_name: iiee
  data_files:
  - split: fd
    path: iiee/fd-*
- config_name: mafaulda_sound
  data_files:
  - split: fd
    path: mafaulda_sound/fd-*
- config_name: mafaulda_vib
  data_files:
  - split: fd
    path: mafaulda_vib/fd-*
- config_name: pu_cur
  data_files:
  - split: fd
    path: pu_cur/fd-*
- config_name: pu_vib
  data_files:
  - split: fd
    path: pu_vib/fd-*
- config_name: sdust_bearing
  data_files:
  - split: fd
    path: sdust_bearing/fd-*
- config_name: sdust_gear
  data_files:
  - split: fd
    path: sdust_gear/fd-*
- config_name: umged_cur
  data_files:
  - split: fd
    path: umged_cur/fd-*
- config_name: umged_sound
  data_files:
  - split: fd
    path: umged_sound/fd-*
- config_name: umged_vib
  data_files:
  - split: fd
    path: umged_vib/fd-*
- config_name: umged_vol
  data_files:
  - split: fd
    path: umged_vol/fd-*
- config_name: wtpg
  data_files:
  - split: fd
    path: wtpg/fd-*
---

<h1 align="center"> RMIS Benchmark Datasets </h1>

## Introduction

RMIS is a benchmark dataset collection for evaluating representation learning on **multi-modal industrial signals**. It brings together the datasets used in the RMIS benchmark, covering **anomaly detection** and **fault diagnosis** across four modalities: **sound, vibration, voltage, and current**.

This Hugging Face repository mainly hosts the **benchmark datasets themselves**. If you are looking for the full benchmark codebase, evaluation pipeline, preprocessing details, or leaderboard, please refer to the [RMIS GitHub repository](https://github.com/jianganbai/RMIS).

RMIS is closely related to **FISHER**:

- **FISHER** is the foundation model proposed for industrial signal representation.
- **RMIS** is the benchmark used to evaluate FISHER and other signal foundation models.
- This Hugging Face repository hosts the **dataset side** of RMIS, while the GitHub repository hosts the **benchmark code and evaluation pipeline**.

In the current release, the dataset includes **19 configurations**:

- **Anomaly detection**: `dcase20`, `dcase21`, `dcase22`, `dcase23`, `dcase24`, `dcase25`
- **Fault diagnosis**: `iica`, `iiee`, `mafaulda_sound`, `mafaulda_vib`, `pu_cur`, `pu_vib`, `sdust_bearing`, `sdust_gear`, `umged_cur`, `umged_sound`, `umged_vib`, `umged_vol`, `wtpg`

## What is included

Each configuration corresponds to one benchmark subset.
The exact schema varies by subset, but all configurations provide an `audio` column together with file-level metadata such as `file_name`, and task-specific annotations such as `status`, `scene`, `mt`, `ori`, `domain`, or related attributes.

The split names follow the benchmark tasks:

- `ad`: anomaly detection subsets
- `fd`: fault diagnosis subsets

Please note that this repository mainly serves as a **data hosting and distribution** endpoint. For RMIS-specific preprocessing, path organization, evaluation protocol, and model integration, please refer to the RMIS GitHub repository.

## Recommended usage in the RMIS project

If you want to use these datasets inside the **RMIS benchmark workflow**, please first download the [RMIS GitHub repository](https://github.com/jianganbai/RMIS), install its dependencies, and then run the provided script from that repository.

```shell
git clone https://github.com/jianganbai/RMIS.git
cd RMIS/
pip install -r requirements.txt
```

Then use the following command to download and extract the Hugging Face data into RMIS-compatible local wav folders:

```shell
[HF_ENDPOINT=https://hf-mirror.com] python -m utils.scripts.download_and_extract_hf_data \
    --output_dir OUTPUT_DIR \
    [--subset SUBSET [SUBSET ...]] \
    [--remove_parquet_after_extract] \
    [--force_reextract]
```

For example:

```shell
python -m utils.scripts.download_and_extract_hf_data \
    --output_dir datasets_hf \
    --subset iiee mafaulda_sound \
    --remove_parquet_after_extract
```

Here `--output_dir` is required, while `HF_ENDPOINT`, `--subset`, `--remove_parquet_after_extract`, and `--force_reextract` are optional. If `--subset` is omitted, the script processes all RMIS subsets.

In the RMIS project, the Hugging Face route is mainly intended to treat this repository as a **cloud storage backend** and materialize the data into the same local wav-style directory layout used by the other RMIS download paths.
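
To give a rough idea of what materializing rows into local audio files can involve, here is a minimal Python sketch. It is not the RMIS script's actual implementation. It assumes that each row is a dict whose `audio` field is a dict with a `bytes` key (the form the `datasets` library returns for an audio column stored with `decode: false`), that those bytes are the original audio file contents, and that `file_name` is a relative path. The helper names `split_for` and `write_audio_rows` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List, Mapping


def split_for(config: str) -> str:
    """Return the split name for a config: `ad` for DCASE subsets, `fd` otherwise."""
    return "ad" if config.startswith("dcase") else "fd"


def write_audio_rows(rows: Iterable[Mapping], out_dir: str) -> List[Path]:
    """Write each row's undecoded audio bytes to out_dir/<file_name>.

    Rows without embedded bytes are skipped. Returns the paths written.
    """
    root = Path(out_dir)
    written: List[Path] = []
    for row in rows:
        payload = row["audio"].get("bytes")
        if payload is None:
            # Assumption: nothing to write when only a path is stored.
            continue
        target = root / row["file_name"]
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)
        written.append(target)
    return written
```

Rows of this shape could come, for instance, from `datasets.load_dataset("jiangab/RMIS", "iiee", split=split_for("iiee"))`, assuming the `datasets` package is installed and the repository is reachable. For reproducing RMIS experiments, the provided download-and-extract script remains the recommended route.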
## Additional usage

If you are interested in more customized workflows, you may also directly use the metadata and parquet assets attached to this Hugging Face repository to develop your own data loading, conversion, or preprocessing utilities.

However, for most users who want to reproduce RMIS experiments, the recommended path is still:

1. Download the RMIS GitHub repository.
2. Follow the benchmark instructions there.
3. Use the provided Hugging Face download-and-extract script when you prefer the Hugging Face storage route.

For more details about RMIS, including benchmark construction, evaluation, and usage, please refer to the [RMIS GitHub repository](https://github.com/jianganbai/RMIS).

## Acknowledgements

RMIS is built from multiple public industrial signal datasets. We thank the original dataset creators for making these resources available. If you believe that any content in this repository infringes your rights, please contact us and we will address the issue promptly.

## Citation

If you find RMIS useful, please cite the following paper.

```bibtex
@article{fan2025fisher,
  title={FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation},
  author={Fan, Pingyi and Jiang, Anbai and Zhang, Shuwei and Lv, Zhiqiang and Han, Bing and Zheng, Xinhu and Liang, Wenrui and Li, Junjie and Zhang, Wei-Qiang and Qian, Yanmin and Chen, Xie and Lu, Cheng and Liu, Jia},
  journal={arXiv preprint arXiv:2507.16696},
  year={2025}
}
```

Application scenarios: