NMPA-MedDevice: A Multimodal Regulatory Dataset of 52,000 Chinese Medical Devices for Risk Classification Research

Source: NIAID Data Ecosystem. Indexed 2026-05-10.

Download link:

https://figshare.com/articles/dataset/NMPA-MedDevice_A_Multimodal_Regulatory_Dataset_of_43_000_Chinese_Medical_Devices_for_Risk_Classification_Research/31322272

Description:

NMPA-MedDevice is a regulatory dataset derived from China's National Medical Products Administration (NMPA) Unique Device Identification (UDI) registry (full release: July 2024). The release comprises four components:

- Raw registry snapshot -- The frozen NMPA UDI registry as downloaded on 1 July 2024 (66,472 records, 47 structured fields).
- Cleaned text-and-metadata corpus -- 52,251 unique device records after deduplication, placeholder removal, language filtering, and near-duplicate removal, with deterministically derived risk class labels included.
- Curated image-linked subset -- 1,005 medical devices with product descriptions, regulatory metadata, and verified risk classification labels (Class I = 39, Class II = 462, Class III = 504).
- External temporal validation set -- 300 devices from the NMPA weekly update (October--November 2025), providing an independent cross-temporal evaluation set.

### Risk Class Labels

Labels are deterministically derived from the ninth character of the NMPA registration number:

- `1` = Class I (low risk)
- `2` = Class II (moderate risk)
- `3` = Class III (high risk)

### Image Modality

Raw product images are not redistributed due to copyright restrictions. Instead, we provide:

- Pre-extracted feature embeddings: BERT-base-Chinese [CLS] token embeddings (768 dimensions) and EfficientNet-B5 average-pool embeddings (2,048 dimensions) in HDF5 format.
- Record--image linkage metadata (`matched_products.csv`).
- Best-effort image retrieval script (`scripts/image_retrieval.py`) for downloading images from publicly accessible manufacturer websites. Image reconstruction may degrade over time due to link rot.

These precomputed embeddings enable full reproducibility of downstream classification experiments reported in the companion study (Han, Ceross & Bergmann, Expert Systems with Applications, 2026).
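The ninth-character rule described under Risk Class Labels can be sketched in a few lines of Python. This is a minimal illustration, not the release's own label-derivation script: the function name `derive_risk_class`, the choice to return `None` for short or unrecognised strings, and the two sample registration numbers are all assumptions made for the example.

```python
from typing import Optional

# Mapping stated in the dataset description: ninth character -> risk class.
RISK_CLASS_BY_DIGIT = {"1": "Class I", "2": "Class II", "3": "Class III"}


def derive_risk_class(registration_number: str) -> Optional[str]:
    """Return the risk class encoded in the ninth character, or None.

    None is returned when the string is shorter than nine characters or
    the ninth character is not 1, 2, or 3. How the dataset's own scripts
    treat such rows is not stated in the description (assumption here).
    """
    cleaned = registration_number.strip()
    if len(cleaned) < 9:
        return None
    # Python indexes by Unicode code point, so index 8 is the ninth
    # character even when the prefix is written in Chinese characters.
    return RISK_CLASS_BY_DIGIT.get(cleaned[8])


# Illustrative strings in a registration-number-like shape, not real records.
print(derive_risk_class("国械注准20193010001"))  # Class III
print(derive_risk_class("苏械注准20212140001"))  # Class II
```

Because the rule reads a single character, it needs no model or lookup table, which is what makes the labels reproducible from the raw registry alone.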

### File Structure

- Root -- Raw registry (`nmpa_full_66k.csv`), cleaned corpus (`nmpa_cleaned_52k.csv`), holdout split files, and README
- `labeled/` -- Curated subset (`sampled_data.csv`) and image matching index (`matched_products.csv`)
- `features/` -- Precomputed text and image embeddings (HDF5)
- `splits/` -- Cross-validation fold indices
- `external_validation/` -- External temporal validation set
- `scripts/` -- Preprocessing, label derivation, feature extraction, and image retrieval code
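After downloading, it can be useful to confirm that the unpacked archive matches the layout listed under File Structure. The sketch below checks only the paths the description names explicitly; the function name `missing_entries` and the local folder name `NMPA-MedDevice` are hypothetical, and the HDF5 and fold-index files are checked at directory level because their file names are not given.

```python
from pathlib import Path
from typing import List

# Files named in the dataset description.
EXPECTED_FILES = [
    "nmpa_full_66k.csv",
    "nmpa_cleaned_52k.csv",
    "labeled/sampled_data.csv",
    "labeled/matched_products.csv",
    "scripts/image_retrieval.py",
]
# Directories named in the dataset description.
EXPECTED_DIRS = ["labeled", "features", "splits", "external_validation", "scripts"]


def missing_entries(root: Path) -> List[str]:
    """Return the expected directories and files that are absent under root."""
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # "NMPA-MedDevice" is a hypothetical local folder name (assumption).
    for entry in missing_entries(Path("NMPA-MedDevice")):
        print("missing:", entry)
```

An empty result means every named path is present; it says nothing about file contents or checksums.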

Created:

2026-02-12
