NMPA-MedDevice: A Multimodal Regulatory Dataset of 52,000 Chinese Medical Devices for Risk Classification Research
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/NMPA-MedDevice_A_Multimodal_Regulatory_Dataset_of_43_000_Chinese_Medical_Devices_for_Risk_Classification_Research/31322272
Resource description:
NMPA-MedDevice is a regulatory dataset derived from China's National Medical Products Administration (NMPA) Unique Device Identification (UDI) registry (full release: July 2024). The release comprises four components:
- Raw registry snapshot -- The frozen NMPA UDI registry as downloaded on 1 July 2024 (66,472 records, 47 structured fields).
- Cleaned text-and-metadata corpus -- 52,251 unique device records after deduplication, placeholder removal, language filtering, and near-duplicate removal, with deterministically derived risk class labels included.
- Curated image-linked subset -- 1,005 medical devices with product descriptions, regulatory metadata, and verified risk classification labels (Class I = 39, Class II = 462, Class III = 504).
- External temporal validation set -- 300 devices from the NMPA weekly update (October--November 2025), providing an independent cross-temporal evaluation set.
### Risk Class Labels
Labels are deterministically derived from the ninth character of the NMPA registration number:
- `1` = Class I (low risk)
- `2` = Class II (moderate risk)
- `3` = Class III (high risk)
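The mapping above can be sketched in a few lines of Python. This is a minimal illustration, not the release's own label-derivation script: the function name `derive_risk_class` and the sample registration number are hypothetical, and whitespace stripping is an assumption about input hygiene.

```python
from typing import Optional

# Ninth character of the registration number -> risk class (1 = I, 2 = II, 3 = III).
CLASS_BY_DIGIT = {"1": 1, "2": 2, "3": 3}


def derive_risk_class(registration_number: str) -> Optional[int]:
    """Return 1, 2, or 3 based on the ninth character, or None if it cannot be derived.

    Hypothetical helper: the dataset's own script may handle edge cases differently.
    """
    cleaned = registration_number.strip()  # assumption: ignore surrounding whitespace
    if len(cleaned) < 9:
        return None
    # Python strings index by Unicode code point, so Chinese characters count as one each.
    return CLASS_BY_DIGIT.get(cleaned[8])


if __name__ == "__main__":
    # Made-up registration number used purely for illustration.
    print(derive_risk_class("国械注准20173401234"))
```

Returning `None` for short or unexpected values keeps malformed records visible, so they can be counted or dropped explicitly instead of being silently mislabeled.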
### Image Modality
Raw product images are not redistributed due to copyright restrictions. Instead, we provide:
- Pre-extracted feature embeddings: BERT-base-Chinese [CLS] token embeddings (768 dimensions) and EfficientNet-B5 average-pool embeddings (2,048 dimensions) in HDF5 format.
- Record--image linkage metadata (`matched_products.csv`).
- Best-effort image retrieval script (`scripts/image_retrieval.py`) for downloading images from publicly accessible manufacturer websites. Image reconstruction may degrade over time due to link rot.

These precomputed embeddings enable full reproducibility of downstream classification experiments reported in the companion study (Han, Ceross & Bergmann, Expert Systems with Applications, 2026).
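The internal layout of the HDF5 embedding files is not described here, so a reasonable first step is to list what each file contains before loading anything. The sketch below assumes the `h5py` package is installed; the helper name `list_hdf5_datasets` is hypothetical, and the file path is supplied by the user, not taken from the release.

```python
import sys

import h5py


def list_hdf5_datasets(path):
    """Return {dataset_name: (shape, dtype)} for every dataset in an HDF5 file."""
    found = {}

    def visit(name, obj):
        # Groups are skipped; only array-like datasets are recorded.
        if isinstance(obj, h5py.Dataset):
            found[name] = (obj.shape, str(obj.dtype))

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_features.py <path to a file under features/>
    for name, (shape, dtype) in list_hdf5_datasets(sys.argv[1]).items():
        print(name, shape, dtype)
```

If the files match the description above, the text embeddings should have a trailing dimension of 768 and the image embeddings a trailing dimension of 2,048.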
### File Structure
- Root -- Raw registry (`nmpa_full_66k.csv`), cleaned corpus (`nmpa_cleaned_52k.csv`), holdout split files, and README
- `labeled/` -- Curated subset (`sampled_data.csv`) and image matching index (`matched_products.csv`)
- `features/` -- Precomputed text and image embeddings (HDF5)
- `splits/` -- Cross-validation fold indices
- `external_validation/` -- External temporal validation set
- `scripts/` -- Preprocessing, label derivation, feature extraction, and image retrieval code
Created:
2026-02-12



