PEM-DefectLoc-9K

Name: PEM-DefectLoc-9K
Creator: figshare
Published: 2025-12-25 17:48:40
License: No description available

Source: DataCite Commons; updated 2025-12-25; indexed 2026-04-25

Download link:

https://figshare.com/articles/dataset/PEM-DefectLoc-9K/30928907/1

Description:

PEM-DefectLoc-9K is a machine-learning-ready scanning electron microscopy (SEM) image dataset curated for defect localization in polymer electrolyte membrane (PEM) fuel cell materials. The dataset is designed to support computer vision benchmarking, with an emphasis on supervised and self-supervised localization tasks and on analyzing the impact of data augmentation strategies in microscopy-based vision pipelines. The dataset is intended for spatial defect localization rather than defect classification or quantitative materials metrology.

Dataset Files and Structure

The complete dataset is distributed across four files, which together constitute a single unified collection:

- images_part01.zip: 3,021 SEM images (grayscale PNG).
- images_part02.zip: 3,021 SEM images (grayscale PNG).
- images_part03.zip: 3,022 SEM images (grayscale PNG).
- sem123_augmented/augmented_annotations.json: a single JSON file containing bounding-box annotations for all images across the three image archives.

The three image archives collectively contain 9,064 SEM images. The division into multiple archives is purely for file-size management: the order of images across archives is not significant, and the archives do not represent independent subsets. All three image archives should be used together with the annotation file.

Annotation Format

Annotations are provided in JSON format, with each image filename mapped to one or more bounding boxes enclosing defect regions. The annotation structure follows this schema:

    {
      "image_name.png": [
        [x, y, width, height],
        [x, y, width, height]
      ],
      ...
    }

Bounding box coordinates are specified in pixel units relative to the corresponding image.
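The annotation schema above can be read with standard-library Python. The sketch below is a minimal, hypothetical loader (the names load_annotations, xywh_to_xyxy and iou_xywh are illustrative and not part of the dataset). It assumes that (x, y) is the top-left corner of each box, which the description does not state explicitly. It also includes an intersection-over-union (IoU) helper, one common way to score predicted boxes against these annotations; the dataset itself does not prescribe an evaluation metric.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

# A box as stored in the file: [x, y, width, height], in pixels.
# ASSUMPTION: (x, y) is the top-left corner; the description does not say so.
Box = Tuple[float, float, float, float]


def load_annotations(path: Path) -> Dict[str, List[Box]]:
    """Read the {filename: [[x, y, width, height], ...]} mapping and validate it."""
    with open(path, "r", encoding="utf-8") as f:
        raw = json.load(f)
    annotations: Dict[str, List[Box]] = {}
    for filename, boxes in raw.items():
        parsed: List[Box] = []
        for box in boxes:
            if len(box) != 4:
                raise ValueError(f"{filename}: expected 4 numbers per box, got {box!r}")
            x, y, w, h = (float(v) for v in box)
            parsed.append((x, y, w, h))
        annotations[filename] = parsed
    return annotations


def xywh_to_xyxy(box: Box) -> Box:
    """Convert [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)


def iou_xywh(a: Box, b: Box) -> float:
    """Intersection over union of two [x, y, width, height] boxes."""
    ax1, ay1, ax2, ay2 = xywh_to_xyxy(a)
    bx1, by1, bx2, by2 = xywh_to_xyxy(b)
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0
```

For instance, load_annotations(Path("sem123_augmented/augmented_annotations.json")) returns a dict keyed by PNG filename. A prediction is often counted as a hit when its IoU with a ground-truth box is at least 0.5, but that threshold is a common convention rather than something this dataset specifies.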
Image filenames in the annotation file correspond exactly to the filenames contained within the image archives, regardless of which archive an image resides in.

Image Generation and Augmentation

The dataset originates from a set of base SEM images, each expanded using a fixed set of 43 physically plausible augmentations, resulting in 44 variants per base image. With 9,064 images in total, this corresponds to 206 base images (9,064 / 44 = 206). Augmentations were selected to reflect realistic SEM acquisition variability and include controlled variations such as blur, noise, intensity and contrast changes, resolution variation, and compression artifacts. All augmented images are included directly within the three image archives; original images are not distributed as a separate subset.

Preprocessing and Responsible Data Release

Prior to release, all images were preprocessed to ensure responsible data sharing and to prevent leakage of acquisition-specific or proprietary information. Preprocessing steps included:

- Removal of embedded scale bars and acquisition overlays
- Cropping to exclude non-informative regions
- Adjustment of bounding boxes to maintain spatial consistency after cropping

These steps preserve the semantic integrity of defect regions while enabling safe and reusable benchmarking.

Intended Use

This dataset is intended for:

- Benchmarking defect localization models
- Supervised and self-supervised learning on SEM imagery
- Studying augmentation robustness and generalization in microscopy-based vision tasks

The dataset is not intended for defect classification benchmarks, scale-aware physical measurements, or quantitative materials property estimation.

Usage Notes

Users should extract all three image archives into a single directory before use. The provided annotation file should be used jointly with the complete image set. The dataset is provided without accompanying code; users are expected to implement their own data loading, training, and evaluation pipelines based on the documented structure and annotation format.
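Following the usage notes above, the sketch below extracts the three archives into one directory and cross-checks the extracted PNG filenames against the annotation keys. It is a hypothetical helper, not code shipped with the dataset. It assumes the three zip files and the sem123_augmented/ folder were downloaded side by side into one directory, and, because the folder layout inside the archives is not documented, it locates images recursively and matches them by filename only.

```python
import json
import zipfile
from pathlib import Path
from typing import Dict, Iterable, Set

# File names as listed in the dataset description.
ARCHIVES = ["images_part01.zip", "images_part02.zip", "images_part03.zip"]
ANNOTATIONS = Path("sem123_augmented") / "augmented_annotations.json"


def extract_all(archives: Iterable[Path], out_dir: Path) -> None:
    """Extract every image archive into one shared output directory."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(out_dir)


def index_pngs(root: Path) -> Dict[str, Path]:
    """Map each PNG filename to its extracted path.

    ASSUMPTION: the folder layout inside the archives is not documented, so
    images are found recursively and matched by filename only.
    """
    index: Dict[str, Path] = {}
    for p in sorted(root.rglob("*.png")):
        if "__MACOSX" in p.parts:  # skip macOS metadata folders, if present
            continue
        if p.name in index:
            raise ValueError(f"duplicate filename across archives: {p.name}")
        index[p.name] = p
    return index


def check_consistency(index: Dict[str, Path], annotations: Dict[str, list]) -> Dict[str, Set[str]]:
    """List annotation keys without an image, and images without annotations."""
    return {
        "missing_images": set(annotations) - set(index),
        "unannotated_images": set(index) - set(annotations),
    }


def prepare(data_dir: Path, out_dir: Path) -> Dict[str, Set[str]]:
    """Hypothetical end-to-end helper: extract, index, and cross-check.

    ASSUMPTION: the three archives and the sem123_augmented/ folder were
    downloaded side by side into data_dir.
    """
    extract_all((data_dir / name for name in ARCHIVES), out_dir)
    with open(data_dir / ANNOTATIONS, "r", encoding="utf-8") as f:
        annotations = json.load(f)
    return check_consistency(index_pngs(out_dir), annotations)
```

Given the description (annotations cover all images, and filenames match exactly), both sets returned by prepare(Path("downloads"), Path("pem_images")) are expected to be empty after a complete extraction, and len(index_pngs(Path("pem_images"))) should then equal 9,064. The directory names in this example are placeholders.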

Provider:

figshare

Created:

2025-12-21
