PEMDefect-1107
Source: Figshare. Updated: 2025-12-21. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/PEM-DefectLoc-9K/30928907
Description:
PEMDefect-1107 is a curated grayscale image dataset for single-class defect localization in proton-exchange membrane (PEM) fuel cell materials. The dataset is designed for bounding-box-based defect detection and supports research in materials-focused computer vision, data-centric machine learning, and defect localization under data-scarce conditions.

The dataset contains 1,107 images in total and is organized under a single root directory with explicit train, validation, and test splits. The splits consist of 1,045 training images, 31 validation images, and 31 test images. Each split is stored in a dedicated subdirectory and follows a split-aware directory structure to prevent data leakage.

All bounding-box annotations are provided in YOLO format, with one label file per image. This format ensures a strict one-to-one correspondence between images and annotations and avoids the mismatch issues that can arise with a single centralized annotation file. A dataset configuration file (dataset.yaml) is included to define split paths and class metadata, enabling immediate use in standard object detection pipelines.

All images are provided in grayscale, resized to a fixed resolution of 640 × 640 pixels, and stored in PNG format. A consistent, physics-aware preprocessing pipeline was applied across the dataset, including conversion from microscopy acquisition formats, deterministic removal of scale bars via cropping, and global intensity normalization. Scale bars were removed to prevent shortcut learning from acquisition metadata and to ensure models learn from intrinsic material structure rather than imaging artifacts.

The train/validation/test split was finalized prior to any data augmentation, ensuring strict separation between splits. Data augmentation was applied exclusively to the training set, while validation and test images remain unaugmented, enabling unbiased and reproducible evaluation of defect localization performance.

The dataset underwent comprehensive integrity and validation checks, including verification of image–annotation correspondence, geometric consistency of bounding boxes, absence of truncation artifacts, detection of duplicate images, and confirmation of split integrity with no leakage across subsets. All hard validation criteria were satisfied in the final release.

PEMDefect-1107 is suitable for training and benchmarking defect localization models such as Faster R-CNN and RetinaNet (via standard annotation conversion), as well as for studying data-centric effects, augmentation strategies, and robustness in materials-oriented computer vision tasks.
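The one-label-file-per-image convention and the annotation conversion mentioned above can be illustrated with a minimal Python sketch. It is not the dataset authors' tooling. It assumes the common YOLO label layout of `class x_center y_center width height`, with coordinates normalized to [0, 1], and it assumes images and labels sit in separate folders sharing file stems; the description does not spell out either detail, so the function names and folder arguments here are hypothetical.

```python
from pathlib import Path

IMG_SIZE = 640  # fixed resolution stated in the dataset description


def parse_yolo_line(line):
    """Parse one label line, assumed to be 'class xc yc w h' (normalized)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    class_id = int(parts[0])
    xc, yc, w, h = (float(p) for p in parts[1:])
    return class_id, xc, yc, w, h


def yolo_to_xyxy(xc, yc, w, h, img_w=IMG_SIZE, img_h=IMG_SIZE):
    """Convert a normalized center-format box to pixel corners (x1, y1, x2, y2)."""
    box_w, box_h = w * img_w, h * img_h
    cx, cy = xc * img_w, yc * img_h
    return (cx - box_w / 2, cy - box_h / 2, cx + box_w / 2, cy + box_h / 2)


def find_unpaired(images_dir, labels_dir):
    """Return (images without labels, labels without images), matched by file stem."""
    image_stems = {p.stem for p in Path(images_dir).glob("*.png")}
    label_stems = {p.stem for p in Path(labels_dir).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)
```

For a box centered in a 640 × 640 image, `yolo_to_xyxy(0.5, 0.5, 0.25, 0.5)` returns `(240.0, 160.0, 400.0, 480.0)`. Corner-format pixel boxes are what detectors such as torchvision's Faster R-CNN expect, so a conversion of this kind is one plausible reading of "standard annotation conversion". Calling `find_unpaired` on a split's image and label folders should return two empty lists if the stated one-to-one correspondence holds.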
Created:
2025-12-21



