
IMA++: ISIC Archive Multi-Annotator Dermoscopic Skin Lesion Segmentation Dataset

Source: Zenodo. Updated 2026-03-07; indexed 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.14201692
Description:
Overview

The IMA++ dataset is the largest publicly available multi-annotator skin lesion segmentation (SLS) dataset, collected from the ISIC Archive to facilitate skin lesion image segmentation research. It contains 17,684 segmentation masks for 14,967 dermoscopic images, produced by 16 distinct annotators. Every image has at least one segmentation, and 2,394 images have 2 to 5 segmentations each. The dataset captures a wide range of segmentation styles influenced by annotator expertise, tools used, and manual review processes, making it a valuable resource for developing and evaluating SLS models.

Quick Start

1. Download and unzip segs.zip from this repository.
2. Download the corresponding 14,967 ISIC images via the dedicated IMA++ collection on the ISIC Archive (one-click download via Actions → Download Collection). Alternatively, download images by ISIC ID using the ISIC API Image Downloader.
3. Use seg_metadata.csv and img_metadata.csv to link masks to images and clinical metadata. The two files join on the ISIC ID, stored as ISIC_id in seg_metadata.csv and as isic_id in img_metadata.csv.

Files

- segs.zip: A ZIP archive of all 22,472 segmentation masks. It holds the 17,684 original segmentation masks plus the outputs of two consensus algorithms (STAPLE and majority voting) for each of the 2,394 multi-annotator images (2 × 2,394 = 4,788 consensus masks).
- seg_metadata.csv: A CSV file containing the metadata for all 22,472 segmentation masks. Columns: ISIC_id, img_filename, seg_filename, annotator (A00–A15), tool (T1–T3), skill_level (S1–S2), mskObjectID, mask_md5.
- img_metadata.csv: A CSV file containing the metadata for all 14,967 skin lesion images. Columns include isic_id, age_approx, anatom_site_general, benign_malignant, diagnosis_1 through diagnosis_5, sex, pixels_x, and pixels_y, among others.
- seg_metadata_multiannotator_subset.csv: A CSV file containing the metadata for only the multi-annotator subset of IMA++ (i.e., the 2,394 images with multiple segmentations per image).
- iaa_metrics_pairwise.csv: A CSV file with the inter-annotator agreement (IAA) metrics calculated for every pair of masks of each image in the multi-annotator subset. It includes overlap metrics (Dice, Jaccard) and boundary metrics (HD, HD95, ASSD), along with normalized variants of the boundary metrics (nHD, nHD95, nASSD).
- iaa_metrics_image.csv: A CSV file with the pairwise IAA metrics averaged per image.
- splits/{train,val,test}.csv: Standardized training (1,675 images), validation (240), and testing (479) partitions of the multi-annotator subset, stratified by segmentation count per image and inter-annotator agreement level.

Annotation Factors

Each segmentation mask is associated with three annotation factors:

- Annotator (A00–A15): 16 anonymized annotators, sorted by number of segmentations produced.
- Tool:
  - T1: Manual polygon tracing. A human expert places polyline control points along the lesion border.
  - T2: Semi-automated flood-fill. A human expert provides a seed point and tunes the flood-fill parameters, followed by morphological filtering.
  - T3: Fully automated algorithm. An automated segmentation algorithm generates the mask, which a human expert reviews and accepts.
- Skill level:
  - S1: Expert reviewer.
  - S2: Novice reviewer.

Key Features

- Inter-Annotator Variability: The dataset captures a wide range of segmentation styles, reflecting differences in annotator expertise, tools used (T1, T2, T3), and manual review processes (S1, S2).
- Realistic multi-annotator scenario: In most medical image segmentation datasets, every image is segmented by every annotator (i.e., a complete bipartite graph between annotators and images). IMA++ instead features an incomplete bipartite graph: every image is segmented by at least one annotator, but not all images are segmented by all annotators. This setup simulates real-world annotation scenarios in which multiple annotators contribute to a subset of images.
- Large-Scale Multi-Annotator Data: With 17,684 image-mask pairs from 16 annotators, IMA++ is the largest publicly available multi-annotator skin lesion segmentation dataset, enabling robust analysis of annotator preferences and segmentation styles.
- Tool-Specific Segmentation Styles: The dataset allows for the exploration of tool-specific segmentation styles, as demonstrated in the segmentation style discovery paper, where differences among the three annotation tools (T1, T2, T3) were learned and captured by the segmentation model.
- Consensus Masks: For the 2,394 images with multiple segmentations, two consensus masks are provided: STAPLE and majority voting.

Citation

If you use the IMA++ dataset in your research, please cite the following papers:

- Abhishek, K., Kawahara, J., Hamarneh, G. (2025). IMA++: ISIC Archive Multi-Annotator Dermoscopic Skin Lesion Segmentation Dataset. arXiv preprint arXiv:2512.21472, pages 1–11. https://doi.org/10.48550/arXiv.2512.21472
- Abhishek, K., Kawahara, J., Hamarneh, G. (2025). What Can We Learn from Inter-Annotator Variability in Skin Lesion Segmentation?. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI) ISIC Skin Image Analysis Workshop (MICCAI ISIC). MICCAI 2025. Lecture Notes in Computer Science, vol 16149, pages 23–33. Springer, Cham. https://doi.org/10.1007/978-3-032-05825-6_3
- Abhishek, K., Kawahara, J., Hamarneh, G. (2025). Segmentation Style Discovery: Application to Skin Lesion Images. In: Medical Image Computing and Computer-Assisted Intervention (MICCAI) ISIC Skin Image Analysis Workshop (MICCAI ISIC). MICCAI 2024. Lecture Notes in Computer Science, vol 15274, pages 24–34. Springer, Cham. https://doi.org/10.1007/978-3-031-77610-6_3

The BibTeX entries for these papers are:

    @Article{abhishek2025imaplusplus,
       author = {Abhishek, Kumar and Kawahara, Jeremy and Hamarneh, Ghassan},
       title = {{IMA++}: {ISIC Archive} Multi-Annotator Dermoscopic Skin Lesion Segmentation Dataset},
       journal = {arXiv preprint arXiv:2512.21472},
       year = {2025},
       doi = {https://doi.org/10.48550/arXiv.2512.21472},
       url = {https://arxiv.org/abs/2512.21472},
       publisher = {arXiv},
       pages = {1--11}}

    @InProceedings{abhishek2025what,
       author = {Abhishek, Kumar and Kawahara, Jeremy and Hamarneh, Ghassan},
       title = {What Can We Learn from Inter-Annotator Variability in Skin Lesion Segmentation?},
       booktitle = {Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) ISIC Skin Image Analysis Workshop},
       pages = {23--33},
       year = {2025},
       doi = {https://doi.org/10.1007/978-3-032-05825-6_3},
       url = {https://link.springer.com/chapter/10.1007/978-3-032-05825-6_3},
       publisher = {Springer Nature Switzerland},
       address = {Cham},
       isbn = {9783032058256}}

    @InProceedings{abhishek2025segmentation,
       author = {Abhishek, Kumar and Kawahara, Jeremy and Hamarneh, Ghassan},
       title = {Segmentation Style Discovery: Application to Skin Lesion Images},
       booktitle = {Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI) ISIC Skin Image Analysis Workshop},
       pages = {24--34},
       year = {2025},
       doi = {https://doi.org/10.1007/978-3-031-77610-6_3},
       url = {https://link.springer.com/chapter/10.1007/978-3-031-77610-6_3},
       publisher = {Springer Nature Switzerland},
       address = {Cham},
       isbn = {9783031776106}}

License

The IMA++ dataset is made available under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license. Additionally, because these annotations were collected from the ISIC Archive, by using them you acknowledge and confirm your compliance with the ISIC Archive Terms of Use.
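The Quick Start step that links masks to images through seg_metadata.csv and img_metadata.csv can be sketched with the Python standard library alone. This is a minimal illustration, not the authors' tooling: the column names come from the file descriptions above, but the function name `link_masks_to_images`, the filenames, and every row value are invented stand-ins for the real CSV contents.

```python
import csv
import io

# Toy stand-ins for seg_metadata.csv and img_metadata.csv. Column names follow
# the file descriptions; all values below are invented for illustration.
SEG_CSV = """ISIC_id,seg_filename,annotator,tool,skill_level
ISIC_0000001,ISIC_0000001_a.png,A00,T1,S1
ISIC_0000001,ISIC_0000001_b.png,A03,T2,S2
ISIC_0000002,ISIC_0000002_a.png,A00,T3,S1
"""
IMG_CSV = """isic_id,benign_malignant,sex
ISIC_0000001,benign,female
ISIC_0000002,malignant,male
"""


def link_masks_to_images(seg_file, img_file):
    """Attach each mask row to its image's metadata row via the ISIC ID."""
    # Index image rows by isic_id (the key name used in img_metadata.csv).
    images = {row["isic_id"]: row for row in csv.DictReader(img_file)}
    linked = []
    for seg in csv.DictReader(seg_file):
        # seg_metadata.csv stores the same ID under the name ISIC_id.
        image = images.get(seg["ISIC_id"])
        if image is not None:  # skip masks whose image row is missing
            linked.append({**image, **seg})
    return linked


rows = link_masks_to_images(io.StringIO(SEG_CSV), io.StringIO(IMG_CSV))

# Group mask filenames per image. In this toy data, images with two or more
# masks play the role of the multi-annotator subset.
masks_per_image = {}
for row in rows:
    masks_per_image.setdefault(row["isic_id"], []).append(row["seg_filename"])
multi_annotator_ids = sorted(k for k, v in masks_per_image.items() if len(v) >= 2)
```

With the real files, replace the `io.StringIO` objects with opened file handles. Note that seg_metadata.csv also lists the consensus masks, so counting masks per image there would need those rows filtered out first; how they are labelled is not described here.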
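To make the overlap metrics and the majority-voting consensus mentioned above concrete, here is a small NumPy sketch on toy binary masks. It uses the standard textbook definitions of Dice and Jaccard; it is not the code used to produce iaa_metrics_pairwise.csv or the released consensus masks. The function names are hypothetical, and the strict-majority tie rule (a tie counts as background) is an assumption, since the description does not state how ties are handled.

```python
import itertools

import numpy as np


def dice(a, b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    total = a.sum() + b.sum()
    # Two empty masks agree perfectly by convention (an assumption here).
    return 1.0 if total == 0 else float(2.0 * np.logical_and(a, b).sum() / total)


def jaccard(a, b):
    """Jaccard index |A∩B| / |A∪B| for two binary masks."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)


def majority_vote(masks):
    """Mark a pixel as lesion when strictly more than half of the masks do."""
    stack = np.stack([m.astype(bool) for m in masks])
    return stack.sum(axis=0) * 2 > len(masks)


# Three toy 4x4 masks standing in for three annotators' masks of one image.
m1 = np.zeros((4, 4), dtype=bool)
m1[1:3, 1:3] = True  # 4 foreground pixels
m2 = np.zeros((4, 4), dtype=bool)
m2[1:3, 1:4] = True  # 6 foreground pixels
m3 = np.zeros((4, 4), dtype=bool)
m3[0:3, 1:3] = True  # 6 foreground pixels
masks = [m1, m2, m3]

# Pairwise agreement for every mask pair, then the per-image average.
pairwise_dice = {
    (i, j): dice(masks[i], masks[j])
    for i, j in itertools.combinations(range(len(masks)), 2)
}
mean_dice = sum(pairwise_dice.values()) / len(pairwise_dice)

consensus = majority_vote(masks)
```

In this toy case only the central 2×2 block is marked by at least two of the three masks, so the consensus coincides with `m1`. The pairwise dictionary and its mean mirror the relationship between the pairwise and per-image IAA files, restricted to Dice.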
Provider: Zenodo
Created: 2025-12-01