IPATH Dataset: 45,609 Curated Image-Text Pairs for Histopathology Applications
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14278845
Description:
Recent advances in artificial intelligence (AI) have enabled the identification of patterns in pathology images, improving diagnostic accuracy and decision support systems. However, progress has been limited by the lack of publicly available medical images. To address this scarcity, we explore Instagram as a novel source of pathology images with expert annotations. We curated the IPATH dataset from Instagram, comprising 45,609 pathology image-text pairs, using a combination of classifiers, large language models, and manual filtering. To demonstrate the value of this dataset, we developed a multimodal AI model called IP-CLIP by fine-tuning the pre-trained CLIP model on the IPATH dataset. IP-CLIP outperforms the original CLIP model in classifying new pathology images on two downstream tasks, zero-shot classification and linear probing, using two external histopathology datasets. These results demonstrate the effectiveness of the IPATH dataset and highlight the potential of social media data to advance AI models for medical image classification.
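The zero-shot classification task mentioned in the description can be illustrated with a minimal sketch. It is not the authors' IP-CLIP code: it assumes image and text embeddings have already been produced by some CLIP-style encoder, and it uses small hand-made vectors in their place. The function names, the class prompts, the toy embeddings, and the temperature of 100 (chosen to resemble the logit scale commonly used with CLIP) are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_emb, text_embs, labels, temperature=100.0):
    """Pick the label whose text embedding is most similar to the image embedding.

    `temperature` is an assumed scaling factor, not a value taken from the source.
    """
    img = l2_normalize(image_emb)
    sims = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        sims.append(sum(a * b for a, b in zip(img, txt)))
    probs = softmax([temperature * s for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], dict(zip(labels, probs))


# Hypothetical class prompts and toy embeddings standing in for encoder outputs.
labels = ["benign tissue", "malignant tissue"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_emb = [0.2, 0.9, 0.1]

prediction, probabilities = zero_shot_classify(image_emb, text_embs, labels)
print(prediction, probabilities)
```

No classifier is trained in this setting: the class names themselves, encoded as text, act as the decision rule. Linear probing differs in that a linear classifier is fit on frozen image embeddings using labeled examples.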
Created:
2024-12-17



