
UPM - DATASET

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12806478
Description:
Dataset Overview

This dataset accompanies the article "Unveiling Scientific Articles from Paper Mills with Provenance Analysis." It is designed to support the development of new methods for identifying systematically produced articles. The dataset includes all image panels from the Stock Photo Paper Mill (SPP) [1] and its extended versions.

Dataset Composition

The SPP dataset comprises 121 biomedical articles focused on cancer types and cell tissue samples. Bik [1] has annotated instances of potentially similar images across these papers, and these annotations are publicly available on Bik's website [1] in spreadsheet format.

To enhance the SPP dataset, we introduced distractor documents that do not contain known issues. We created two versions of the SPP extension to study the challenge across increasingly large sets of articles:

Version 1 (v1): Includes 969 additional papers containing biomedical figures.
Version 2 (v2): Expands further with 3,635 additional papers, similar in nature to those in the first version.

Image Panel Distribution

The following table shows the number of image panels of each type after extraction from their original articles:

Panel Type          SPP      Extended SPP (v1)   Extended SPP (v2)
Microscopy          925      4,227               14,083
Blots               278      1,298               9,810
Body Imaging        0        573                 10,715
Graphs and Plots    1,317    3,620               9,879
Flow Cytometry      63       427                 3,053
Total               2,583    10,145              47,540

Dataset Structure

The dataset.zip file contains three directories, each corresponding to a different version of the dataset:

spm/: Contains image panels from the original Stock Photo Paper Mill (SPP) set.
extracted_panels/: Includes panels related to the Extended SPP (v1).
annotated_panels/: Contains panels related to the Extended SPP (v2).

Annotations

The dataset includes two types of annotation files:

document-level-annotation.json: Describes how each article reuses content from other articles.
image-level-annotation.json: Describes groups of images that share similar content.

Please refer to these files for detailed information on the dataset's contents and the relationships between the images and articles.

References:
[1] Bik E. The Stock Photo Paper Mill; 2020. Available from https://scienceintegritydigest.com/2020/07/05/the-stock-photo-paper-mill/
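As a first sanity check after extracting dataset.zip, one might count the image files under each top-level directory and compare the results with the panel totals given in the description. The sketch below is a minimal, unofficial example. The set of image extensions is an assumption, the helper names count_panels and summarize_json are hypothetical, and the description documents neither the annotation files' location inside the archive nor their JSON schema, so the code only reports the top-level type and size of whatever it loads.

```python
import json
from pathlib import Path

# Assumption: panels are stored as common raster image files.
# The archive may use other extensions; adjust as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".gif"}


def count_panels(root):
    """Count image files under each top-level directory of the extracted archive."""
    counts = {}
    for top in sorted(Path(root).iterdir()):
        if top.is_dir():
            counts[top.name] = sum(
                1
                for p in top.rglob("*")
                if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def summarize_json(path):
    """Load a JSON file and report its top-level type and size, assuming no schema."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    size = len(data) if isinstance(data, (dict, list)) else 1
    return {"type": type(data).__name__, "entries": size}
```

For example, count_panels("dataset") would return a mapping such as {"spm": ..., "extracted_panels": ..., "annotated_panels": ...}, assuming the archive was extracted into a directory named dataset. Calling summarize_json on document-level-annotation.json or image-level-annotation.json gives a quick look at each file's shape before writing schema-specific code.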
Created:
2024-08-22