UPM - DATASET
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/12806478
Resource description:
Dataset Overview
This dataset is associated with the article "Unveiling Scientific Articles from Paper Mills with Provenance Analysis." It is designed to support the development of new methods for identifying systematically produced articles. The dataset includes all image panels from the Stock Photo Paper Mill (SPP) [1] and its extended versions.
Dataset Composition
The SPP dataset comprises 121 biomedical articles focused on cancer types and cell tissue samples. Bik [1] has annotated instances of potentially similar images across these papers, and these annotations are publicly available on Bik's website [1] in spreadsheet format.
To enhance the SPP dataset, we introduced distractor documents that do not contain known issues. We created two versions of the SPP extension to study the challenge across increasingly large sets of articles:
Version 1 (v1): Includes 969 additional papers containing biomedical figures.
Version 2 (v2): Expands further with 3,635 additional papers, similar in nature to those in the first version.
Image Panel Distribution
The following table shows the distribution of each image panel type after extraction from their original articles:
Panel Type       | SPP   | Extended SPP (v1) | Extended SPP (v2)
Microscopy       | 925   | 4,227             | 14,083
Blots            | 278   | 1,298             | 9,810
Body Imaging     | 0     | 573               | 10,715
Graphs and Plots | 1,317 | 3,620             | 9,879
Flow Cytometry   | 63    | 427               | 3,053
Total            | 2,583 | 10,145            | 47,540
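
As a quick sanity check, the totals in the table above can be recomputed from the per-type counts. The sketch below copies the numbers from the table; the names PANEL_COUNTS, column_totals, and type_shares are illustrative and are not part of the dataset.

```python
# Panel counts copied from the table: (SPP, Extended SPP v1, Extended SPP v2).
PANEL_COUNTS = {
    "Microscopy": (925, 4227, 14083),
    "Blots": (278, 1298, 9810),
    "Body Imaging": (0, 573, 10715),
    "Graphs and Plots": (1317, 3620, 9879),
    "Flow Cytometry": (63, 427, 3053),
}
VERSIONS = ("SPP", "Extended SPP (v1)", "Extended SPP (v2)")


def column_totals(counts):
    """Sum each version's column across all panel types."""
    return tuple(sum(column) for column in zip(*counts.values()))


def type_shares(counts, version_index):
    """Return each panel type's fraction of the chosen version's total."""
    total = sum(row[version_index] for row in counts.values())
    return {name: row[version_index] / total for name, row in counts.items()}


# Print the recomputed total for each version.
for version, total in zip(VERSIONS, column_totals(PANEL_COUNTS)):
    print(f"{version}: {total:,} panels")
```

The recomputed sums match the Total row (2,583, 10,145, and 47,540). The type_shares helper can be used to compare how the mix of panel types shifts between versions, for example the absence of Body Imaging panels in the original SPP set.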
Dataset Structure
The dataset.zip file contains three directories, each corresponding to a different version of the dataset:
spm/: Contains image panels from the original Stock Photo Paper Mill (SPP) set.
extracted_panels/: Includes panels related to the Extended SPP (v1).
annotated_panels/: Contains panels related to the Extended SPP (v2).
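
To confirm that a downloaded archive matches this layout, the panels in each directory can be counted without extracting the file. This is a minimal sketch under two assumptions that the description does not state: the three directories sit at the root of dataset.zip, and panels are stored as ordinary image files with the extensions listed in IMAGE_SUFFIXES. The function name is hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath

# Assumption: panels use common image extensions; adjust after inspecting the archive.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def count_panels_by_directory(zip_path):
    """Count image files under each top-level directory of a zip archive."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            path = PurePosixPath(name)  # zip member names use forward slashes
            if len(path.parts) > 1 and path.suffix.lower() in IMAGE_SUFFIXES:
                counts[path.parts[0]] += 1
    return counts


# Example usage (requires the downloaded archive):
# for directory, n in sorted(count_panels_by_directory("dataset.zip").items()):
#     print(f"{directory}/: {n} panels")
```

If the archive wraps everything in a single parent folder, the counts will all fall under that folder's name, which signals that the root-level assumption needs adjusting.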
Annotations
The dataset includes two types of annotation files:
document-level-annotation.json: This file provides annotations detailing how each article reuses content from other articles.
image-level-annotation.json: This file includes annotations about groups of images that share similar content.
Please refer to these files for detailed information on the dataset's contents and the relationships between the images and articles.
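
Because the description does not document the schema of the two annotation files, a reasonable first step is to load each one and inspect its top-level shape before writing any analysis code. The sketch below assumes only that both files are valid UTF-8 JSON; load_annotations and describe are hypothetical helper names, and no field names are assumed.

```python
import json
from pathlib import Path


def load_annotations(path):
    """Parse one annotation file and return the resulting Python object."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def describe(data, max_keys=5):
    """Summarize a parsed JSON value without assuming any schema."""
    if isinstance(data, dict):
        sample = list(data)[:max_keys]
        return f"object with {len(data)} keys, e.g. {sample}"
    if isinstance(data, list):
        kind = type(data[0]).__name__ if data else "n/a"
        return f"array with {len(data)} items, first item type: {kind}"
    return f"scalar of type {type(data).__name__}"


# Example usage (requires the downloaded files):
# for name in ("document-level-annotation.json", "image-level-annotation.json"):
#     print(name, "->", describe(load_annotations(name)))
```

The summary shows whether each file is keyed by identifier or is a flat list of records, which determines how articles and image groups should be joined afterwards.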
References:
[1] Bik E. The Stock Photo Paper Mill; 2020. Available from https://scienceintegritydigest.com/2020/07/05/the-stock-photo-paper-mill/
Created:
2024-08-22



