
Paper2Fig100k dataset

Mendeley Data · Updated 2024-05-10 · Indexed 2024-06-29
Download link:
https://zenodo.org/records/7299423
Description:
A dataset with over 100k images of figures and text captions from research papers. The figures display diagrams, methodologies, and architectures from papers on arXiv.org. We also provide text captions for each figure, along with OCR detections and recognitions on the figures (bounding boxes and texts).

The dataset consists of a directory called "figures" and two JSON files (train and test) that contain data for each figure. Each JSON object holds the following information about a figure:

- figure_id: figure identifier based on the arXiv identifier: <yymm>.<xxxxxx>-Figure<I>-<k>.png.
- captions: text pairs extracted from the paper that relate to the figure, e.g., the figure's actual caption or references to the figure in the manuscript.
- ocr_result: result of performing OCR text recognition on the image; a list of (bounding box, confidence, text) triplets present in the image.
- aspect: aspect ratio of the image (H/W).

See the OCR-VQGAN GitHub repository, which uses the Paper2Fig100k dataset to train an image encoder for figures and diagrams with an OCR perceptual loss that encourages clear, readable text inside generated images. The dataset is described in more detail in the paper "OCR-VQGAN: Taming Text-within-Image Generation" (WACV 2023).

Paper abstract:
Synthetic image generation has recently seen significant improvements in domains such as natural image and art generation. However, the problem of figure and diagram generation remains unexplored. A challenging aspect of generating figures and diagrams is effectively rendering readable text within the images. To alleviate this problem, we present OCR-VQGAN, an image encoder and decoder that leverages OCR pre-trained features to optimize a text perceptual loss, encouraging the architecture to preserve high-fidelity text and diagram structure. To explore our approach, we introduce the Paper2Fig100k dataset, with over 100k images of figures and texts from research papers. The figures show architecture diagrams and methodologies of articles available on arXiv.org from fields such as artificial intelligence and computer vision. Figures usually include text and discrete objects, e.g., boxes in a diagram, with lines and arrows connecting them. We demonstrate the superiority of our method through several experiments on the task of figure reconstruction. Additionally, we explore the qualitative and quantitative impact of weighting different perceptual metrics in the overall loss function.
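The per-figure JSON schema described above (figure_id, captions, ocr_result as (bounding box, confidence, text) triplets, and aspect) can be consumed with a short script. The sketch below is illustrative, not official tooling: the record contents and the commented file name are hypothetical, assumed only to match the documented schema.

```python
import json


def high_confidence_texts(record, min_conf=0.8):
    """Return OCR-detected strings whose confidence meets the threshold.

    Each entry in record["ocr_result"] is a (bounding box, confidence, text)
    triplet, as described in the dataset documentation.
    """
    return [text for _bbox, conf, text in record["ocr_result"] if conf >= min_conf]


# Loading the real train split would look like this (file name assumed):
# records = json.load(open("paper2fig_train.json"))

# A hypothetical record following the documented schema.
record = {
    "figure_id": "2106.012345-Figure1-1.png",  # illustrative id only
    "captions": ["Figure 1: Overview of the proposed architecture."],
    "ocr_result": [
        ([10, 10, 120, 40], 0.97, "Encoder"),
        ([130, 10, 240, 40], 0.55, "Decodar"),  # low-confidence detection
    ],
    "aspect": 0.62,  # H/W
}

print(high_confidence_texts(record))
```

Filtering on the per-triplet confidence, as shown, is a simple way to keep only reliable text detections before using them, e.g., for a text-rendering or retrieval task.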
Author:
Juan A. Rodríguez
Released:
2023-06-28
Created:
2023-06-28