Not explicitly named
Source: arXiv | Updated: 2023-09-04 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2309.01674v1
Resource description:
This paper investigates extracting images from historical documents with foundation models, focusing on how effective text-image prompts are. The dataset is not given a specific name; the work draws on several humanities datasets of varying complexity. The approach combines Grounding DINO with Meta's Segment Anything Model (SAM) to extract large volumes of visual data from historical documents for downstream development tasks and dataset creation. The intended application is historical research, in particular the analysis of visual elements in historical documents, and the work aims to address the shortage of well-annotated datasets in the humanities.
Provider:
Max Planck Institute for the History of Science
Created:
2023-09-04



