FoteiniTag/modified_finepdfs_ell_grek_pages
收藏Hugging Face2026-04-27 更新2026-05-03 收录
下载链接:
https://hf-mirror.com/datasets/FoteiniTag/modified_finepdfs_ell_grek_pages
下载链接
链接失效反馈官方服务:
资源简介:
这是一个多模态数据集,包含图像和对话文本数据。每个样本由以下字段组成:图像(image字段)、对话列表(conversations字段,其中每个对话包含发言者from和内容value)、PDF标识符(pdf_id)、页码(page_num)、总页数(num_pages)、来源URL(url)和源行索引(source_row_idx)。数据集仅包含训练拆分(train),共有53个样本,总大小约22.6MB。该数据集可能用于文档理解、视觉问答或基于PDF页面图像的对话生成任务。
This is a multimodal dataset containing image and conversational text data. Each sample consists of the following fields: image, conversations (a list where each conversation includes from and value), pdf_id, page_num, num_pages, url, and source_row_idx. The dataset only includes a train split with 53 examples and a total size of approximately 22.6MB. It is likely designed for document understanding, visual question answering, or dialogue generation tasks based on PDF page images.
提供机构:
FoteiniTag



