nicolafan/wikifragments
Source: Hugging Face | Updated: 2025-08-08 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/nicolafan/wikifragments
Description:
WikiFragments is a multimodal dataset built from English Wikipedia. It consists of cleaned text paragraphs paired with related images (infobox and thumbnail) from the same page. Each paragraph-image pair forms a multimodal fragment, an atomic knowledge unit suited to information retrieval and multimodal research. The dataset is intended for information retrieval, visual document retrieval, sentence similarity, and visual question answering, and in particular for retrieval-augmented generation (RAG), where it supplies relevant multimodal context for answering questions.
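To make the idea of a fragment as a retrieval unit concrete, here is a minimal sketch in Python. The `Fragment` class, its field names, the sample rows, and the word-overlap scoring are all assumptions made for illustration; they are not the dataset's actual schema or any retriever shipped with it.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One paragraph plus images from the same page (hypothetical schema)."""
    page_title: str
    paragraph: str
    image_paths: tuple = ()


def tokenize(text):
    # Lowercase and split on non-word characters to get a set of terms.
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, fragments, k=1):
    # Toy retriever: rank fragments by how many query terms their paragraph shares.
    # A real RAG pipeline would likely use dense text or image embeddings instead.
    q = tokenize(query)
    ranked = sorted(
        fragments,
        key=lambda f: len(q & tokenize(f.paragraph)),
        reverse=True,
    )
    return ranked[:k]


# Made-up sample fragments, not rows from the dataset.
fragments = [
    Fragment(
        "Eiffel Tower",
        "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
        ("eiffel_infobox.jpg",),
    ),
    Fragment(
        "Mount Everest",
        "Mount Everest is the highest mountain above sea level.",
        ("everest_thumb.jpg",),
    ),
]

top = retrieve("Which tower is in Paris?", fragments)[0]
print(top.page_title, top.image_paths)
```

Because the text and its images travel together in one unit, whatever is retrieved for a question can be passed to a multimodal model as both textual and visual context.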
Provider:
nicolafan



