Synthetic Data for Training VisRAG
Source: arXiv; indexed 2025-09-30
Download link:
https://github.com/openbmb/visrag
Resource description:
This synthetic dataset was collected to train the retriever component of the VisRAG system, enabling models to perform retrieval and generation over multimodal documents. It is also used to train models such as MiniCPM-V 2.0 and MiniCPM-V 2.6, and it forms part of the experiments that demonstrate the effectiveness of VisRAG. Its tasks focus primarily on retrieval-augmented generation (RAG).
Provider:
OpenBMB



