racineai/OGC_colpali-VisRAG-vdr
Source: Hugging Face. Updated 2025-03-23. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/racineai/OGC_colpali-VisRAG-vdr
Description:
OGC is an organized, grouped, and cleaned dataset that merges, shuffles, and formats data from several sources for image/text-to-vector (DSE) tasks. It contains over 700,000 records. About 33% of the records include negative labels, and about 25% have no query (image negatives only). The dataset covers English, French, Spanish, Italian, and German; English is the most common language, at about 52%.
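To check proportions like the ones quoted above on a sample of records, a minimal sketch follows. The field names `query`, `image`, and `negative_image` are assumptions made for illustration; the listing does not state the actual column names, and the toy records are invented, not taken from the dataset.

```python
def summarize(records):
    """Return the share of records with a negative and the share with no query.

    Field names ("query", "negative_image") are hypothetical; adjust them to
    the real schema before use.
    """
    total = len(records)
    if total == 0:
        return {"with_negative": 0.0, "without_query": 0.0}
    with_negative = sum(1 for r in records if r.get("negative_image") is not None)
    without_query = sum(1 for r in records if not r.get("query"))
    return {
        "with_negative": with_negative / total,
        "without_query": without_query / total,
    }


# Invented toy records, only to exercise the function.
toy = [
    {"query": "What is the torque limit?", "image": "page_1.png", "negative_image": None},
    {"query": "Quelle est la tension nominale ?", "image": "page_2.png", "negative_image": "page_9.png"},
    {"query": None, "image": None, "negative_image": "page_7.png"},
    {"query": "Where is the fuse box?", "image": "page_3.png", "negative_image": None},
]

stats = summarize(toy)
print(stats)
```

Run over the full dataset, the same counts could be compared against the roughly 33% and 25% figures given in the description, provided the real column names are substituted.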
Provider:
racineai



