
racineai/VDR_colpali-VisRAG-vdr

Hugging Face | Updated 2025-11-20 | Indexed 2026-01-03
Download link:
https://hf-mirror.com/datasets/racineai/VDR_colpali-VisRAG-vdr
Description:
---
license: apache-2.0
language:
- en
- fr
- es
- it
- de
tags:
- synthetic
- RAG
- DSE
- retrieval
size_categories:
- 100K<n<1M
task_categories:
- visual-document-retrieval
- text-retrieval
---

# WIP - there might be issues with the negatives

# VDR - Organized, Grouped, Cleaned

> **Intended for image/text to vector (DSE)**

## Dataset Composition

The dataset merges, shuffles, and formats data from:

- [vidore/colpali_train_set](https://huggingface.co/datasets/vidore/colpali_train_set)
- [openbmb/VisRAG-Ret-Train-Synthetic-data](https://huggingface.co/datasets/openbmb/VisRAG-Ret-Train-Synthetic-data)
- [llamaindex/vdr-multilingual-train](https://huggingface.co/datasets/llamaindex/vdr-multilingual-train)

## Dataset Statistics

| Metric | Value |
|--------|-------|
| Total rows | 700,000+ |
| Rows with negatives | ≈ 33% |
| Rows without queries (image negatives only) | ≈ 25% |

## Language Distribution

| Language | Ratio |
|----------|-------|
| English | ≈ 52% |
| French | ≈ 12% |
| Spanish | ≈ 12% |
| Italian | ≈ 12% |
| German | ≈ 12% |

## Creators

Dataset curated by:

- **Paul Lemaistre**
- **Léo Appourchaux**
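## Example: Checking the Statistics on a Sample

Because the card flags possible issues with the negatives, it may help to tally a sample of rows before training. The sketch below is illustrative only: the card does not list the schema, so the column names `query` and `negatives` and the split name `train` are assumptions to verify against the dataset viewer. It uses the Hugging Face `datasets` library in streaming mode so that nothing is downloaded in full.

```python
from collections import Counter
from typing import Any, Iterable, Mapping

DATASET_ID = "racineai/VDR_colpali-VisRAG-vdr"

# Hypothetical column names: the card does not document the schema,
# so adjust these after inspecting the dataset viewer.
QUERY_COLUMN = "query"
NEGATIVES_COLUMN = "negatives"


def stream_rows(split: str = "train"):
    """Stream rows from the Hub lazily (the split name is an assumption)."""
    from datasets import load_dataset  # requires `pip install datasets`

    return load_dataset(DATASET_ID, split=split, streaming=True)


def classify_row(row: Mapping[str, Any]) -> str:
    """Bucket a row by whether it has a query and whether it has negatives."""
    has_query = bool(row.get(QUERY_COLUMN))
    has_negatives = bool(row.get(NEGATIVES_COLUMN))
    if not has_query:
        return "no_query"
    return "with_negatives" if has_negatives else "without_negatives"


def summarize(rows: Iterable[Mapping[str, Any]]) -> Counter:
    """Count how many rows fall into each bucket."""
    return Counter(classify_row(row) for row in rows)
```

For instance, `summarize(itertools.islice(stream_rows(), 1000))` tallies the first 1,000 streamed rows. If the assumed column names are right, the bucket shares should land roughly near the ratios in the statistics table.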
Provider:
racineai