
Chrisyichuan/moca-visrag-ind-training

Source: Hugging Face. Updated 2026-04-12; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Chrisyichuan/moca-visrag-ind-training
Resource description:
---
license: mit
task_categories:
- image-retrieval
- question-answering
language:
- en
pretty_name: MOCA VisRAG Independent Training
size_categories:
- 100K<n<1M
---

# Chrisyichuan/moca-visrag-ind-training

MOCA VisRAG independent-split contrastive training data with hard negatives.

## Contents

- `moca_visrag_ind_converted.jsonl`: query-image pairs with hard negatives
- `images/`: all referenced images

Each metadata row:

```json
{
  "query": "...",
  "chunk_path": "images/...",
  "neg_chunk_paths": ["images/...", "images/..."],
  "source_positive_rank": 0,
  "source_positive_score": 0.0,
  "source_dataset": "moca"
}
```

## Summary

- rows: 122752
- unique images: 122752
- avg negatives/row: 2.00

## Download

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Chrisyichuan/moca-visrag-ind-training",
    repo_type="dataset",
    local_dir="data/moca-visrag-ind-training",
)
```

## Image Storage

Images are stored as **123 tar shards** under `image_shards/` for fast download. After cloning/downloading, extract images:

```bash
python extract_hf_image_shards.py --dataset-dir .
```

This creates `images/` with all referenced image files.
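
The metadata row format described above lends itself to (query, positive, negative) triplets for contrastive training. The following is a minimal sketch of one way to read the JSONL file, not the authors' loader: the helper name `iter_triplets` and its `root` parameter are hypothetical, and it assumes that `chunk_path` and `neg_chunk_paths` are relative to the dataset directory and that `images/` has already been extracted.

```python
import json
from pathlib import Path
from typing import Iterator, Tuple


def iter_triplets(jsonl_path: str, root: str = ".") -> Iterator[Tuple[str, Path, Path]]:
    """Yield (query, positive_image_path, negative_image_path) triplets.

    Hypothetical helper: one triplet is produced per hard negative, so a row
    with two entries in `neg_chunk_paths` yields two triplets.
    """
    root_dir = Path(root)
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the JSONL file
            row = json.loads(line)
            # Assumption: image paths are relative to the dataset directory.
            positive = root_dir / row["chunk_path"]
            for neg in row.get("neg_chunk_paths", []):
                yield row["query"], positive, root_dir / neg
```

With the download location from the Download section, a call might look like `iter_triplets("data/moca-visrag-ind-training/moca_visrag_ind_converted.jsonl", root="data/moca-visrag-ind-training")`. Yielding paths rather than decoded images keeps the sketch free of any image library dependency; decoding can happen later in the training pipeline.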
Provider:
Chrisyichuan