Chrisyichuan/screenshot-training-naive-top2-hn-ablation
Source: Hugging Face | Updated: 2026-04-22 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/Chrisyichuan/screenshot-training-naive-top2-hn-ablation
Description:
This dataset is an ablation variant of `Chrisyichuan/screenshot-training-natural-filtered-v2`, built to study how the choice of negative-selection method affects model performance. Unlike the original dataset, it skips the Gemini VLM judge that filters out false negatives. Instead, it keeps the first two non-positive hits among the top-10 retrieved results as hard negatives. The dataset provides train, eval, and test splits; each record contains a query, the positive chunk path(s), and two hard-negative chunk paths. Images are stored in sharded tar files to keep the file count manageable. The dataset is suited to image retrieval and question-answering tasks, and is intended to help researchers compare negative-selection strategies.
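The naive selection rule described above can be sketched as follows. This is a minimal illustration, not the dataset author's actual code: the function name `naive_top2_hard_negatives`, its parameters, and the example file names are all hypothetical, and it assumes that retrieval results arrive as a list of chunk paths ordered from best to worst.

```python
from typing import List, Sequence


def naive_top2_hard_negatives(
    ranked_hits: Sequence[str],
    positives: Sequence[str],
    top_k: int = 10,
    num_negatives: int = 2,
) -> List[str]:
    """Return the first `num_negatives` non-positive hits within the top `top_k`.

    Hypothetical helper: no VLM judge is applied, so a hit that is actually
    relevant but unlabeled (a false negative) can still be selected.
    """
    positive_set = set(positives)
    negatives: List[str] = []
    for hit in ranked_hits[:top_k]:
        # Skip labeled positives and hits that were already selected.
        if hit in positive_set or hit in negatives:
            continue
        negatives.append(hit)
        if len(negatives) == num_negatives:
            break
    return negatives


# Illustrative input only; these paths are made up.
example_hits = ["a.png", "b.png", "c.png", "d.png"]
example_negatives = naive_top2_hard_negatives(example_hits, positives=["b.png"])
```

With the illustrative input above, the labeled positive `b.png` is skipped and the two highest-ranked remaining hits are kept. If fewer than two non-positive hits appear in the top 10, this sketch returns a shorter list; how the real pipeline handles that case is not stated in the description.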
Provider:
Chrisyichuan



