rlhn/hn-remove-400K
Source: Hugging Face. Updated: 2025-05-27. Indexed: 2025-11-01.
Download link:
https://hf-mirror.com/datasets/rlhn/hn-remove-400K
Description:
RLHN is a cascading LLM framework designed to accurately relabel hard negatives in existing IR/RAG training datasets, such as MS MARCO and HotpotQA. This Tevatron-format dataset (400K training pairs) contains the queries, positives, and hard negatives for 7 datasets in the BGE training collection, with false negatives removed. The training pairs in this repository can be used to fine-tune embedding models, ColBERT or other multi-vector models, and reranker models.
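As a rough illustration of how such training pairs might be consumed, the sketch below flattens one record into (query, positive, negative) triplets, the shape most contrastive fine-tuning loops expect. It assumes the usual Tevatron layout (`query`, `positive_passages`, `negative_passages`, each passage carrying a `text` field); the listing does not spell out the schema, so check the field names against the actual data. The `to_triplets` helper, its `max_negatives` parameter, and the sample record are hypothetical and made up for this example.

```python
from typing import Dict, Iterator, Tuple

# To work on the real data, something like the following should apply
# (the split name "train" is an assumption, not confirmed by the listing):
#   from datasets import load_dataset
#   ds = load_dataset("rlhn/hn-remove-400K", split="train")


def to_triplets(record: Dict, max_negatives: int = 2) -> Iterator[Tuple[str, str, str]]:
    """Yield (query, positive_text, negative_text) triplets from one record.

    Assumes Tevatron-style fields; pairs every positive passage with the
    first `max_negatives` hard negatives.
    """
    query = record["query"]
    negatives = record["negative_passages"][:max_negatives]
    for pos in record["positive_passages"]:
        for neg in negatives:
            yield query, pos["text"], neg["text"]


# Hypothetical record, invented for illustration only.
sample = {
    "query_id": "q1",
    "query": "what is a hard negative?",
    "positive_passages": [
        {"docid": "d1", "title": "", "text": "A hard negative is a non-relevant passage that looks relevant."},
    ],
    "negative_passages": [
        {"docid": "d2", "title": "", "text": "Negative numbers are less than zero."},
        {"docid": "d3", "title": "", "text": "Hard disks store data magnetically."},
        {"docid": "d4", "title": "", "text": "Film negatives invert colors."},
    ],
}

triplets = list(to_triplets(sample))
for query, pos, neg in triplets:
    print(query, "|", pos, "|", neg)
```

Capping the negatives per query is only one possible design choice; a reranker training loop might instead keep all negatives grouped with their query.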
Provided by:
rlhn



