nthakur/bge-retrieval-data
Source: Hugging Face. Updated 2025-03-12; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/nthakur/bge-retrieval-data
Resource summary:
The BGE Retrieval Dataset consists of 7 subsets totaling approximately 678K training pairs. It is intended for sentence-similarity tasks, and its text is in English. The subsets differ in their number of training pairs and in their ratio of positive examples to hard-negative examples. The total dataset size is 43,144,012,642.7 bytes (about 43.1 GB), and the download size is 6,653,360,727 bytes (about 6.65 GB).
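Raw byte counts are hard to compare at a glance, so the short sketch below converts the two reported sizes into decimal gigabytes (GB) and binary gibibytes (GiB). It also derives two rough figures from them: how much larger the full dataset is than the download, and the approximate size per training pair. The constants are copied from the summary above. Two assumptions are involved: the pair count uses the rounded "approximately 678K" figure, and the two sizes are read in the usual Hugging Face metadata sense (prepared dataset versus files fetched). The helper names are hypothetical and used only for illustration.

```python
# Sizes as reported in the summary above (bytes). The fractional byte in the
# total is reproduced exactly as given on the page.
DATASET_SIZE_BYTES = 43_144_012_642.7
DOWNLOAD_SIZE_BYTES = 6_653_360_727
# Assumption: the page only says "approximately 678K" pairs, so this is rounded.
APPROX_TRAINING_PAIRS = 678_000


def to_gb(n_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return n_bytes / 10**9


def to_gib(n_bytes: float) -> float:
    """Convert bytes to binary gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


def summarize() -> dict:
    """Return human-readable sizes and two derived ratios (hypothetical helper)."""
    return {
        "dataset_gb": round(to_gb(DATASET_SIZE_BYTES), 2),
        "dataset_gib": round(to_gib(DATASET_SIZE_BYTES), 2),
        "download_gb": round(to_gb(DOWNLOAD_SIZE_BYTES), 2),
        "download_gib": round(to_gib(DOWNLOAD_SIZE_BYTES), 2),
        # How many times larger the total size is than the download size.
        "expansion_ratio": round(DATASET_SIZE_BYTES / DOWNLOAD_SIZE_BYTES, 2),
        # Rough average, since the pair count itself is approximate.
        "approx_bytes_per_pair": round(DATASET_SIZE_BYTES / APPROX_TRAINING_PAIRS),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

On these numbers the total is roughly 43.14 GB (about 40.18 GiB) against a download of roughly 6.65 GB (about 6.20 GiB), a factor of about 6.5. The per-pair figure (around 64 KB) is only an order-of-magnitude estimate because the pair count is rounded.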
Provided by:
nthakur



