lightblue/reranker_continuous_filt_max7_train
Source: Hugging Face | Updated: 2025-01-07 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/lightblue/reranker_continuous_filt_max7_train
Description:
This is a dataset for training rerankers. It contains queries and corresponding texts drawn from 35 high-quality datasets covering more than 95 languages. For queries that lacked negative texts, hard negatives were mined with the BAAI/bge-m3 embedding model. For each query, one positive and one negative text were selected, and the relatedness of each query-text pair was rated with Qwen/Qwen2.5-32B-Instruct-GPTQ-Int4. Finally, an expectation value was computed from the model's token probabilities and normalized to a scale of 1-7 to make the labels more expressive.
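The scoring step in the description can be illustrated with a minimal sketch. The listing does not state the raw rating scale or the exact normalization formula, so the 1-5 raw scale, the linear min-max rescaling, the example probabilities, and the function names `expected_score` and `rescale` below are all assumptions made for illustration, not the dataset authors' code.

```python
from typing import Mapping


def expected_score(probs: Mapping[int, float]) -> float:
    """Probability-weighted mean of discrete ratings.

    `probs` maps each candidate rating to the probability the rating
    model assigned to that rating's token. The values are renormalized
    so they need not sum exactly to 1.
    """
    total = sum(probs.values())
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return sum(score * p for score, p in probs.items()) / total


def rescale(value: float, src_min: float, src_max: float,
            dst_min: float = 1.0, dst_max: float = 7.0) -> float:
    """Linearly map `value` from [src_min, src_max] to [dst_min, dst_max].

    Linear min-max scaling is an assumption; the source only says the
    expectation values were normalized to a scale of 1-7.
    """
    if src_max <= src_min:
        raise ValueError("src_max must be greater than src_min")
    ratio = (value - src_min) / (src_max - src_min)
    return dst_min + ratio * (dst_max - dst_min)


# Hypothetical token probabilities for ratings 1..5 (assumed raw scale).
probs = {1: 0.05, 2: 0.10, 3: 0.20, 4: 0.40, 5: 0.25}
raw = expected_score(probs)    # continuous score on the assumed 1-5 scale
label = rescale(raw, 1, 5)     # same score mapped onto the 1-7 scale
print(f"raw={raw:.2f} label={label:.2f}")
```

Using an expectation rather than the single most likely rating yields a continuous label, so two pairs that both receive a top rating of 4 can still be told apart by how much probability mass sits on neighboring ratings.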
Provider:
lightblue



