akhooli/ar_mmarco_500k_qs
Source: Hugging Face | Updated: 2024-12-07 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/akhooli/ar_mmarco_500k_qs
Description:
This dataset is based on the translated MMARCO dataset. It contains 500k queries, each scored against 32 documents using the JinaAI multilingual reranker. Its features include query_id, text, document_ids, and scores, and it provides statistics such as the mean, standard deviation, maximum, and minimum. The dataset can be used to train knowledge-distillation ColBERT models, but the translation quality is not good enough and the scores are not widely spread.
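The statistics mentioned above can be illustrated with a minimal sketch. It assumes a record shaped like the description: query_id, text, and document_ids come from the listing, while the "scores" field name, the sample values, and the choice of population standard deviation are assumptions made for illustration. A real record would carry 32 documents; four are used here for brevity.

```python
import statistics


def score_stats(scores):
    """Return the mean, population standard deviation, max, and min of a score list."""
    return {
        "mean": statistics.fmean(scores),
        "std": statistics.pstdev(scores),  # assumption: population std, not sample std
        "max": max(scores),
        "min": min(scores),
    }


# Hypothetical record: the "scores" field name and all values are invented.
record = {
    "query_id": 0,
    "text": "example query",
    "document_ids": [10, 11, 12, 13],
    "scores": [0.9, 0.5, 0.4, 0.2],
}

stats = score_stats(record["scores"])

# A small max-min gap means the reranker scores are bunched together,
# which is the weakness the description notes.
spread = stats["max"] - stats["min"]
```

A narrow spread gives a distilled student model little signal for separating relevant from irrelevant documents, so checking this gap per query is one simple way to inspect the limitation noted in the description.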
Provider:
akhooli



