sentence-transformers/msmarco
Source: Hugging Face | Updated: 2026-01-29 | Indexed: 2025-04-08
Download link:
https://hf-mirror.com/datasets/sentence-transformers/msmarco
Description:
MS MARCO is an English dataset for information retrieval and text classification. It is organized into multiple subsets, including triplets, labeled-list, bert-ensemble-mse, bert-ensemble-margin-mse, rankgpt4-colbert, and rankzephyr-colbert. The subsets contain queries, documents, labels, and scores, and can be used to train embedding or reranking models. The dataset falls in the 100M to 1B size category, which makes it suitable for large-scale information retrieval and text classification tasks.
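As a rough illustration of how one of the subsets listed above might be inspected, the sketch below streams a few rows with the Hugging Face `datasets` library. The subset names come from the description; the helper name `stream_rows`, the `train` split, and the use of streaming are assumptions made for this example and are not confirmed by the listing.

```python
from typing import Any, Dict, List

# Subset names as listed in the description above.
SUBSETS = (
    "triplets",
    "labeled-list",
    "bert-ensemble-mse",
    "bert-ensemble-margin-mse",
    "rankgpt4-colbert",
    "rankzephyr-colbert",
)


def stream_rows(subset: str, limit: int = 3) -> List[Dict[str, Any]]:
    """Return the first `limit` rows of one subset (hypothetical helper)."""
    if subset not in SUBSETS:
        raise ValueError(f"unknown subset: {subset!r}")

    # Imported lazily so the name check above works without the library.
    from datasets import load_dataset

    # Assumption: the subset exposes a "train" split.
    # streaming=True avoids downloading the full subset up front.
    rows = load_dataset(
        "sentence-transformers/msmarco",
        subset,
        split="train",
        streaming=True,
    )
    return list(rows.take(limit))


if __name__ == "__main__":
    # Requires network access and `pip install datasets`.
    for row in stream_rows("triplets"):
        print(row)
```

Streaming is chosen here because of the stated 100M to 1B size category; column names differ between subsets, so printing a few rows first is a reasonable way to check the schema before training.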
Provider:
sentence-transformers



