
sentence-transformers/msmarco-distilbert-margin-mse-sym-mnrl-mean-v1

Source: Hugging Face. Updated 2024-05-15; indexed 2024-06-15.
Download link:
https://hf-mirror.com/datasets/sentence-transformers/msmarco-distilbert-margin-mse-sym-mnrl-mean-v1
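Description:
This dataset is built from the MS MARCO information retrieval corpus and is intended mainly for training Sentence Transformer models. It contains several subsets, each pairing a query with a positive passage and one or more negative passages. The negatives were mined with different models from the passages most similar to the query. The dataset is offered in several configurations, with text stored either as strings or as IDs that can be used together with the MS MARCO corpus. Its main uses are feature extraction and sentence similarity tasks.
Provider: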
sentence-transformers
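Summary of original information

Dataset overview

Basic information

  • Language: English
  • Multilinguality: monolingual
  • Dataset size: 10M < n < 100M
  • Task categories: feature extraction, sentence similarity
  • Tags: sentence-transformers
  • Dataset name: MS MARCO with hard negatives from distilbert-margin-mse-sym-mnrl-mean-v1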
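
A configuration can be selected by name when loading the data. The following is a minimal sketch, assuming the Hugging Face `datasets` package is installed and the hub is reachable; `load_config` and `CONFIGS` are hypothetical names introduced here, while the configuration names themselves are the ones listed in this card.

```python
# Configuration names as listed in this card.
CONFIGS = (
    "triplet", "triplet-ids",
    "triplet-50", "triplet-50-ids",
    "triplet-all", "triplet-all-ids",
    "triplet-hard", "triplet-hard-ids",
)

REPO_ID = "sentence-transformers/msmarco-distilbert-margin-mse-sym-mnrl-mean-v1"


def load_config(name, streaming=True):
    """Load the train split of one configuration (hypothetical helper).

    Assumes the `datasets` package is available; it is imported lazily so
    that the name check below works even without the dependency.
    """
    if name not in CONFIGS:
        raise ValueError(f"unknown configuration: {name!r}")
    from datasets import load_dataset
    # streaming=True iterates over the data without downloading it all first.
    return load_dataset(REPO_ID, name, split="train", streaming=streaming)
```

For example, `load_config("triplet")` would return an iterable of rows with `query`, `positive`, and `negative` fields. Streaming is the default in this sketch because some configurations are several gigabytes to download.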

Dataset configurations

triplet configuration

  • Features:
    • query: string
    • positive: string
    • negative: string
  • Splits:
    • train:
      • Bytes: 362242689
      • Examples: 502939
  • Download size: 237710178 bytes
  • Dataset size: 362242689 bytes

triplet-50 configuration

  • Features:
    • query: string
    • positive: string
    • negative_1 to negative_50: string
  • Splits:
    • train:
      • Bytes: 9056441801
      • Examples: 502939
  • Download size: 5928790155 bytes
  • Dataset size: 9056441801 bytes

triplet-50-ids configuration

  • Features:
    • query: int64
    • positive: int64
    • negative_1 to negative_50: int64
  • Splits:
    • train:
      • Bytes: 209222624
      • Examples: 502939
  • Download size: 178199029 bytes
  • Dataset size: 209222624 bytes

triplet-all configuration

  • Features:
    • query: string
    • positive: string
    • negative: string
  • Splits:
    • train:
      • Bytes: 19861034624
      • Examples: 26637550
  • Download size: 4303477651 bytes
  • Dataset size: 19861034624 bytes

triplet-all-ids configuration

  • Features:
    • query: int64
    • positive: int64
    • negative: int64
  • Splits:
    • train:
      • Bytes: 639301200
      • Examples: 26637550
  • Download size: 190490947 bytes
  • Dataset size: 639301200 bytes

triplet-hard configuration

  • Features:
    • query: string
    • positive: string
    • negative: string
  • Splits:
    • train:
      • Bytes: 8832950905
      • Examples: 12127139
  • Download size: 2268035061 bytes
  • Dataset size: 8832950905 bytes

triplet-hard-ids configuration

  • Features:
    • query: int64
    • positive: int64
    • negative: int64
  • Splits:
    • train:
      • Bytes: 291051336
      • Examples: 12127139
  • Download size: 93192817 bytes
  • Dataset size: 291051336 bytes

triplet-ids configuration

  • Features:
    • query: int64
    • positive: int64
    • negative: int64
  • Splits:
    • train:
      • Bytes: 12070536
      • Examples: 502939
  • Download size: 10132059 bytes
  • Dataset size: 12070536 bytes
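
The byte counts of the ID configurations can be cross-checked against their row counts. The sketch below assumes each row is stored as plain 8-byte int64 values with no extra overhead, which is an assumption about the storage layout rather than something this card states; the numbers are copied from the figures listed here, and `expected_bytes` is a hypothetical helper.

```python
INT64_BYTES = 8  # size of one int64 value

# (configuration, rows, columns per row, reported dataset size in bytes),
# copied from the figures in this card. triplet-50-ids has
# query + positive + 50 negatives = 52 columns.
ID_CONFIGS = [
    ("triplet-ids", 502_939, 3, 12_070_536),
    ("triplet-50-ids", 502_939, 52, 209_222_624),
    ("triplet-all-ids", 26_637_550, 3, 639_301_200),
    ("triplet-hard-ids", 12_127_139, 3, 291_051_336),
]


def expected_bytes(rows, columns):
    """Bytes needed for `rows` rows of `columns` int64 values each."""
    return rows * columns * INT64_BYTES


for name, rows, columns, reported in ID_CONFIGS:
    status = "matches" if expected_bytes(rows, columns) == reported else "differs"
    print(f"{name}: expected {expected_bytes(rows, columns)}, reported {reported} ({status})")
```

Under that assumption, all four reported sizes agree with rows × columns × 8, which is a quick sanity check that the row counts and column layouts in the card are consistent.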

Dataset subsets

triplet subset

  • Columns: "query", "positive", "negative"

  • Column types: str, str, str

  • Example: { "query": "what are the liberal arts?", "positive": "liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.", "negative": "The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number." }

  • Deduplicated: No

triplet-ids subset

  • Columns: "query", "positive", "negative"

  • Column types: int, int, int

  • Example: { "query": 571018, "positive": 7349777, "negative": 6948601 }

  • Deduplicated: No

triplet-all subset

  • Columns: "query", "positive", "negative"

  • Column types: str, str, str

  • Example: { "query": "what are the liberal arts?", "positive": "liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.", "negative": "The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number." }

  • Deduplicated: No
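
Because the subsets are marked as not deduplicated, a training pipeline may want to drop repeated triplets itself. Whether duplicates actually occur is not stated in this card, so the following is only a minimal sketch; `dedupe_triplets` is a hypothetical helper that works on any iterable of rows with `query`, `positive`, and `negative` fields.

```python
def dedupe_triplets(rows):
    """Yield each (query, positive, negative) combination once, in first-seen order."""
    seen = set()
    for row in rows:
        key = (row["query"], row["positive"], row["negative"])
        if key not in seen:
            seen.add(key)
            yield row


# Toy rows for illustration only; they are not taken from the dataset.
rows = [
    {"query": "q1", "positive": "p1", "negative": "n1"},
    {"query": "q1", "positive": "p1", "negative": "n2"},
    {"query": "q1", "positive": "p1", "negative": "n1"},  # repeat of the first row
]
unique_rows = list(dedupe_triplets(rows))
```

The set of seen keys grows with the number of distinct triplets, so for the larger subsets it may be cheaper to run this over the ID configurations, where each key is three integers rather than three passages of text.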

triplet-all-ids subset

  • Columns: "query", "positive", "negative"

  • Column types: int, int, int

  • Example: { "query": 571018, "positive": 7349777, "negative": 6948601 }

  • Deduplicated: No

triplet-hard subset

  • Columns: "query", "positive", "negative"

  • Column types: str, str, str

  • Example: { "query": "what are the liberal arts?", "positive": "liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.", "negative": "The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number." }

  • Deduplicated: No

triplet-hard-ids subset

  • Columns: "query", "positive", "negative"

  • Column types: int, int, int

  • Example: { "query": 571018, "positive": 7349777, "negative": 6948601 }

  • Deduplicated: No

triplet-50 subset

  • Columns: "query", "positive", "negative_1" to "negative_50"

  • Column types: str, str, then str repeated 50 times

  • Example: { "query": "what are the liberal arts?", "positive": "liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.", "negative_1": "The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number.", "negative_2": "What Does it Mean to Study Liberal Arts? A liberal arts major offers a broad overview of the arts, sciences, and humanities. Within the context of a liberal arts degree, you can study modern languages, music, English, anthropology, history, womens studies, psychology, math, political science or many other disciplines.", "negative_3": "What Is Liberal Studies? Liberal studies, also known as liberal arts, comprises a broad exploration of social sciences, natural sciences, humanities, and the arts. If you are interested in a wide-ranging education in humanities, communication, and thinking, read on to find out about the educational and career possibilities in liberal studies.", "negative_4": "You can choose from an array of liberal arts majors. Most of these are offered in the liberal arts departments of colleges that belong to universities and at smaller colleges that are designated as liberal arts institutions.", "negative_5": "Majors. You can choose from an array of liberal arts majors. Most of these are offered in the liberal arts departments of colleges that belong to universities and at smaller colleges that are designated as liberal arts institutions.", "negative_6": "liberal arts. plural noun. Definition of liberal arts for English Language Learners. : areas of study (such as history, language, and literature) that are intended to give you general knowledge rather than to develop specific skills needed for a profession. Nglish: Translation of liberal arts for Spanish speakers Britannica.com: Encyclopedia article about liberal arts.", "negative_7": "Because they award less than 50% of their degrees in engineering, and the rest in liberal arts (sciences). Baccalaureate colleges are a type of Liberal Arts colleges, But offering lesser number of degrees compared to LAC. Its the other way round. A liberal arts college focuses on liberal arts, e.g. sciences, literature, history, sociology, etc. They might offer a few professional degrees (most frequently engineering) as well, but typically the professional majors are well integrated into the liberal arts framework as well.", "negative_8": "A liberal arts college is a four-year institution that focuses on the study of liberal arts. Liberal arts colleges are geared more toward the acquisition of knowledge and less toward specific professions. [MORE: The Path to Higher Education] Graduate school.", "negative_9": "1 BA = Bachelor of Arts degree BS = Bachelor of Science degree. 2 I think the question requires more of an explanation than what the terms BA and BS translate to. 3 B.A. (Bachelor of Arts) A bachelor of arts (B.A.) degree is what is generally called a liberal arts degree. I think the question requires more of an explanation than what the terms BA and BS translate to. 2 B.A. (Bachelor of Arts) A bachelor of arts (B.A.) degree is what is generally called a liberal arts degree.", "negative_10": "West Hills College LemooreAssociate of Arts (A.A.), Liberal Arts and Sciences/Liberal StudiesAssociate of Arts (A.A.), Liberal Arts and Sciences/Liberal Studies. -Student Government President for two years. -Valedictorian. -Alpha Gamma Sigma (Alpha Chi chapter) President/College Relations Liaison.", "negative_11": "You can pursue associate degree in academic area such as business administration, law, arts, engineering, paralegal studies, liberal arts, computer science, and more. Q: What are online associate programs?", "negative_12": "liberal arts definition The areas of learning that cultivate general intellectual ability rather than technical or professional skills. Liberal arts is often used as a synonym for humanities, because literature, languag

  • Deduplicated: No
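
A wide row with `negative_1` to `negative_50` can be flattened into individual (query, positive, negative) triplets when a loss function expects one negative per example. This is a minimal sketch; `expand_row` is a hypothetical helper, and it is not claimed to reproduce how the other subsets in this card were built.

```python
def expand_row(row, max_negatives=50):
    """Yield one triplet dict per negative_i column present in a wide row."""
    query, positive = row["query"], row["positive"]
    for i in range(1, max_negatives + 1):
        negative = row.get(f"negative_{i}")
        if negative is not None:  # skip missing or empty columns
            yield {"query": query, "positive": positive, "negative": negative}


# Toy wide row with only three negatives, for illustration.
wide_row = {
    "query": "what are the liberal arts?",
    "positive": "passage A",
    "negative_1": "passage B",
    "negative_2": "passage C",
    "negative_3": "passage D",
}
triplets = list(expand_row(wide_row))
```

The same function works on the ID variant of the subset, since it only relies on the column names and not on the value types.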

triplet-50-ids subset

  • Columns: "query", "positive", "negative_1" to "negative_50"

  • Column types: int, int, then int repeated 50 times

  • Example: { "query": 571018, "positive": 7349777, "negative_1": 6948601, "negative_2": 6948602, "negative_3": 6948603, "negative_4": 6948604, "negative_5": 6948605, "negative_6": 6948606, "negative_7": 6948607, "negative_8": 6948608, "negative_9": 6948609, "negative_10": 6948610, "negative_11": 6948611, "negative_12": 6948612, "negative_13": 6948613, "negative_14": 6948614, "negative_15": 6948615, "negative_16": 6948616, "negative_17": 6948617, "negative_18": 6948618, "negative_19": 6948619, "negative_20": 6948620, "negative_21": 6948621, "negative_22": 6948622, "negative_23": 6948623, "negative_24": 6948624, "negative_25": 6948625, "negative_26": 6948626, "negative_27": 6948627, "negative_28": 6948628, "negative_29": 6948629, "negative_30": 6948630, "negative_31": 6948631, "negative_32": 6948632, "negative_33": 6948633, "negative_34": 6948634, "negative_35": 6948635, "negative_36": 6948636, "negative_37": 6948637, "negative_38": 6948638, "negative_39": 6948639, "negative_40": 6948640, "negative_41": 6948641, "negative_42": 6948642, "negative_43": 6948643, "negative_44": 6948644, "negative_45": 6948645, "negative_46": 6948646, "negative_47": 6948647, "negative_48": 6948648, "negative_49": 6948649, "negative_50": 6948650 }

  • Deduplicated: No
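
The ID subsets are meant to be used together with the MS MARCO corpus, so each ID has to be mapped back to its text before training on strings. The sketch below assumes two lookup tables, `queries` and `passages`, have already been built from the MS MARCO files; those tables, the placeholder texts, and the `resolve_ids` helper are hypothetical, and how the IDs line up with specific MS MARCO files is not stated in this card.

```python
def resolve_ids(row, queries, passages):
    """Replace the IDs in a row with text.

    The "query" column is looked up in `queries`; every other column
    (positive, negative, negative_1 ... negative_50) is looked up in `passages`.
    """
    resolved = {}
    for column, value in row.items():
        table = queries if column == "query" else passages
        resolved[column] = table[value]
    return resolved


# Toy lookup tables keyed by the IDs from the triplet-ids example in this card.
# The texts are placeholders, not the real MS MARCO contents.
queries = {571018: "example query text"}
passages = {7349777: "example positive text", 6948601: "example negative text"}

id_row = {"query": 571018, "positive": 7349777, "negative": 6948601}
text_row = resolve_ids(id_row, queries, passages)
```

A missing ID raises `KeyError` here, which makes gaps between the ID subset and the local copy of the corpus visible early rather than silently producing empty strings.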
