sentence-transformers/msmarco-distilbert-margin-mse-sym-mnrl-mean-v1
Dataset Overview
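Basic Information
- Language: English
- Multilinguality: monolingual
- Size category: 10M < n < 100M
- Task categories: feature extraction, sentence similarity
- Tags: sentence-transformers
- Dataset name: MS MARCO with hard negatives from distilbert-margin-mse-sym-mnrl-mean-v1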
Dataset Configurations
triplet configuration
- Features: query (string), positive (string), negative (string)
- Splits: train (362242689 bytes, 502939 examples)
- Download size: 237710178 bytes
- Dataset size: 362242689 bytes
triplet-50 configuration
- Features: query (string), positive (string), negative_1 to negative_50 (string)
- Splits: train (9056441801 bytes, 502939 examples)
- Download size: 5928790155 bytes
- Dataset size: 9056441801 bytes
triplet-50-ids configuration
- Features: query (int64), positive (int64), negative_1 to negative_50 (int64)
- Splits: train (209222624 bytes, 502939 examples)
- Download size: 178199029 bytes
- Dataset size: 209222624 bytes
triplet-all configuration
- Features: query (string), positive (string), negative (string)
- Splits: train (19861034624 bytes, 26637550 examples)
- Download size: 4303477651 bytes
- Dataset size: 19861034624 bytes
triplet-all-ids configuration
- Features: query (int64), positive (int64), negative (int64)
- Splits: train (639301200 bytes, 26637550 examples)
- Download size: 190490947 bytes
- Dataset size: 639301200 bytes
triplet-hard configuration
- Features: query (string), positive (string), negative (string)
- Splits: train (8832950905 bytes, 12127139 examples)
- Download size: 2268035061 bytes
- Dataset size: 8832950905 bytes
triplet-hard-ids configuration
- Features: query (int64), positive (int64), negative (int64)
- Splits: train (291051336 bytes, 12127139 examples)
- Download size: 93192817 bytes
- Dataset size: 291051336 bytes
triplet-ids configuration
- Features: query (int64), positive (int64), negative (int64)
- Splits: train (12070536 bytes, 502939 examples)
- Download size: 10132059 bytes
- Dataset size: 12070536 bytes
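The byte counts listed for the `*-ids` configurations match fixed-width int64 storage exactly: each row takes 8 bytes per column, so num_bytes = num_examples × num_columns × 8 (for example, 502939 × 3 × 8 = 12070536 for triplet-ids). The sketch below only redoes that arithmetic on the figures copied from this card; it does not download or read the dataset, and the formula is an observation about these numbers rather than a documented property of the storage format.

```python
# Figures copied from the *-ids configurations above:
# name -> (num_examples, listed num_bytes, number of int64 columns)
IDS_CONFIGS = {
    "triplet-ids": (502_939, 12_070_536, 3),
    "triplet-50-ids": (502_939, 209_222_624, 52),  # query, positive, 50 negatives
    "triplet-all-ids": (26_637_550, 639_301_200, 3),
    "triplet-hard-ids": (12_127_139, 291_051_336, 3),
}

INT64_BYTES = 8  # width of one int64 value


def expected_num_bytes(num_examples: int, num_columns: int) -> int:
    """Bytes for num_examples rows of num_columns fixed-width int64 values."""
    return num_examples * num_columns * INT64_BYTES


for name, (rows, listed, cols) in IDS_CONFIGS.items():
    expected = expected_num_bytes(rows, cols)
    print(f"{name}: expected {expected}, listed {listed}, match: {expected == listed}")
```

The text configurations cannot be checked this way, because their string columns vary in length from row to row.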
Dataset Subsets
triplet subset
- Columns: "query", "positive", "negative"
- Column types: str, str, str
- Example:
```python
{
  "query": "what are the liberal arts?",
  "positive": "liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.",
  "negative": "The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number."
}
```
- Deduplicated: No
triplet-ids subset
- Columns: "query", "positive", "negative"
- Column types: int, int, int
- Example:
```python
{"query": 571018, "positive": 7349777, "negative": 6948601}
```
- Deduplicated: No
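The `*-ids` subsets store integer IDs instead of texts. A minimal sketch of turning such a row back into a text triplet, assuming the query ID indexes a table of query texts and the positive/negative IDs index a table of passage texts. This card does not specify those tables, so `queries`, `passages`, and `resolve_triplet` are hypothetical and filled with placeholder strings.

```python
from typing import Dict, Mapping


def resolve_triplet(
    row: Mapping[str, int],
    queries: Mapping[int, str],
    passages: Mapping[int, str],
) -> Dict[str, str]:
    """Map a triplet-ids row to texts.

    Assumption: row["query"] is a key of `queries`, while row["positive"] and
    row["negative"] are keys of `passages`. A missing ID raises KeyError.
    """
    return {
        "query": queries[row["query"]],
        "positive": passages[row["positive"]],
        "negative": passages[row["negative"]],
    }


# Hypothetical lookup tables with placeholder texts (not the real entries).
queries = {571018: "<text of query 571018>"}
passages = {
    7349777: "<text of passage 7349777>",
    6948601: "<text of passage 6948601>",
}

row = {"query": 571018, "positive": 7349777, "negative": 6948601}
print(resolve_triplet(row, queries, passages))
```

Raising `KeyError` on an unknown ID, rather than silently returning an empty string, makes a mismatch between the ID subset and the lookup tables visible immediately.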
triplet-all subset
- Columns: "query", "positive", "negative"
- Column types: str, str, str
- Example:
```python
{
  "query": "what are the liberal arts?",
  "positive": "liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.",
  "negative": "The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number."
}
```
- Deduplicated: No
triplet-all-ids subset
- Columns: "query", "positive", "negative"
- Column types: int, int, int
- Example:
```python
{"query": 571018, "positive": 7349777, "negative": 6948601}
```
- Deduplicated: No
triplet-hard subset
- Columns: "query", "positive", "negative"
- Column types: str, str, str
- Example:
```python
{
  "query": "what are the liberal arts?",
  "positive": "liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.",
  "negative": "The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number."
}
```
- Deduplicated: No
triplet-hard-ids subset
- Columns: "query", "positive", "negative"
- Column types: int, int, int
- Example:
```python
{"query": 571018, "positive": 7349777, "negative": 6948601}
```
- Deduplicated: No
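Every subset is marked as not deduplicated. If a training setup needs unique triplets, an order-preserving filter such as the sketch below could be applied to `(query, positive, negative)` tuples. The helper `dedupe_triplets` and the toy rows are hypothetical; this card does not say whether, or how often, exact duplicates occur.

```python
from typing import Hashable, Iterable, List, Set, Tuple

# One (query, positive, negative) row, e.g. from a *-ids subset.
Triplet = Tuple[Hashable, Hashable, Hashable]


def dedupe_triplets(rows: Iterable[Triplet]) -> List[Triplet]:
    """Drop exact duplicate triplets, keeping the first occurrence and the order."""
    seen: Set[Triplet] = set()
    unique: List[Triplet] = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical toy rows; the third one repeats the first.
rows = [
    (571018, 7349777, 6948601),
    (571018, 7349777, 6948602),
    (571018, 7349777, 6948601),
]
print(dedupe_triplets(rows))  # [(571018, 7349777, 6948601), (571018, 7349777, 6948602)]
```

Dict-shaped rows can be converted with `(row["query"], row["positive"], row["negative"])` first. The `seen` set keeps every unique triplet in memory, which is cheap for the integer subsets but grows quickly for the text subsets.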
triplet-50 subset
- Columns: "query", "positive", "negative_1" to "negative_50"
- Column types: str, str, then str for each of negative_1 to negative_50 (52 columns in total)
- Example:
```python
{
  "query": "what are the liberal arts?",
  "positive": "liberal arts. 1. the academic course of instruction at a college intended to provide general knowledge and comprising the arts, humanities, natural sciences, and social sciences, as opposed to professional or technical subjects.",
  "negative_1": "The New York State Education Department requires 60 Liberal Arts credits in a Bachelor of Science program and 90 Liberal Arts credits in a Bachelor of Arts program. In the list of course descriptions, courses which are liberal arts for all students are identified by (Liberal Arts) after the course number.",
  "negative_2": "What Does it Mean to Study Liberal Arts? A liberal arts major offers a broad overview of the arts, sciences, and humanities. Within the context of a liberal arts degree, you can study modern languages, music, English, anthropology, history, womens studies, psychology, math, political science or many other disciplines.",
  "negative_3": "What Is Liberal Studies? Liberal studies, also known as liberal arts, comprises a broad exploration of social sciences, natural sciences, humanities, and the arts. If you are interested in a wide-ranging education in humanities, communication, and thinking, read on to find out about the educational and career possibilities in liberal studies.",
  "negative_4": "You can choose from an array of liberal arts majors. Most of these are offered in the liberal arts departments of colleges that belong to universities and at smaller colleges that are designated as liberal arts institutions.",
  "negative_5": "Majors. You can choose from an array of liberal arts majors. Most of these are offered in the liberal arts departments of colleges that belong to universities and at smaller colleges that are designated as liberal arts institutions.",
  "negative_6": "liberal arts. plural noun. Definition of liberal arts for English Language Learners. : areas of study (such as history, language, and literature) that are intended to give you general knowledge rather than to develop specific skills needed for a profession. Nglish: Translation of liberal arts for Spanish speakers Britannica.com: Encyclopedia article about liberal arts.",
  "negative_7": "Because they award less than 50% of their degrees in engineering, and the rest in liberal arts (sciences). Baccalaureate colleges are a type of Liberal Arts colleges, But offering lesser number of degrees compared to LAC. Its the other way round. A liberal arts college focuses on liberal arts, e.g. sciences, literature, history, sociology, etc. They might offer a few professional degrees (most frequently engineering) as well, but typically the professional majors are well integrated into the liberal arts framework as well.",
  "negative_8": "A liberal arts college is a four-year institution that focuses on the study of liberal arts. Liberal arts colleges are geared more toward the acquisition of knowledge and less toward specific professions. [MORE: The Path to Higher Education] Graduate school.",
  "negative_9": "1 BA = Bachelor of Arts degree BS = Bachelor of Science degree. 2 I think the question requires more of an explanation than what the terms BA and BS translate to. 3 B.A. (Bachelor of Arts) A bachelor of arts (B.A.) degree is what is generally called a liberal arts degree. I think the question requires more of an explanation than what the terms BA and BS translate to. 2 B.A. (Bachelor of Arts) A bachelor of arts (B.A.) degree is what is generally called a liberal arts degree.",
  "negative_10": "West Hills College LemooreAssociate of Arts (A.A.), Liberal Arts and Sciences/Liberal StudiesAssociate of Arts (A.A.), Liberal Arts and Sciences/Liberal Studies. -Student Government President for two years. -Valedictorian. -Alpha Gamma Sigma (Alpha Chi chapter) President/College Relations Liaison.",
  "negative_11": "You can pursue associate degree in academic area such as business administration, law, arts, engineering, paralegal studies, liberal arts, computer science, and more. Q: What are online associate programs?",
  "negative_12": "liberal arts definition The areas of learning that cultivate general intellectual ability rather than technical or professional skills. Liberal arts is often used as a synonym for humanities, because literature, languag
```
- Deduplicated: No
triplet-50-ids subset
- Columns: "query", "positive", "negative_1" to "negative_50"
- Column types: int, int, then int for each of negative_1 to negative_50 (52 columns in total)
- Example:
```python
{
  "query": 571018, "positive": 7349777,
  "negative_1": 6948601, "negative_2": 6948602, "negative_3": 6948603, "negative_4": 6948604, "negative_5": 6948605,
  "negative_6": 6948606, "negative_7": 6948607, "negative_8": 6948608, "negative_9": 6948609, "negative_10": 6948610,
  "negative_11": 6948611, "negative_12": 6948612, "negative_13": 6948613, "negative_14": 6948614, "negative_15": 6948615,
  "negative_16": 6948616, "negative_17": 6948617, "negative_18": 6948618, "negative_19": 6948619, "negative_20": 6948620,
  "negative_21": 6948621, "negative_22": 6948622, "negative_23": 6948623, "negative_24": 6948624, "negative_25": 6948625,
  "negative_26": 6948626, "negative_27": 6948627, "negative_28": 6948628, "negative_29": 6948629, "negative_30": 6948630,
  "negative_31": 6948631, "negative_32": 6948632, "negative_33": 6948633, "negative_34": 6948634, "negative_35": 6948635,
  "negative_36": 6948636, "negative_37": 6948637, "negative_38": 6948638, "negative_39": 6948639, "negative_40": 6948640,
  "negative_41": 6948641, "negative_42": 6948642, "negative_43": 6948643, "negative_44": 6948644, "negative_45": 6948645,
  "negative_46": 6948646, "negative_47": 6948647, "negative_48": 6948648, "negative_49": 6948649, "negative_50": 6948650
}
```
- Deduplicated: No
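The triplet-50 and triplet-50-ids subsets use a wide layout with one column per negative. A minimal sketch of expanding one such row into 50 `(query, positive, negative_i)` triplets, using the triplet-50-ids example above; `explode_row` is a hypothetical helper, not part of any library.

```python
from typing import Any, List, Mapping, Tuple

NUM_NEGATIVES = 50  # negative_1 to negative_50


def explode_row(
    row: Mapping[str, Any], num_negatives: int = NUM_NEGATIVES
) -> List[Tuple[Any, Any, Any]]:
    """Expand one wide row into (query, positive, negative_i) triplets,
    keeping the order negative_1, negative_2, ..., negative_N."""
    return [
        (row["query"], row["positive"], row[f"negative_{i}"])
        for i in range(1, num_negatives + 1)
    ]


# Rebuild the triplet-50-ids example row shown above (6948601 to 6948650).
example = {"query": 571018, "positive": 7349777}
example.update({f"negative_{i}": 6948600 + i for i in range(1, NUM_NEGATIVES + 1)})

triplets = explode_row(example)
print(len(triplets))  # 50
print(triplets[0])    # (571018, 7349777, 6948601)
print(triplets[-1])   # (571018, 7349777, 6948650)
```

Building the column names from `range(1, 51)` keeps the numeric order; sorting the keys as strings would place `negative_10` before `negative_2`. The same function works unchanged on triplet-50 rows, where the values are texts.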



