sentence-transformers/nli-for-simcse
Dataset Card for NLI for SimCSE
Dataset Overview
- Language: English
- Multilinguality: monolingual
- Dataset size: 1M < n < 10M
- Task categories: feature extraction, sentence similarity
- Tags: sentence-transformers
Dataset Configurations
triplet subset
- Features:
  - anchor: string
  - positive: string
  - negative: string
- Splits:
  - train:
    - Bytes: 51033641
    - Examples: 274951
- Download size: 33517191 bytes
- Dataset size: 51033641 bytes
triplet-7 subset
- Features:
  - anchor: string
  - positive: string
  - negative_1 through negative_7: string
- Splits:
  - train:
    - Bytes: 129065964
    - Examples: 273540
- Download size: 87886620 bytes
- Dataset size: 129065964 bytes
triplet-all subset
- Features:
  - anchor: string
  - positive: string
  - negative: string
- Splits:
  - train:
    - Bytes: 357145333
    - Examples: 1925996
- Download size: 94616052 bytes
- Dataset size: 357145333 bytes
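Each configuration above can be requested by name when loading the dataset. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the Hub is reachable; the helper `load_subset` and the `SUBSETS` constant are illustrative names, not part of the dataset or the library.

```python
# Subset (configuration) names as listed on this card.
SUBSETS = ("triplet", "triplet-7", "triplet-all")


def load_subset(name: str = "triplet"):
    """Load the train split of one subset from the Hugging Face Hub.

    Hypothetical convenience wrapper: it only validates the subset name
    and forwards it to datasets.load_dataset.
    """
    if name not in SUBSETS:
        raise ValueError(f"unknown subset {name!r}; expected one of {SUBSETS}")
    # Imported lazily so the name check works even without the library.
    from datasets import load_dataset

    return load_dataset("sentence-transformers/nli-for-simcse", name, split="train")
```

Calling `load_subset("triplet-7")` downloads that subset on first use; the returned object's `column_names` attribute can then be compared against the feature list above.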
Dataset Subsets
triplet subset
- Columns: "anchor", "positive", "negative"
- Column types: str, str, str
- Example:
  {
    "anchor": "One of our number will carry out your instructions minutely.",
    "positive": "A member of my team will execute your orders with immense precision.",
    "negative": "We have no one free at the moment so you have to take action yourself."
  }
- Collection strategy: read the jsonl files in the en_NLI_data directory and keep only the first negative of each sample.
- Deduplicated: No
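The collection strategy for the triplet subset can be sketched as follows. The card does not document the layout of the jsonl records, so the keys `anchor`, `positive`, and `negatives` (a list of strings) are assumptions made for illustration, as is the choice to skip records that have no negatives; the function name is hypothetical as well.

```python
import json
from typing import Iterable, Iterator


def first_negative_triplets(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one (anchor, positive, negative) row per jsonl line.

    Assumed record layout (not confirmed by the card):
    {"anchor": str, "positive": str, "negatives": [str, ...]}
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the file
        record = json.loads(line)
        negatives = record["negatives"]
        if not negatives:
            continue  # assumption: a record without negatives is skipped
        yield {
            "anchor": record["anchor"],
            "positive": record["positive"],
            "negative": negatives[0],  # only the first negative is kept
        }
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for example `first_negative_triplets(open(path, encoding="utf-8"))` for some jsonl path.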
triplet-7 subset
- Columns: "anchor", "positive", "negative_1", "negative_2", "negative_3", "negative_4", "negative_5", "negative_6", "negative_7"
- Column types: str, str, str, str, str, str, str, str, str
- Example:
  {
    "anchor": "One of our number will carry out your instructions minutely.",
    "positive": "A member of my team will execute your orders with immense precision.",
    "negative_1": "We have no one free at the moment so you have to take action yourself.",
    "negative_2": "A poodle is running through the grass.",
    "negative_3": "Investment and planning are growing industries in Jamaica.",
    "negative_4": "A bearded man is rocking out on an acoustic guitar",
    "negative_5": "The people are sunbathing on the beach.",
    "negative_6": "A construction worker installs a door.",
    "negative_7": "A crowd has gathered because of a dangerous situation."
  }
- Collection strategy: read the jsonl files in the en_NLI_data directory and keep every sample that has 7 negatives.
- Deduplicated: No
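A rough sketch of the triplet-7 filter described above, operating on already-parsed records. The record keys (`anchor`, `positive`, `negatives`) are hypothetical, and reading "has 7 negatives" as "exactly 7" is an interpretation rather than something the card states.

```python
from typing import Iterable, Iterator


def seven_negative_rows(records: Iterable[dict]) -> Iterator[dict]:
    """Keep records with exactly 7 negatives and spread them into columns.

    Assumed record layout (not confirmed by the card):
    {"anchor": str, "positive": str, "negatives": [str, ...]}
    """
    for record in records:
        negatives = record["negatives"]
        if len(negatives) != 7:
            continue  # assumption: "7 negatives" means exactly seven
        row = {"anchor": record["anchor"], "positive": record["positive"]}
        # Column names follow the card: negative_1 ... negative_7.
        for index, negative in enumerate(negatives, start=1):
            row[f"negative_{index}"] = negative
        yield row
```

Each yielded row has the nine columns listed for this subset, in the same order.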
triplet-all subset
- Columns: "anchor", "positive", "negative"
- Column types: str, str, str
- Example:
  {
    "anchor": "One of our number will carry out your instructions minutely.",
    "positive": "A member of my team will execute your orders with immense precision.",
    "negative": "We have no one free at the moment so you have to take action yourself."
  }
- Collection strategy: read the jsonl files in the en_NLI_data directory and, for every negative of a sample, emit a separate (anchor, positive, negative) sample.
- Deduplicated: No
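The triplet-all expansion described above, one output row per negative, might look like the sketch below. As before, the record keys `anchor`, `positive`, and `negatives` are assumed for illustration and the function name is hypothetical.

```python
from typing import Iterable, Iterator


def expand_all_negatives(records: Iterable[dict]) -> Iterator[dict]:
    """Yield one (anchor, positive, negative) row for every negative.

    Assumed record layout (not confirmed by the card):
    {"anchor": str, "positive": str, "negatives": [str, ...]}
    """
    for record in records:
        for negative in record["negatives"]:
            # The anchor and positive are repeated for each negative.
            yield {
                "anchor": record["anchor"],
                "positive": record["positive"],
                "negative": negative,
            }
```

Under this scheme a record with k negatives contributes k rows, which is consistent with triplet-all being several times larger than the triplet subset. No deduplication is applied, matching the card.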




