sentence-transformers/stsb
Source: Hugging Face · Updated 2024-04-25 · Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/sentence-transformers/stsb
Resource summary:
---
language:
- en
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
task_categories:
- feature-extraction
- sentence-similarity
tags:
- sentence-transformers
pretty_name: STSB
dataset_info:
  features:
  - name: sentence1
    dtype: string
  - name: sentence2
    dtype: string
  - name: score
    dtype: float64
  splits:
  - name: train
    num_bytes: 755098
    num_examples: 5749
  - name: validation
    num_bytes: 216064
    num_examples: 1500
  - name: test
    num_bytes: 169987
    num_examples: 1379
  download_size: 720899
  dataset_size: 1141149
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
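The `configs` entry in the front matter assigns data files to splits by glob pattern (`data/train-*`, `data/validation-*`, `data/test-*`). The sketch below illustrates that matching with Python's standard `fnmatch` module. The shard file names are hypothetical examples, not a listing of the actual repository, and the Hub's own file resolution may differ in detail.

```python
from fnmatch import fnmatch
from typing import Optional

# Split-to-pattern mapping copied from the `configs` block of the card.
SPLIT_PATTERNS = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}


def assign_split(path: str) -> Optional[str]:
    """Return the split whose glob pattern matches `path`, or None if none does."""
    for split, pattern in SPLIT_PATTERNS.items():
        if fnmatch(path, pattern):
            return split
    return None


# Hypothetical shard names, used only for illustration.
files = [
    "data/train-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]
assignments = {f: assign_split(f) for f in files}
```

Files that match no pattern, such as the README, are simply not assigned to any split.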
# Dataset Card for STSB
The Semantic Textual Similarity Benchmark (Cer et al., 2017) is a collection of sentence pairs drawn from news headlines, video and image captions, and natural language inference data.
Each pair is human-annotated with a similarity score from 0 to 5. In this variant, the scores are normalized to the range 0 to 1 by dividing them by 5.
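Because the scores here are the original ones divided by 5, the original scale can be recovered by multiplying by 5. For instance, the example score of 0.76 corresponds to a raw score of 3.8. The helpers `normalize` and `denormalize` below are hypothetical names for a minimal sketch of that conversion, not functions provided by `sentence-transformers` or `datasets`.

```python
MAX_RAW_SCORE = 5.0  # upper end of the STS annotation scale


def normalize(raw: float) -> float:
    """Map a raw similarity score (at most 5) onto the 0-1 range used here."""
    if not 0.0 <= raw <= MAX_RAW_SCORE:
        raise ValueError(f"raw score out of range: {raw}")
    return raw / MAX_RAW_SCORE


def denormalize(score: float) -> float:
    """Recover the raw score from a normalized one by multiplying by 5."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"normalized score out of range: {score}")
    return score * MAX_RAW_SCORE


raw = denormalize(0.76)  # approximately 3.8 (floating-point arithmetic)
```

Compare converted values with a tolerance rather than exact equality, since division and multiplication by 5 can introduce small floating-point errors.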
## Dataset Details
* Columns: "sentence1", "sentence2", "score"
* Column types: `str`, `str`, `float`
* Examples:
```python
{
    'sentence1': 'A man is playing a large flute.',
    'sentence2': 'A man is playing a flute.',
    'score': 0.76,
}
```
* Collection strategy: Reading the sentences and scores from the STSB dataset and dividing each score by 5.
* Deduplicated: No
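The columns, column types, and example above can be turned into a lightweight per-row check. The following is a minimal sketch that assumes rows arrive as plain Python dicts shaped like the example. `validate_row` is a hypothetical helper, not part of `sentence-transformers` or `datasets`.

```python
from typing import Any, Dict

# Expected columns and Python types, taken from "Columns" and "Column types" above.
SCHEMA = {"sentence1": str, "sentence2": str, "score": float}


def validate_row(row: Dict[str, Any]) -> None:
    """Raise ValueError if `row` does not match the STSB column schema."""
    if set(row) != set(SCHEMA):
        raise ValueError(f"unexpected columns: {sorted(row)}")
    for column, expected_type in SCHEMA.items():
        if not isinstance(row[column], expected_type):
            raise ValueError(f"{column!r} should be {expected_type.__name__}")
    # Scores in this variant were divided by 5, so they should fall in [0, 1].
    if not 0.0 <= row["score"] <= 1.0:
        raise ValueError(f"score out of range: {row['score']}")


example = {
    "sentence1": "A man is playing a large flute.",
    "sentence2": "A man is playing a flute.",
    "score": 0.76,
}
validate_row(example)  # passes silently
```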
Provided by:
sentence-transformers
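Original information summary
Dataset overview
Basic information
- Name: STSB
- Language: English
- Multilinguality: monolingual
- Size: 1K<n<10K
- Task categories:
  - Feature extraction
  - Sentence similarity
- Tags: sentence-transformers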
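Dataset structure
Features
- sentence1: string
- sentence2: string
- score: floating point (float64)
Data splits
- Training set:
  - Number of examples: 5749
  - Size in bytes: 755098
- Validation set:
  - Number of examples: 1500
  - Size in bytes: 216064
- Test set:
  - Number of examples: 1379
  - Size in bytes: 169987
Data size
- Download size: 720899 bytes
- Dataset size: 1141149 bytes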
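The split figures above are internally consistent. The three splits total 8,628 examples, which falls inside the `1K<n<10K` size category, and their byte counts add up exactly to the reported dataset size of 1,141,149 bytes. A quick check, with all numbers copied from the card:

```python
# Per-split metadata copied from the card: (num_examples, num_bytes).
SPLITS = {
    "train": (5749, 755098),
    "validation": (1500, 216064),
    "test": (1379, 169987),
}
DATASET_SIZE = 1141149  # bytes, as reported in the card

total_examples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

print(total_examples)  # 8628
print(total_bytes == DATASET_SIZE)  # True
print(1000 < total_examples < 10000)  # True: matches the 1K<n<10K category
```

The download size (720,899 bytes) is smaller than the dataset size, so it is not expected to equal this sum.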
Data content
- Composition: sentence pairs with similarity scores
- Score range: 0 to 1 (normalized from the original 0 to 5 range)
- Example: `{'sentence1': 'A man is playing a large flute.', 'sentence2': 'A man is playing a flute.', 'score': 0.76}`



