stsb
Source: ModelScope community. Updated: 2026-05-14. Indexed: 2024-05-15.
Download link:
https://modelscope.cn/datasets/sentence-transformers/stsb
Resource description:
# Dataset Card for STSB
The Semantic Textual Similarity Benchmark (Cer et al., 2017) is a collection of sentence pairs drawn from news headlines, video and image captions, and natural language inference data.
Each pair is human-annotated with a similarity score from 0 to 5. For this variant, the scores are normalized to the range 0 to 1.
## Dataset Details
* Columns: "sentence1", "sentence2", "score"
* Column types: `str`, `str`, `float`
* Examples:
```python
{
    'sentence1': 'A man is playing a large flute.',
    'sentence2': 'A man is playing a flute.',
    'score': 0.76,
}
```
* Collection strategy: Sentences and scores were read from the STSB dataset, and each score was divided by 5.
* Deduplicated: No
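
The collection strategy above amounts to a single division per row. The following is a minimal sketch of that step, not the script actually used to build the dataset: the names `STSBRow`, `normalize_score`, and `make_row` are hypothetical, and the range check assumes raw annotations never exceed 5.

```python
from typing import TypedDict


class STSBRow(TypedDict):
    """One row with the three columns listed in the card."""
    sentence1: str
    sentence2: str
    score: float


def normalize_score(raw_score: float, max_score: float = 5.0) -> float:
    """Divide a raw similarity annotation by 5 so it falls between 0 and 1."""
    # Assumption: raw annotations lie in [0, max_score]; anything else is rejected.
    if not 0.0 <= raw_score <= max_score:
        raise ValueError(f"raw score {raw_score} is outside [0, {max_score}]")
    return raw_score / max_score


def make_row(sentence1: str, sentence2: str, raw_score: float) -> STSBRow:
    """Build a row in the card's column layout from a raw annotation."""
    return {
        "sentence1": sentence1,
        "sentence2": sentence2,
        "score": normalize_score(raw_score),
    }


# A raw annotation of 3.8 corresponds to the normalized score in the card's example.
row = make_row("A man is playing a large flute.", "A man is playing a flute.", 3.8)
```

Multiplying a normalized score by 5 recovers the original annotation scale, which can be useful when comparing against results reported on the unnormalized benchmark.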
Provider:
maas
Created:
2025-01-06



