stsb

Name: stsb
Creator: maas
Published: 2026-05-14 17:14:59
License: no description available

ModelScope Community. Updated: 2026-05-14. Indexed: 2024-05-15.

Download link:

https://modelscope.cn/datasets/sentence-transformers/stsb

Resource overview:

# Dataset Card for STSB

The Semantic Textual Similarity Benchmark (Cer et al., 2017) is a collection of sentence pairs drawn from news headlines, video and image captions, and natural language inference data. Each pair is human-annotated with a similarity score from 0 to 5. In this variant, the scores are normalized to the range 0 to 1.

## Dataset Details

* Columns: "sentence1", "sentence2", "score"
* Column types: `str`, `str`, `float`
* Examples:
  ```python
  {
    'sentence1': 'A man is playing a large flute.',
    'sentence2': 'A man is playing a flute.',
    'score': 0.76,
  }
  ```
* Collection strategy: Reading the sentences and scores from the STSB dataset and dividing each score by 5.
* Deduplicated: No
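
The collection strategy (divide each raw score by 5) can be sketched in a few lines of Python. This is only an illustration, not the script used to build the dataset: the names `STSBRow`, `normalize_score`, and `make_row` are hypothetical, the raw score of 3.8 is inferred from the card's example (0.76 × 5) rather than stated in the source, and the range check assumes raw annotations lie between 0 and 5.

```python
from typing import TypedDict


class STSBRow(TypedDict):
    """One record with the three columns listed in the card."""
    sentence1: str
    sentence2: str
    score: float


def normalize_score(raw_score: float, max_score: float = 5.0) -> float:
    """Divide a raw annotation by max_score so it lands in [0, 1].

    Assumption: raw scores lie in [0, max_score]; anything else is rejected.
    """
    if not 0.0 <= raw_score <= max_score:
        raise ValueError(f"raw score {raw_score} is outside [0, {max_score}]")
    return raw_score / max_score


def make_row(sentence1: str, sentence2: str, raw_score: float) -> STSBRow:
    """Build a record in the card's column layout from a raw-scored pair."""
    return {
        "sentence1": sentence1,
        "sentence2": sentence2,
        "score": normalize_score(raw_score),
    }


# Hypothetical raw score of 3.8, chosen to match the card's example row.
row = make_row(
    "A man is playing a large flute.",
    "A man is playing a flute.",
    3.8,
)
print(row["sentence1"], "|", row["sentence2"], "|", round(row["score"], 2))
```

Because the score is stored as a float in [0, 1], it can be compared directly against cosine-similarity style model outputs without further rescaling.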

Provider:

maas

Created:

2025-01-06
