stsb

ModelScope community · updated 2026-05-14 · indexed 2024-05-15
Download link:
https://modelscope.cn/datasets/sentence-transformers/stsb
Description:
# Dataset Card for STSB

The Semantic Textual Similarity Benchmark (Cer et al., 2017) is a collection of sentence pairs drawn from news headlines, video and image captions, and natural language inference data. Each pair is human-annotated with a similarity score from 0 to 5. In this variant, the scores are normalized to the range 0 to 1.

## Dataset Details

* Columns: "sentence1", "sentence2", "score"
* Column types: `str`, `str`, `float`
* Examples:
    ```python
    {
        'sentence1': 'A man is playing a large flute.',
        'sentence2': 'A man is playing a flute.',
        'score': 0.76,
    }
    ```
* Collection strategy: Reading the sentences and scores from the STSB dataset and dividing each score by 5.
* Deduplicated: No
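
The collection strategy above amounts to a single division per row. The sketch below illustrates it, assuming the original STSB annotations lie on a 0 to 5 scale; the names `StsbPair`, `ORIGINAL_MAX_SCORE`, `normalize_pair`, and `to_original_scale` are hypothetical helpers for illustration, not part of the dataset or any library.

```python
from typing import TypedDict


class StsbPair(TypedDict):
    """One row in this variant's format: two sentences and a score in [0, 1]."""
    sentence1: str
    sentence2: str
    score: float


# Assumption: original STSB similarity annotations range from 0 to 5.
ORIGINAL_MAX_SCORE = 5.0


def normalize_pair(sentence1: str, sentence2: str, raw_score: float) -> StsbPair:
    """Hypothetical helper: build a normalized row by dividing the raw score by 5."""
    if not 0.0 <= raw_score <= ORIGINAL_MAX_SCORE:
        raise ValueError(f"raw score {raw_score} is outside [0, {ORIGINAL_MAX_SCORE}]")
    return {
        "sentence1": sentence1,
        "sentence2": sentence2,
        "score": raw_score / ORIGINAL_MAX_SCORE,
    }


def to_original_scale(row: StsbPair) -> float:
    """Hypothetical helper: map a normalized score back onto the 0-5 scale."""
    return row["score"] * ORIGINAL_MAX_SCORE


# The card's example row corresponds to a raw score of 3.8 (3.8 / 5 = 0.76).
row = normalize_pair("A man is playing a large flute.", "A man is playing a flute.", 3.8)
print(row)
```

Multiplying by 5 reverses the normalization, which can help when comparing against results reported on the original 0 to 5 scale.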

Provider:
maas
Created:
2025-01-06