PIT

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/cocoxu/semeval-pit2015

Resource overview:

This dataset contains 18,762 sentence pairs, each annotated with an integer similarity score from 0 to 5. To keep the data balanced, the pairs were sampled, and the original development data was used for testing. The reported split sizes are 5,332 training pairs, 350 original test pairs, and 1,896 development pairs. The task is paraphrase identification.
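Because the annotations are integer scores from 0 to 5 while the task is binary paraphrase identification, a consumer of the data typically needs a rule for turning a score into a label. The following is a minimal sketch of one such rule, not the dataset's official procedure: the `SentencePair` type, the `to_binary_label` helper, the example sentences, and the cutoff of 3 are all assumptions made for illustration and should be checked against the repository's own documentation.

```python
from dataclasses import dataclass


@dataclass
class SentencePair:
    """Hypothetical in-memory record for one annotated pair."""
    sentence_a: str
    sentence_b: str
    score: int  # integer similarity score, 0 to 5


def to_binary_label(score: int, threshold: int = 3) -> bool:
    """Map a 0-5 similarity score to a paraphrase / non-paraphrase label.

    The default threshold of 3 is an assumed cutoff, not one stated
    in the dataset description.
    """
    if not 0 <= score <= 5:
        raise ValueError(f"score must be in [0, 5], got {score}")
    return score >= threshold


# Made-up example pairs, for illustration only.
pairs = [
    SentencePair("The game starts at noon", "Kickoff is at 12 pm", 4),
    SentencePair("The game starts at noon", "I had noodles for lunch", 0),
]
labels = [to_binary_label(p.score) for p in pairs]
```

Keeping the threshold as a parameter makes it easy to test how sensitive a model's reported accuracy is to where the paraphrase boundary is drawn.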
