paws
Source: OpenXLab (indexed 2026-04-18)
Download link:
https://openxlab.org.cn/datasets/OpenDataLab/paws
Description:
PAWS: Paraphrase Adversaries from Word Scrambling
This dataset contains 108,463 human-labeled and 656k noisily labeled pairs that highlight the importance of modeling structure, context, and word order information for the problem of paraphrase identification. The dataset has two subsets, one based on Wikipedia and the other based on the Quora Question Pairs (QQP) dataset.
For further details, see the accompanying paper: PAWS: Paraphrase Adversaries from Word Scrambling (https://arxiv.org/abs/1904.01130)
PAWS-QQP is not distributed directly because of the QQP license. It must be reconstructed by downloading the original QQP data and then running our scripts to produce the data and attach the labels.
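To make the point about word order concrete, here is a minimal sketch in Python. It is not part of the PAWS tooling: the helper names `bag_of_words` and `bow_jaccard` and the sentence pair are illustrative assumptions. It shows that a similarity measure that ignores word order scores a word-swapped pair as identical even though the two sentences are not paraphrases.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercase, split on whitespace, and count tokens (hypothetical helper)."""
    return Counter(sentence.lower().split())


def bow_jaccard(a: str, b: str) -> float:
    """Jaccard overlap between two token multisets; word order is ignored."""
    ca, cb = bag_of_words(a), bag_of_words(b)
    intersection = sum((ca & cb).values())
    union = sum((ca | cb).values())
    return intersection / union if union else 1.0


# Illustrative pair: same words, swapped positions, different meaning.
s1 = "Flights from New York to Florida"
s2 = "Flights from Florida to New York"

score = bow_jaccard(s1, s2)
print(f"bag-of-words overlap: {score:.2f}")
```

Because both sentences contain exactly the same tokens, the overlap is maximal, so a model relying only on such features cannot tell the pair apart. Pairs of this kind are what the description above refers to when it stresses structure and word order.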
Provider:
OpenDataLab
Created:
2022-08-16



