PROPSEGMENT

Name: PROPSEGMENT
Creator: Google Research
Published: 2023-05-25 07:19:14
License: not specified

arXiv | updated 2023-05-25 | indexed 2024-06-21

Download link:

https://github.com/google-research-datasets/propsegment


Description:

PROPSEGMENT is a large-scale dataset created by Google Research for proposition-level segmentation and entailment recognition. It contains over 45,000 propositions annotated by expert human raters and targets fine-grained problems in natural language inference (NLI). The dataset is built from clusters of topically related documents drawn from Wikipedia and news. Within each cluster, sentences are segmented into propositions, and each proposition is labeled with its entailment relation to another document. Applications include document-level NLI and hallucination detection in summarization, where proposition-level labels support more precise analysis and explanation of textual entailment.
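To make the proposition-level setup concrete, the following is a minimal sketch of how one annotated proposition might be held in memory and how non-entailed propositions could be picked out, as in the hallucination detection use case. The class name, field names, three-way label set, and example sentences are all assumptions made for illustration; they do not reflect the actual file format or schema of the released dataset.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List

# Hypothetical label inventory; the listing does not state the real one.
LABELS = ("entailment", "neutral", "contradiction")


@dataclass
class Proposition:
    """One proposition segmented from a sentence (hypothetical schema)."""
    sentence: str  # sentence the proposition was segmented from
    text: str      # the proposition itself
    label: str     # entailment relation to the paired document

    def __post_init__(self) -> None:
        # Reject labels outside the assumed inventory.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label}")


def label_counts(props: List[Proposition]) -> Counter:
    """Count how many propositions carry each label."""
    return Counter(p.label for p in props)


def unsupported(props: List[Proposition]) -> List[Proposition]:
    """Return propositions that the paired document does not entail."""
    return [p for p in props if p.label != "entailment"]


# Made-up example: one sentence split into two propositions.
example = [
    Proposition(
        sentence="The bridge opened in 1932 and cost ten million dollars.",
        text="The bridge opened in 1932.",
        label="entailment",
    ),
    Proposition(
        sentence="The bridge opened in 1932 and cost ten million dollars.",
        text="The bridge cost ten million dollars.",
        label="neutral",
    ),
]
```

Calling `unsupported(example)` on the made-up data returns only the second proposition, which shows why labeling at the proposition level can localize an unsupported claim more precisely than a single label for the whole sentence.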

Provider:

Google Research

Created:

2022-12-21
