PROPSEGMENT
Source: arXiv · Updated: 2023-05-25 · Indexed: 2024-06-21
Download link:
https://github.com/google-research-datasets/propsegment
Summary:
PROPSEGMENT is a large-scale dataset from Google Research for proposition-level segmentation and entailment recognition. It contains more than 45,000 propositions annotated by experts and targets a fine-grained problem in natural language inference (NLI): judging entailment for the individual propositions a sentence expresses rather than for the sentence as a whole. The dataset was built from clusters of topically related documents drawn from Wikipedia and news. Each sentence in a cluster is segmented into propositions, and each proposition is labeled with its entailment relation to another document in the same cluster. Intended applications include document-level NLI and hallucination detection in summarization, where proposition-level labels allow more precise entailment analysis and explanation.
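To illustrate how proposition-level labels can serve hallucination detection, here is a minimal sketch. It assumes a hypothetical `Proposition` record holding the proposition text and a simplified binary `entailed` flag; the released files may use a different schema and a richer label set. A sentence is treated as faithful only if every proposition it contains is entailed by the reference document, and the non-entailed propositions explain why a sentence was flagged.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical record for illustration only; not the dataset's actual schema.
@dataclass
class Proposition:
    text: str       # one proposition expressed by a sentence
    entailed: bool  # simplified binary label: entailed by the reference document?


def unsupported_propositions(props: List[Proposition]) -> List[str]:
    """Return the propositions that the reference document does not entail."""
    return [p.text for p in props if not p.entailed]


def sentence_is_faithful(props: List[Proposition]) -> bool:
    """A sentence is faithful only if all of its propositions are entailed."""
    return all(p.entailed for p in props)


if __name__ == "__main__":
    # Invented example sentence: "The company, founded in 1998, is based in Berlin."
    props = [
        Proposition("The company was founded in 1998.", entailed=True),
        Proposition("The company is based in Berlin.", entailed=False),
    ]
    print(sentence_is_faithful(props))
    print(unsupported_propositions(props))
```

The all-propositions rule is a deliberate simplification: a sentence-level NLI label would mark the example sentence as not entailed without indicating which part is unsupported, whereas the proposition-level view isolates the unsupported claim.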
Provider:
Google Research
Created:
2022-12-21



