ROBUST
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/qijimrc/ROBUST
Overview:
ROBUST is a benchmark for evaluating open information extraction (OIE) models under conditions that simulate real-world use. Its basic unit is the knowledge-invariant clique: a group of sentences that express the same structured knowledge in different syntactic and expressive forms. The clique is introduced as a new data structure that links sentences sharing underlying knowledge, with the aim of mitigating the distribution bias found in earlier benchmarks. The dataset is large-scale and human-annotated, with an average of 3.877 sentences per clique. Its target task is open information extraction.
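As a rough illustration of the grouping structure described above, the following minimal Python sketch models a group of same-meaning sentences and computes the average number of sentences per group. The `Clique` class, its field names, and the toy sentences are all hypothetical; they are not taken from the ROBUST repository, whose actual file format may differ.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Clique:
    """Hypothetical container: sentences that express the same knowledge."""
    clique_id: str
    sentences: list = field(default_factory=list)


def mean_sentences_per_clique(cliques):
    """Average number of sentences across all groups."""
    return mean(len(c.sentences) for c in cliques)


# Toy data for illustration only; not drawn from the dataset.
cliques = [
    Clique("c1", [
        "Marie Curie discovered polonium.",
        "Polonium was discovered by Marie Curie.",
        "It was Marie Curie who discovered polonium.",
    ]),
    Clique("c2", [
        "Paris is the capital of France.",
        "France's capital is Paris.",
        "The capital of France is Paris.",
        "Paris serves as France's capital.",
        "France has Paris as its capital.",
    ]),
]

average = mean_sentences_per_clique(cliques)
print(average)
```

Applied to the full dataset, the same statistic corresponds to the reported average of 3.877 sentences per group.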



