HyperRED
HyperRED Dataset Overview
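Dataset Introduction
HyperRED is a dataset for the hyper-relational extraction task, which aims to extract relation triplets together with their qualifier information, such as time, quantity, or location. For example, the relation triplet (Leonard Parker, Educated At, Harvard University) can be made factually richer by including the qualifier (End Time, 1967). The dataset contains 44,000 sentences covering 62 relation types and 44 qualifier types.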
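To make the idea of a triplet plus qualifiers concrete, here is a minimal sketch of how one such fact could be held in memory. The class and field names (`Qualifier`, `HyperRelationalFact`, `triplet`) are hypothetical and chosen for illustration only; they are not taken from the HyperRED code base.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Qualifier:
    # Hypothetical container: one (qualifier label, value) pair.
    label: str  # qualifier type, e.g. "End Time"
    value: str  # qualifier value, e.g. "1967"


@dataclass
class HyperRelationalFact:
    # Hypothetical container: a relation triplet plus optional qualifiers.
    head: str
    relation: str
    tail: str
    qualifiers: List[Qualifier] = field(default_factory=list)

    def triplet(self) -> Tuple[str, str, str]:
        """Return the plain relation triplet without its qualifiers."""
        return (self.head, self.relation, self.tail)


# The example from the introduction, expressed with the sketch above.
fact = HyperRelationalFact(
    head="Leonard Parker",
    relation="Educated At",
    tail="Harvard University",
    qualifiers=[Qualifier(label="End Time", value="1967")],
)

print(fact.triplet())
for q in fact.qualifiers:
    print(q.label, q.value)
```

Dropping the `qualifiers` list leaves an ordinary relation triplet, which is the sense in which the qualifier enriches the fact rather than replacing it.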
Downloading and Processing the Dataset
The dataset can be downloaded and processed with the following commands:

    python data_process.py download_data data/hyperred/
    python data_process.py process_many data/hyperred/ data/processed/
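Data Exploration
The following example code explores the data:

    from data_process import Data

    # Load the training split downloaded in the previous step.
    path = "data/hyperred/train.json"
    data = Data.load(path)

    # Print the tokens, relations, and qualifiers of the first three sentences.
    for s in data.sents[:3]:
        print()
        print(s.tokens)
        for r in s.relations:
            print(r.head, r.label, r.tail)
            for q in r.qualifiers:
                print(q.label, q.span)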
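Data Fields
- tokens: The tokens of the sentence text.
- entities: A list of entity spans. Span indices refer to the tokens of the whitespace-separated text, with an inclusive start index and an exclusive end index.
- relations: A list of relation labels between head and tail entity spans. Each relation contains a list of qualifiers, and each qualifier has a value entity span and a qualifier label.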
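Because spans use an inclusive start and an exclusive end, they map directly onto Python slices. The helper below is a small sketch of that mapping; `span_text` is a hypothetical name, not a function from `data_process.py`, and the token list is a short excerpt used only for illustration.

```python
from typing import List, Sequence


def span_text(tokens: List[str], span: Sequence[int]) -> str:
    """Join the tokens covered by a half-open [start, end) span."""
    start, end = span
    return " ".join(tokens[start:end])


# A short token list for illustration.
tokens = ["Acadia", "University", "is", "a", "predominantly"]

# The span (0, 2) covers tokens 0 and 1, but not token 2.
print(span_text(tokens, (0, 2)))  # Acadia University
```

With this convention, the number of tokens in a span is simply `end - start`.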
Data Example
The following is an example instance from the dataset:

    {
      "tokens": ["Acadia", "University", "is", "a", "predominantly", "undergraduate", "university", "located", "in", "Wolfville", ",", "Nova", "Scotia", ",", "Canada", "with", "some", "graduate", "programs", "at", "the", "master", "'", "s", "level", "and", "one", "at", "the", "doctoral", "level", "."],
      "entities": [
        {"span": [0, 2], "label": "Entity"},
        {"span": [9, 13], "label": "Entity"},
        {"span": [14, 15], "label": "Entity"}
      ],
      "relations": [
        {
          "head": [0, 2],
          "tail": [9, 13],
          "label": "headquarters location",
          "qualifiers": [
            {"span": [14, 15], "label": "country"}
          ]
        }
      ]
    }
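An instance of this shape can be sanity-checked before use. The sketch below assumes that every head, tail, and qualifier span should also appear in the `entities` list, which holds for the example above but is an assumption about the dataset rather than a documented guarantee. The function name `check_instance` is hypothetical, and the token list is truncated after "Canada" for brevity.

```python
from typing import Any, Dict, List

# The example instance, with tokens truncated after "Canada" for brevity.
instance: Dict[str, Any] = {
    "tokens": ["Acadia", "University", "is", "a", "predominantly",
               "undergraduate", "university", "located", "in", "Wolfville",
               ",", "Nova", "Scotia", ",", "Canada"],
    "entities": [
        {"span": [0, 2], "label": "Entity"},
        {"span": [9, 13], "label": "Entity"},
        {"span": [14, 15], "label": "Entity"},
    ],
    "relations": [
        {
            "head": [0, 2],
            "tail": [9, 13],
            "label": "headquarters location",
            "qualifiers": [{"span": [14, 15], "label": "country"}],
        }
    ],
}


def check_instance(inst: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means no issue was found."""
    problems: List[str] = []
    n = len(inst["tokens"])
    entity_spans = {tuple(e["span"]) for e in inst["entities"]}

    # Every entity span must be a non-empty half-open range inside the sentence.
    for start, end in entity_spans:
        if not 0 <= start < end <= n:
            problems.append(f"entity span out of range: {(start, end)}")

    # Assumption: spans used by relations and qualifiers are listed entities.
    for rel in inst["relations"]:
        used = [rel["head"], rel["tail"]] + [q["span"] for q in rel["qualifiers"]]
        for span in used:
            if tuple(span) not in entity_spans:
                problems.append(f"span {tuple(span)} is not a listed entity")
    return problems


print(check_instance(instance))

# Removing the last entity makes the qualifier span dangle.
broken = {**instance, "entities": instance["entities"][:2]}
print(check_instance(broken))
```

Returning a list of messages rather than raising on the first problem makes it easier to scan a whole file and report every inconsistent instance at once.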




