HyperRED
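Dataset Overview
Name: HyperRED
Purpose: Supports the hyper-relational extraction task, which extracts relation triplets together with their qualifier information (such as time, quantity, or location).
Dataset contents:
- Sentences: 44,000
- Relation types: 62
- Qualifier types: 44
Example: The relation triplet (Leonard Parker, Educated At, Harvard University) is enriched with an additional fact by attaching the qualifier (End Time, 1967).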
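To make the idea of a triplet plus qualifiers concrete, the following is a minimal sketch of how such a fact could be held in memory. The class name `HyperRelationalFact` and its `as_rows` method are hypothetical names chosen for illustration; they are not part of the HyperRED code base.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HyperRelationalFact:
    """Hypothetical container: a relation triplet plus optional qualifiers."""
    head: str
    relation: str
    tail: str
    # Each qualifier is a (qualifier_label, value) pair, e.g. ("End Time", "1967").
    qualifiers: List[Tuple[str, str]] = field(default_factory=list)

    def as_rows(self) -> List[Tuple[str, str, str, str, str]]:
        """Expand into one 5-tuple per qualifier attached to the triplet."""
        return [
            (self.head, self.relation, self.tail, label, value)
            for label, value in self.qualifiers
        ]


# The example from the overview, expressed with the sketch above.
fact = HyperRelationalFact(
    head="Leonard Parker",
    relation="Educated At",
    tail="Harvard University",
    qualifiers=[("End Time", "1967")],
)
print(fact.as_rows())
```

Keeping the qualifiers as a list on the triplet mirrors the description above, where one relation can carry zero or more qualifiers.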
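Dataset Structure
- tokens: The tokens of the sentence text.
- entities: A list of entity spans. Span indices refer to token positions in the space-separated text; the start index is inclusive and the end index is exclusive.
- relations: A list of relation labels between head and tail entity spans. Each relation contains a list of qualifiers, and each qualifier has a value entity span and a qualifier label.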
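Because spans are half-open token ranges, they map directly onto Python slices. The helper below is a small sketch of that mapping; `span_text` is a hypothetical name, and the token list is built by hand from a sample sentence rather than read from the dataset.

```python
from typing import List, Sequence


def span_text(tokens: List[str], span: Sequence[int]) -> str:
    """Return the text covered by a span (start inclusive, end exclusive)."""
    start, end = span
    if not 0 <= start < end <= len(tokens):
        raise ValueError(f"invalid span {tuple(span)} for {len(tokens)} tokens")
    # Tokens are space-separated in the source text, so join them with spaces.
    return " ".join(tokens[start:end])


# Hand-built token list for illustration.
tokens = "Leonard Parker received his PhD from Harvard University in 1967 .".split()
print(span_text(tokens, (0, 2)))   # Leonard Parker
print(span_text(tokens, (6, 8)))   # Harvard University
print(span_text(tokens, (9, 10)))  # 1967
```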
Data Example
```json
{
  "tokens": ["Acadia", "University", ...],
  "entities": [
    {"span": [0, 2], "label": "Entity"},
    ...
  ],
  "relations": [
    {
      "head": [0, 2],
      "tail": [9, 13],
      "label": "headquarters location",
      "qualifiers": [
        {"span": [14, 15], "label": "country"}
      ]
    }
  ]
}
```
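A record in this layout can be flattened into readable hyper-relational facts by resolving each span against `tokens`. The sketch below assumes the field names shown in the example; the function name `extract_facts` is hypothetical, and the record is constructed by hand for illustration rather than taken from the dataset.

```python
from typing import Any, Dict, List, Tuple

Fact = Tuple[str, str, str, str, str]


def extract_facts(record: Dict[str, Any]) -> List[Fact]:
    """Yield (head, relation, tail, qualifier_label, qualifier_value) tuples."""
    tokens = record["tokens"]

    def text(span) -> str:
        # Spans are half-open: start inclusive, end exclusive.
        return " ".join(tokens[span[0]:span[1]])

    facts = []
    for relation in record["relations"]:
        for qualifier in relation["qualifiers"]:
            facts.append((
                text(relation["head"]),
                relation["label"],
                text(relation["tail"]),
                qualifier["label"],
                text(qualifier["span"]),
            ))
    return facts


# Hand-built record following the layout above (not an actual dataset row).
record = {
    "tokens": "Leonard Parker received his PhD from Harvard University in 1967 .".split(),
    "entities": [
        {"span": [0, 2], "label": "Entity"},
        {"span": [6, 8], "label": "Entity"},
        {"span": [9, 10], "label": "Entity"},
    ],
    "relations": [
        {
            "head": [0, 2],
            "tail": [6, 8],
            "label": "educated at",
            "qualifiers": [{"span": [9, 10], "label": "end time"}],
        }
    ],
}
for fact in extract_facts(record):
    print(fact)
```

Note that this sketch emits nothing for a relation with an empty qualifier list; whether such relations should be kept as plain triplets depends on the downstream use.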
Model Training and Prediction
Training command:
```bash
python training.py \
  --save_dir ckpt/cube_prune_20_seed_0 \
  --seed 0 \
  --data_dir data/processed \
  --prune_topk 20 \
  --config_file config.yml
```
Prediction example:
```python
from prediction import run_predict

# Input sentences are pre-tokenized, with tokens separated by spaces.
texts = [
    "Leonard Parker received his PhD from Harvard University in 1967 .",
    "Szewczyk played 37 times for Poland, scoring 3 goals .",
]
preds = run_predict(texts, path_checkpoint="cube_model")
```
Citation
If this code is useful for your research project, please cite the following paper:
@inproceedings{chia-etal-2022-hyperred,
    title = "A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach",
    author = "Chia, Yew Ken and Bing, Lidong and Aljunied, Sharifah Mahani and Si, Luo and Poria, Soujanya",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    year = "2022",
    url = "https://arxiv.org/abs/2211.10018",
}




