
plumaj/biographical

Hugging Face, updated 2024-03-21, indexed 2024-06-11
Download link:
https://hf-mirror.com/datasets/plumaj/biographical
Resource description:
# Biographical Relation Extraction Dataset

Welcome to the repository of datasets tailored for biographical relation extraction, crafted utilizing Guided Distant Supervision (GDS). Explore datasets available in both English and German, which facilitate extensive research in relation extraction from biographical data.

Below you can find an overview of the datasets currently available, as well as the relations that are in each set. Please note there are different sets for each language, which denote how they were compiled. In short, normal followed GDS, coref added coreference resolution, and skip skipped certain parts of the text. For a more extensive explanation of how this worked, please refer to [[1]](#1).

## Available Datasets

### English Dataset

#### Overview

Detailed insights into the English dataset can be found in [[1]](#1).

#### Data Summary

| Relation   | Normal Set  | Coref Set   | Skip Set    |
|------------|-------------|-------------|-------------|
| Birthdate  | 51,524      | 47,977      | 45,211      |
| Birthplace | 50,226      | 46,551      | 17,537      |
| Deathdate  | 17,197      | 14,500      | 5,925       |
| Deathplace | 18,944      | 20,430      | 10,790      |
| Occupation | 18,114      | 18,111      | 8,716       |
| Parent     | 6,352       | 10,291      | 5,596       |
| Educated   | 5,639       | 9,415       | 3,858       |
| Child      | 2,209       | 4,053       | 2,123       |
| Sibling    | 2,083       | 3,601       | 1,997       |
| Other      | 173,969     | 175,916     | 103,248     |
| **Total**  | **346,257** | **350,845** | **205,001** |

### German Dataset

#### Overview

A paper discussing the German dataset is forthcoming [[2]](#2).

#### Data Summary

| Relation   | Normal Set | Skip Set   |
|------------|------------|------------|
| Birthdate  | 8,777      | 770        |
| Birthplace | 12,833     | 5,816      |
| Child      | 718        | 701        |
| Deathdate  | 922        | 454        |
| Deathplace | 4,059      | 3,263      |
| Educated   | 610        | 607        |
| Occupation | 10,861     | 4,836      |
| Other      | 39,782     | 20,469     |
| Parent     | 3,704      | 3,565      |
| Sibling    | 917        | 890        |
| **Total**  | **83,183** | **41,380** |

## Additional Information

### How to Use The Datasets

<details>
<summary>Click to expand</summary>

Provide information on how researchers and developers can utilize and reference the datasets in their work.

</details>

### Licensing and Citation

<details>
<summary>Click to expand</summary>

Include licensing details and citation instructions here.

</details>

### Contribution and Feedback

Feel free to contribute or provide feedback to enhance the datasets. Guidelines on how to contribute and provide feedback can be detailed in this section.

## References

<a id="1">[1]</a> Alistair Plum, Tharindu Ranasinghe, Spencer Jones, Constantin Orasan, Ruslan Mitkov (2022). Biographical: A Semi-Supervised Relation Extraction Dataset. Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval.

<a id="2">[2]</a> Alistair Plum, Tharindu Ranasinghe, Christoph Purschke (2024). Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language.
Provider:
plumaj
Summary of original information

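Biographical Relation Extraction Dataset

Overview

This dataset is designed for biographical relation extraction and was built using Guided Distant Supervision (GDS). It is available in English and German and supports broad research on relation extraction from biographical data.

English Dataset

Data Summary

| Relation   | Normal Set | Coref Set | Skip Set |
|------------|------------|-----------|----------|
| Birthdate  | 51,524     | 47,977    | 45,211   |
| Birthplace | 50,226     | 46,551    | 17,537   |
| Deathdate  | 17,197     | 14,500    | 5,925    |
| Deathplace | 18,944     | 20,430    | 10,790   |
| Occupation | 18,114     | 18,111    | 8,716    |
| Parent     | 6,352      | 10,291    | 5,596    |
| Educated   | 5,639      | 9,415     | 3,858    |
| Child      | 2,209      | 4,053     | 2,123    |
| Sibling    | 2,083      | 3,601     | 1,997    |
| Other      | 173,969    | 175,916   | 103,248  |
| Total      | 346,257    | 350,845   | 205,001  |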
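
The English counts imply a strong class imbalance: Other makes up roughly half of each set. The sketch below reproduces that arithmetic from the figures listed above. The dictionary and function names are illustrative assumptions and are not part of the dataset or its tooling.

```python
# Per-relation counts for the English sets, copied from the table above.
# Tuple order: (Normal Set, Coref Set, Skip Set).
ENGLISH_COUNTS = {
    "Birthdate": (51_524, 47_977, 45_211),
    "Birthplace": (50_226, 46_551, 17_537),
    "Deathdate": (17_197, 14_500, 5_925),
    "Deathplace": (18_944, 20_430, 10_790),
    "Occupation": (18_114, 18_111, 8_716),
    "Parent": (6_352, 10_291, 5_596),
    "Educated": (5_639, 9_415, 3_858),
    "Child": (2_209, 4_053, 2_123),
    "Sibling": (2_083, 3_601, 1_997),
    "Other": (173_969, 175_916, 103_248),
}
SET_NAMES = ("normal", "coref", "skip")  # hypothetical labels for the three columns


def relation_shares(counts, set_index):
    """Return each relation's fraction of the chosen set's total."""
    total = sum(row[set_index] for row in counts.values())
    return {relation: row[set_index] / total for relation, row in counts.items()}


if __name__ == "__main__":
    for index, name in enumerate(SET_NAMES):
        shares = relation_shares(ENGLISH_COUNTS, index)
        print(f"{name}: Other = {shares['Other']:.1%}, Sibling = {shares['Sibling']:.1%}")
```

Shares like these are a common starting point for choosing class weights or a sampling strategy when training a relation classifier on skewed labels.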

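German Dataset

Data Summary

| Relation   | Normal Set | Skip Set |
|------------|------------|----------|
| Birthdate  | 8,777      | 770      |
| Birthplace | 12,833     | 5,816    |
| Child      | 718        | 701      |
| Deathdate  | 922        | 454      |
| Deathplace | 4,059      | 3,263    |
| Educated   | 610        | 607      |
| Occupation | 10,861     | 4,836    |
| Other      | 39,782     | 20,469   |
| Parent     | 3,704      | 3,565    |
| Sibling    | 917        | 890      |
| Total      | 83,183     | 41,380   |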
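
A quick consistency check on the German figures, sketched with the counts copied from the listing above (variable and function names are illustrative assumptions). Summing the per-relation rows reproduces the Normal Set total of 83,183, while the Skip Set rows sum to 41,371, nine fewer than the listed total of 41,380. The source does not indicate which figure is off.

```python
# Per-relation counts for the German sets, copied from the listing above.
# Tuple order: (Normal Set, Skip Set).
GERMAN_COUNTS = {
    "Birthdate": (8_777, 770),
    "Birthplace": (12_833, 5_816),
    "Child": (718, 701),
    "Deathdate": (922, 454),
    "Deathplace": (4_059, 3_263),
    "Educated": (610, 607),
    "Occupation": (10_861, 4_836),
    "Other": (39_782, 20_469),
    "Parent": (3_704, 3_565),
    "Sibling": (917, 890),
}
REPORTED_TOTALS = (83_183, 41_380)  # the "Total" row as listed


def column_sums(counts):
    """Sum each column across all relations."""
    return tuple(sum(column) for column in zip(*counts.values()))


if __name__ == "__main__":
    computed = column_sums(GERMAN_COUNTS)
    for name, got, listed in zip(("normal", "skip"), computed, REPORTED_TOTALS):
        print(f"{name}: rows sum to {got:,}, listed total {listed:,}, difference {listed - got}")
```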