plumaj/biographical
Source: Hugging Face. Updated 2024-03-21; indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/plumaj/biographical
Description:
# Biographical Relation Extraction Dataset
This repository hosts datasets for biographical relation extraction, built with Guided Distant Supervision (GDS). The datasets are available in English and German and support research on relation extraction from biographical text. The tables below give an overview of the available datasets and the relations in each set. Each language has several sets, named for how they were compiled: the normal set follows GDS, the coref set adds coreference resolution, and the skip set skips certain parts of the text. For a fuller explanation of the method, see [[1]](#1).
## Available Datasets
### English Dataset
#### Overview
Detailed insights into the English dataset can be found in [[1]](#1).
#### Data Summary
| Relation | Normal Set | Coref Set | Skip Set |
|------------|-------------|-------------|-------------|
| Birthdate | 51,524 | 47,977 | 45,211 |
| Birthplace | 50,226 | 46,551 | 17,537 |
| Deathdate | 17,197 | 14,500 | 5,925 |
| Deathplace | 18,944 | 20,430 | 10,790 |
| Occupation | 18,114 | 18,111 | 8,716 |
| Parent | 6,352 | 10,291 | 5,596 |
| Educated | 5,639 | 9,415 | 3,858 |
| Child | 2,209 | 4,053 | 2,123 |
| Sibling | 2,083 | 3,601 | 1,997 |
| Other | 173,969 | 175,916 | 103,248 |
| **Total** | **346,257** | **350,845** | **205,001** |
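
The counts above are heavily skewed toward the `Other` class, which matters when choosing evaluation metrics or class weights. As a minimal sketch, the snippet below turns the Normal Set column into per-relation shares. The counts are copied from the table; the function and variable names are illustrative and not part of the dataset.

```python
# Counts copied from the "Normal Set" column of the English table above.
normal_counts = {
    "Birthdate": 51_524,
    "Birthplace": 50_226,
    "Deathdate": 17_197,
    "Deathplace": 18_944,
    "Occupation": 18_114,
    "Parent": 6_352,
    "Educated": 5_639,
    "Child": 2_209,
    "Sibling": 2_083,
    "Other": 173_969,
}


def relation_shares(counts):
    """Return each relation's fraction of the total number of examples."""
    total = sum(counts.values())
    return {relation: n / total for relation, n in counts.items()}


shares = relation_shares(normal_counts)

# Print relations from most to least frequent.
for relation, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{relation:<11}{share:6.1%}")
```

The same function can be applied to the Coref Set and Skip Set columns to compare how the label distribution shifts between compilation strategies.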
### German Dataset
#### Overview
A paper discussing the German dataset is forthcoming [[2]](#2).
#### Data Summary
| Relation | Normal Set | Skip Set |
|------------|------------|------------|
| Birthdate | 8,777 | 770 |
| Birthplace | 12,833 | 5,816 |
| Child | 718 | 701 |
| Deathdate | 922 | 454 |
| Deathplace | 4,059 | 3,263 |
| Educated | 610 | 607 |
| Occupation | 10,861 | 4,836 |
| Other | 39,782 | 20,469 |
| Parent | 3,704 | 3,565 |
| Sibling | 917 | 890 |
| **Total** | **83,183** | **41,380** |
## Additional Information
### How to Use the Datasets
<details>
<summary>Click to expand</summary>
Provide information on how researchers and developers can utilize and reference the datasets in their work.
</details>
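
As a starting point, the sketch below shows one way to load the data with the Hugging Face `datasets` library and count relation labels. The repository ID is taken from the download link on this page. The label column name `relation`, and whether a configuration name or split must be passed, are assumptions that this card does not confirm, so inspect the loaded object before relying on them.

```python
from collections import Counter


def count_relations(records, label_key="relation"):
    """Count how often each relation label occurs in an iterable of dict-like rows.

    `label_key` is a hypothetical column name; check the real schema first.
    """
    return Counter(row[label_key] for row in records)


def load_biographical(repo_id="plumaj/biographical", **kwargs):
    """Load the dataset from the Hugging Face Hub (requires `pip install datasets`).

    Extra keyword arguments (for example a config name or `split=`) are passed
    through unchanged, because this card does not document which are needed.
    """
    # Imported lazily so count_relations works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id, **kwargs)


# Toy rows illustrating the assumed row shape; these are not real dataset records.
toy_rows = [
    {"relation": "birthdate"},
    {"relation": "birthplace"},
    {"relation": "birthdate"},
]
print(count_relations(toy_rows))
```

With network access, calling `load_biographical()` and printing the result shows the splits and columns that are actually available, which is the place to verify the assumed column name.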
### Licensing and Citation
<details>
<summary>Click to expand</summary>
Include licensing details and citation instructions here.
</details>
### Contribution and Feedback
Feel free to contribute or provide feedback to enhance the datasets. Guidelines on how to contribute and provide feedback can be detailed in this section.
## References
<a id="1">[1]</a>
Alistair Plum, Tharindu Ranasinghe, Spencer Jones, Constantin Orasan, Ruslan Mitkov (2022).
Biographical: A Semi-Supervised Relation Extraction Dataset.
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval.
<a id="2">[2]</a>
Alistair Plum, Tharindu Ranasinghe, Christoph Purschke (2024).
Guided Distant Supervision for Multilingual Relation Extraction Data: Adapting to a New Language.
Provider: plumaj



