Biographical Events Corpus
Source: arXiv; updated 2022-06-08; indexed 2024-06-21
Download link:
https://github.com/marcostranisci/biographicalEvents
Overview:
Biographical Events Corpus is a dataset created by the Department of Computer Science at the University of Turin. It contains 1,000 sentences drawn from the Wikipedia pages of writers who were born in non-Western countries, who are immigrants, or who belong to ethnic minorities. The sentences are semantically annotated following the ISO-TimeML and SemAF standards, with the aim of supporting automatic biographical event extraction, particularly for information about underrepresented groups. The sentences were extracted from 8,047 Wikipedia pages and annotated by four annotators, with an average inter-annotator agreement of 0.825. Application areas include community detection, prosopographical (biographical) research, and social bias detection; the corpus is intended to enrich existing knowledge graphs and improve performance on these tasks.
Provided by:
Department of Computer Science, University of Turin
Created:
2022-06-08



