Feature Enriched Dataset

Name: Feature Enriched Dataset
Creator: Humaira Waqas
Published: 2021-02-13 00:00:00
License: No description available

Figshare, updated 2021-02-13, indexed 2026-04-08

Download link:

https://figshare.com/articles/dataset/Feature_Enriched_Dataset_to_Develop_and_Test_Author_Name_Ambiguity_Solutions/13651502/3

Description:

The author name ambiguity problem stems from the fact that many individuals share identical names in the real world. It is a non-trivial problem, commonly faced by digital libraries that index and retrieve, on demand, scholarly articles written by distinct authors. The problem occurs when multiple authors share a common name or when multiple name variations of one author appear in the publication records, because digital libraries do not provide an efficient way to accurately identify the publications of an author. Furthermore, the lack of complete metadata about authors and publications hinders the development of a generic algorithm that can solve the problem under discussion. Many data collections exist that are used to develop and test author name ambiguity solutions; however, the majority of them lack basic author-related and publication-related metadata, are incomplete and inconsistent, and are often biased because they were curated with a particular scenario or feature in view. To the best of our knowledge, there exists no free data collection that is enriched with useful features and holds the potential to comprehensively develop and evaluate techniques for resolving author name ambiguity. Considering this, we have prepared a dataset based on 14 ambiguous author name blocks/groups, where each block is composed of distinct authors sharing a common name or multiple name variations. The dataset covers more than eleven author-related and publication-related features and contains 7886 publication records in total, belonging to 137 distinct authors. The raw dataset was extracted from multiple web sources, then annotated and confirmed manually. After pre-processing the raw data, we employed a user-study-based evaluation method along with Cohen's kappa coefficient, with 100% agreement.
We believe that this dataset gives the research community an opportunity to work with a diverse, unbiased, and feature-enriched dataset to develop and test author name disambiguation techniques that are generic and perform better when employed in real-world scenarios.
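
As a hedged illustration of the agreement measure mentioned in the description, the minimal Python sketch below computes Cohen's kappa for two annotators as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The function name `cohens_kappa` and the sample labels are hypothetical; they are not taken from the dataset or from the authors' evaluation procedure, which the description does not detail.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical judgments: does a pair of records belong to the same author?
annotator_1 = ["same", "same", "diff", "diff"]
annotator_2 = ["same", "same", "diff", "diff"]
annotator_3 = ["same", "diff", "diff", "diff"]

full_agreement = cohens_kappa(annotator_1, annotator_2)
partial_agreement = cohens_kappa(annotator_1, annotator_3)
print(full_agreement, partial_agreement)
```

In this sketch, identical annotations give a kappa of 1, while a single disagreement among the four hypothetical items lowers it, since kappa discounts the agreement that would be expected by chance alone.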

Provider:

Humaira Waqas

Created:

2021-02-13
