DBLP-derived labeled data for author name disambiguation
Source: figshare.com | Updated: 2023-05-31 | Indexed: 2025-01-15
Download link:
https://figshare.com/articles/dataset/DBLP-derived_labeled_data_for_author_name_disambiguation/6840281/2
Description:
This is a DBLP-derived labeled dataset originally created by Dr. C. Lee Giles at Penn State University and filtered to remove duplicates and correct errors by Dr. Jinseok Kim at the University of Michigan. For more details, see the references below.

1. Kim, Jinseok (2018). Evaluating author name disambiguation for digital libraries: a case of DBLP. Scientometrics. doi:10.1007/s11192-018-2824-5
2. Kim, Jinseok & Kim, Jenna (2018). The impact of imbalanced training data on machine learning for author name disambiguation. Scientometrics. doi:10.1007/s11192-018-2865-9

Each row refers to one author name instance and contains the following fields, separated by tabs:

author name: full name string extracted from DBLP
unique author id: label assigned manually by Dr. C. Lee Giles's team
paper id: assigned by Dr. Jinseok Kim
author list: names of the authors in the byline of the paper
year: publication year
venue: conference or journal name
title: paper title with stopwords removed and the remaining words stemmed with the Porter stemmer

If you use this dataset, please consider citing the papers below.

For the original dataset: Han, H., Giles, L., Zha, H., Li, C., & Tsioutsiouliklis, K. (2004). Two Supervised Learning Approaches for Name Disambiguation in Author Citations. JCDL 2004: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries, 296-305. doi:10.1145/996350.996419

For the filtered dataset:
1. Kim, Jinseok (2018). Evaluating author name disambiguation for digital libraries: a case of DBLP. Scientometrics. doi:10.1007/s11192-018-2824-5, or
2. Kim, Jinseok & Kim, Jenna (2018). The impact of imbalanced training data on machine learning for author name disambiguation. Scientometrics. doi:10.1007/s11192-018-2865-9
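The row layout described here (seven tab-separated fields per author name instance) can be read with a short Python sketch. This is only an illustration under stated assumptions: the names `NameInstance`, `read_instances`, and `group_by_author` are hypothetical, the file is assumed to have no header row, and the author list is kept as a raw string because the description does not say how names inside that field are delimited.

```python
import csv
import io
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, TextIO

# Field order follows the dataset description; the Python names are our own.
FIELDS = ("author_name", "author_id", "paper_id",
          "author_list", "year", "venue", "title")


@dataclass
class NameInstance:
    author_name: str   # full name string extracted from DBLP
    author_id: str     # manually assigned unique author label
    paper_id: str      # paper identifier
    author_list: str   # byline kept raw (inner delimiter is not documented)
    year: str          # publication year, kept as text
    venue: str         # conference or journal name
    title: str         # stopword-free, Porter-stemmed title


def read_instances(handle: TextIO) -> Iterator[NameInstance]:
    """Yield one NameInstance per well-formed tab-separated row."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != len(FIELDS):
            continue  # skip blank or malformed lines (assumption: safe to drop)
        yield NameInstance(*row)


def group_by_author(instances: Iterable[NameInstance]) -> Dict[str, List[str]]:
    """Map each unique author id to the paper ids labeled with it."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for inst in instances:
        clusters[inst.author_id].append(inst.paper_id)
    return dict(clusters)


def demo() -> Dict[str, List[str]]:
    # Two made-up rows for illustration only; they are not taken from the dataset.
    sample = ("j smith\t1\tp1\tj smith;a lee\t2001\tjcdl\tname disambigu\n"
              "j smith\t2\tp2\tj smith\t2003\tkdd\tgraph cluster\n")
    return group_by_author(read_instances(io.StringIO(sample)))
```

With a real file, the same functions could be applied to an opened handle, for example `open(path, encoding="utf-8", newline="")`, where `path` points at the downloaded data file. Grouping by the unique author id gives the ground-truth clusters against which a disambiguation method can be compared.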
Provider:
figshare



