Whois Dataset
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7506562
Description:
dblp.zip
This database is intended for author name disambiguation tasks. It contains 69,574,243 records and 10 columns. It was obtained from the DBLP repository and preprocessed to extract all possible pairs of authors (2,665,634 unique authors) from the 5,299,929 papers in the database. In some cases, a single author is duplicated in the database.
Attributes
Record ID
Publication ID
Target Author
Target Author's First Name
Target Author's Last Name
Co-author's First Name
Co-author's Last Name
Publication Title
Year of Publication
Source (Venue)
Note that Target Author = Target Author's First Name + Target Author's Last Name + Suffix. The suffix is added to the target author's name to ensure that it refers to a specific, unique person in the real world.
Example:
Given the following reference string:
Boukhers, Zeyd, and Asundi, Nagaraj Bahubali. "Deep Author Name Disambiguation Using Bibliographic Data." International Conference on Theory and Practice of Digital Libraries. Springer, Cham, 2022.
The following records are extracted:
| Record ID | Publication ID | Target Author | Target Author's First Name | Target Author's Last Name | Co-author's First Name | Co-author's Last Name | Publication Title | Year of Publication | Source (Venue) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | Zeyd Boukhers | Zeyd | Boukhers | Zeyd | Boukhers | Deep Author Name Disambiguation Using Bibliographic Data | 2022 | International Conference on Theory and Practice of Digital Libraries |
| 2 | 1 | Zeyd Boukhers | Zeyd | Boukhers | Nagaraj Bahubali | Asundi | Deep Author Name Disambiguation Using Bibliographic Data | 2022 | International Conference on Theory and Practice of Digital Libraries |
| 3 | 1 | Nagaraj Bahubali Asundi001 | Nagaraj Bahubali | Asundi | Zeyd | Boukhers | Deep Author Name Disambiguation Using Bibliographic Data | 2022 | International Conference on Theory and Practice of Digital Libraries |
| 4 | 1 | Nagaraj Bahubali Asundi001 | Nagaraj Bahubali | Asundi | Nagaraj Bahubali | Asundi | Deep Author Name Disambiguation Using Bibliographic Data | 2022 | International Conference on Theory and Practice of Digital Libraries |
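The way one reference expands into the example records above can be sketched in Python. This is a minimal illustration, not the dataset authors' preprocessing code: the function name `expand_reference`, the dictionary keys, and the convention of passing the suffix as a third tuple element are all assumptions made here. It emits one record per ordered (target author, co-author) pair, including each author paired with themselves, which is what the four example records show.

```python
from itertools import product


def expand_reference(pub_id, authors, title, year, venue, start_id=1):
    """Return one record per ordered (target author, co-author) pair.

    `authors` is a list of (first_name, last_name, suffix) tuples; the
    suffix is "" when the name needs no disambiguating suffix.
    """
    records = []
    # product(..., repeat=2) also pairs each author with themselves.
    pairs = product(authors, repeat=2)
    for record_id, (target, coauthor) in enumerate(pairs, start=start_id):
        t_first, t_last, t_suffix = target
        c_first, c_last, _ = coauthor
        records.append({
            "record_id": record_id,
            "publication_id": pub_id,
            # Target Author = first name + last name + suffix
            "target_author": f"{t_first} {t_last}{t_suffix}",
            "target_first_name": t_first,
            "target_last_name": t_last,
            "coauthor_first_name": c_first,
            "coauthor_last_name": c_last,
            "title": title,
            "year": year,
            "venue": venue,
        })
    return records


records = expand_reference(
    pub_id=1,
    authors=[("Zeyd", "Boukhers", ""), ("Nagaraj Bahubali", "Asundi", "001")],
    title="Deep Author Name Disambiguation Using Bibliographic Data",
    year=2022,
    venue="International Conference on Theory and Practice of Digital Libraries",
)
for record in records:
    print(record["record_id"], record["target_author"], "|",
          record["coauthor_first_name"], record["coauthor_last_name"])
```

Under this scheme a paper with n authors yields n × n records, which is consistent with the record count being much larger than the paper count.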
data.zip
It contains pickle files named in the format <N>_<atomic name>.pickle, each of which refers to an atomic name (i.e. the initial of the first name and the full last name), where <N> denotes the number of real-world authors sharing the atomic name. Each pickle file contains the atomic name followed by the indices of these real-world authors, in the following format:
[<atomic name>, <index 1>, <index 2>, ...]
For example, 4_T Akutsu.pickle contains
['T Akutsu', 2274176, 2276257, 2290454, 2347757]
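A file of this kind could be read as sketched below. The helper names `parse_group_filename` and `load_group` are hypothetical, and the consistency check assumes that the leading number in the file name equals the number of indices stored in the file, as it does in the `4_T Akutsu.pickle` example.

```python
import pickle
from pathlib import Path


def parse_group_filename(path):
    """Split a '<N>_<atomic name>.pickle' file name into (N, atomic_name)."""
    stem = Path(path).stem  # e.g. '4_T Akutsu'
    count, atomic_name = stem.split("_", 1)
    return int(count), atomic_name


def load_group(path):
    """Load one group file and check its content against its file name."""
    with open(path, "rb") as handle:
        content = pickle.load(handle)
    atomic_name, indices = content[0], list(content[1:])
    expected_count, expected_name = parse_group_filename(path)
    if atomic_name != expected_name or len(indices) != expected_count:
        raise ValueError(f"{path}: content does not match the file name")
    return atomic_name, indices
```

Calling `load_group` on `4_T Akutsu.pickle` would be expected to return the name `'T Akutsu'` and a list of four author indices. Note that `pickle.load` can execute arbitrary code, so it should only be used on files from a trusted source.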
indices.zip
It contains two dictionaries:
index2auth.pickle: maps an index to the real-world author (not the name)
auth2index.pickle: maps a real-world author (not the name) to the author's index
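A quick sanity check on the two dictionaries might look like the following sketch. It assumes both pickles load as plain Python `dict` objects and that they are inverses of each other; neither assumption is confirmed by the description. The function names and the toy values in the usage lines are made up for illustration.

```python
import pickle
from itertools import islice


def load_pickle(path):
    """Load a single pickle file from disk."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def find_round_trip_mismatches(index2auth, auth2index, limit=None):
    """Return the indices i for which auth2index[index2auth[i]] != i.

    `limit` restricts the check to the first `limit` entries; None checks all.
    """
    mismatches = []
    for index, author in islice(index2auth.items(), limit):
        if auth2index.get(author) != index:
            mismatches.append(index)
    return mismatches


# Toy dictionaries standing in for the real files (values are invented).
index2auth = {0: "Zeyd Boukhers", 1: "Nagaraj Bahubali Asundi001"}
auth2index = {"Zeyd Boukhers": 0, "Nagaraj Bahubali Asundi001": 1}
print(find_round_trip_mismatches(index2auth, auth2index))
```

With the real data, the dictionaries would be obtained with `load_pickle("index2auth.pickle")` and `load_pickle("auth2index.pickle")` after extracting indices.zip.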
Utils.zip
It contains other necessary pickle files:
author_list.pickle: it contains the list of all authors (2665634 authors)
author_abbvs.pickle: it contains the list of atomic names of all authors (2665634 authors)
author_names.pickle: it contains the list of full names of all authors (2665634 authors)
unique.pickle: it contains the list of unique atomic names (1555517 atomic names)
unique_full_names.pickle: it contains the list of unique full names (2629851 full names)
indices.pickle: it contains the indices of authors who share the atomic name (1555517 atomic names)
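The relationship between the per-author lists and the per-atomic-name lists can be illustrated with a small grouping sketch. It assumes that an author's position in `author_abbvs.pickle` serves as that author's index, which the description does not state explicitly; the function name and the toy input are hypothetical.

```python
from collections import defaultdict


def group_by_atomic_name(author_abbvs):
    """Map each atomic name to the positions of the authors who share it."""
    groups = defaultdict(list)
    for index, atomic_name in enumerate(author_abbvs):
        groups[atomic_name].append(index)
    return dict(groups)


# Toy list standing in for author_abbvs.pickle (values are invented).
toy_abbvs = ["T Akutsu", "Z Boukhers", "T Akutsu"]
groups = group_by_atomic_name(toy_abbvs)
ambiguous = {name: ids for name, ids in groups.items() if len(ids) > 1}
print(groups)     # {'T Akutsu': [0, 2], 'Z Boukhers': [1]}
print(ambiguous)  # {'T Akutsu': [0, 2]}
```

Atomic names shared by more than one author are the ones that need disambiguation; with 2665634 authors and 1555517 unique atomic names, many groups necessarily contain several authors.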
Evaluation_Data.zip
It contains an example of training, validation and test data.
x_train.pickle: contains the indices of the training records
x_validate.pickle: contains the indices of the validation records
x_test.pickle: contains the indices of the testing records
y_train.pickle: contains the indices of the corresponding authors to the records in x_train.pickle
y_validate.pickle: contains the indices of the corresponding authors to the records in x_validate.pickle
y_test.pickle: contains the indices of the corresponding authors to the records in x_test.pickle
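The six files could be loaded together as in the sketch below. It assumes each file unpickles to a sequence supporting `len()` and that every `x_*` file is aligned element by element with its `y_*` counterpart; the function name `load_splits` and the directory argument are hypothetical.

```python
import pickle
from pathlib import Path

SPLITS = ("train", "validate", "test")


def load_splits(directory):
    """Load the x/y pickle pairs and check that each pair has equal length."""
    directory = Path(directory)
    data = {}
    for split in SPLITS:
        with open(directory / f"x_{split}.pickle", "rb") as handle:
            x = pickle.load(handle)
        with open(directory / f"y_{split}.pickle", "rb") as handle:
            y = pickle.load(handle)
        if len(x) != len(y):
            raise ValueError(f"{split}: {len(x)} records but {len(y)} labels")
        data[split] = (x, y)
    return data
```

After extracting Evaluation_Data.zip, `load_splits("Evaluation_Data")["train"]` would return the training record indices together with their author indices, provided the archive extracts to a directory of that name.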
Miscs.zip
It contains data extracted from the database to be used for training and testing. The files contain data for one example atomic name. It contains the following pickle files:
records_train.pickle: contains the training records in the following format:
combination_id
reference_id
target_author
author_fname
author_lname
coauthor_fname
coauthor_lname
title
year
abbr_journal
journal
ref_train.pickle: contains the indices of the training records.
old_authors_ids.pickle
new_authors_ids.pickle
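For working with the training records, the eleven fields listed for records_train.pickle can be given names, as in this sketch. It assumes each record is a flat sequence of eleven values in the listed order; the actual on-disk layout is not specified in the description and may differ (for example, a table object). The `TrainRecord` type and the toy row, including the venue abbreviation, are illustrative only.

```python
from collections import namedtuple

# Field names copied from the records_train.pickle description.
FIELDS = (
    "combination_id", "reference_id", "target_author",
    "author_fname", "author_lname",
    "coauthor_fname", "coauthor_lname",
    "title", "year", "abbr_journal", "journal",
)
TrainRecord = namedtuple("TrainRecord", FIELDS)


def to_record(row):
    """Wrap one 11-value row in a TrainRecord; raises TypeError on a wrong length."""
    return TrainRecord._make(row)


# Toy row built from the example reference (not taken from the real file).
row = (
    2, 1, "Zeyd Boukhers", "Zeyd", "Boukhers", "Nagaraj Bahubali", "Asundi",
    "Deep Author Name Disambiguation Using Bibliographic Data", 2022, "TPDL",
    "International Conference on Theory and Practice of Digital Libraries",
)
record = to_record(row)
print(record.target_author, "with", record.coauthor_fname, record.coauthor_lname)
```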
Created:
2023-02-08



