
Whois Dataset

NIAID Data Ecosystem, indexed 2026-03-14
Download link:
https://zenodo.org/record/7506562
Description:
dblp.zip

This database is used for author name disambiguation tasks. It contains 69,574,243 records and 10 columns. It was obtained from the DBLP repository and preprocessed to extract all possible pairs of authors (2,665,634 unique authors) from the 5,299,929 papers in the database. There are some cases in the database where a single author is duplicated.

Attributes:

- Record ID
- Publication ID
- Target Author
- Target Author's First Name
- Target Author's Last Name
- Co-author's First Name
- Co-author's Last Name
- Publication Title
- Year of Publication
- Source (Venue)

Note that Target Author = Target Author's First Name + Target Author's Last Name + Suffix. The suffix is added to the target author's name to ensure that it refers to a specific, unique person in the real world.

Example: Given the following reference string:

Boukhers, Zeyd, and Asundi, Nagaraj Bahubali. "Deep Author Name Disambiguation Using Bibliographic Data." International Conference on Theory and Practice of Digital Libraries. Springer, Cham, 2022.
The following records are extracted:

| Record ID | Publication ID | Target Author | Target Author's First Name | Target Author's Last Name | Co-author's First Name | Co-author's Last Name | Publication Title | Year of Publication | Source (Venue) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | Zeyd Boukhers | Zeyd | Boukhers | Zeyd | Boukhers | Deep Author Name Disambiguation Using Bibliographic Data | 2022 | International Conference on Theory and Practice of Digital Libraries |
| 2 | 1 | Zeyd Boukhers | Zeyd | Boukhers | Nagaraj Bahubali | Asundi | Deep Author Name Disambiguation Using Bibliographic Data | 2022 | International Conference on Theory and Practice of Digital Libraries |
| 3 | 1 | Nagaraj Bahubali Asundi001 | Nagaraj Bahubali | Asundi | Zeyd | Boukhers | Deep Author Name Disambiguation Using Bibliographic Data | 2022 | International Conference on Theory and Practice of Digital Libraries |
| 4 | 1 | Nagaraj Bahubali Asundi001 | Nagaraj Bahubali | Asundi | Nagaraj Bahubali | Asundi | Deep Author Name Disambiguation Using Bibliographic Data | 2022 | International Conference on Theory and Practice of Digital Libraries |

data.zip

It contains pickle files named in the format <N>_<atomic name>.pickle. Each file refers to one atomic name (i.e. the initial of the first name and the full last name), where <N> denotes the number of real-world authors sharing that atomic name. Each pickle file contains the indices of these real-world authors in the following format: [<atomic name>, <author index 1>, <author index 2>, ...]
For example, 4_T Akutsu.pickle contains ['T Akutsu', 2274176, 2276257, 2290454, 2347757].

indices.zip

It contains two dictionaries:

- index2auth.pickle: retrieves the real-world author (not the name) given the index
- auth2index.pickle: retrieves the author index given the real-world author (not the name)

Utils.zip

It contains other necessary pickle files:

- author_list.pickle: the list of all authors (2,665,634 authors)
- author_abbvs.pickle: the list of atomic names of all authors (2,665,634 authors)
- author_names.pickle: the list of full names of all authors (2,665,634 authors)
- unique.pickle: the list of unique atomic names (1,555,517 atomic names)
- unique_full_names.pickle: the list of unique full names (2,629,851 full names)
- indices.pickle: the indices of authors who share the atomic name (1,555,517 atomic names)

Evaluation_Data.zip

It contains an example of training, validation and test data:

- x_train.pickle: the indices of the training records
- x_validate.pickle: the indices of the validation records
- x_test.pickle: the indices of the testing records
- y_train.pickle: the indices of the authors corresponding to the records in x_train.pickle
- y_validate.pickle: the indices of the authors corresponding to the records in x_validate.pickle
- y_test.pickle: the indices of the authors corresponding to the records in x_test.pickle

Miscs.zip

It contains data extracted from the database to be used for training and testing. The files contain data for one example atomic name. It contains the following pickle files:

- records_train.pickle: the training records, with the following fields: combination_id, reference_id, target_author, author_fname, author_lname, coauthor_fname, coauthor_lname, title, year, abbr_journal, journal
- ref_train.pickle: the indices of the training records
- old_authors_ids.pickle
- new_authors_ids.pickle
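The pairing scheme and the data.zip naming convention described above can be sketched in a few lines of Python. This is only an illustration, not the dataset authors' preprocessing code. The function names (`atomic_name`, `expand_pairs`, `parse_entry`) and the dictionary keys are hypothetical. The sketch also assumes that an atomic name keeps only the first letter of the first name, and that the number prefixed to a data.zip file name equals the number of indices stored in the file, which is consistent with the 4_T Akutsu.pickle example but is not stated explicitly.

```python
import re
from itertools import product


def atomic_name(first_name, last_name):
    """Build an atomic name such as 'T Akutsu'.

    Assumption: only the first letter of the first name is kept.
    """
    return f"{first_name.strip()[0]} {last_name.strip()}"


def expand_pairs(authors, title, year, venue, publication_id=1):
    """Pair every author of a paper with every author, including themselves.

    `authors` is a list of (target_author, first_name, last_name) tuples,
    where target_author already carries any disambiguating suffix.
    """
    records = []
    pairs = product(authors, repeat=2)
    for record_id, (target, coauthor) in enumerate(pairs, start=1):
        target_author, target_fname, target_lname = target
        _, coauthor_fname, coauthor_lname = coauthor
        records.append({
            "record_id": record_id,
            "publication_id": publication_id,
            "target_author": target_author,
            "author_fname": target_fname,
            "author_lname": target_lname,
            "coauthor_fname": coauthor_fname,
            "coauthor_lname": coauthor_lname,
            "title": title,
            "year": year,
            "venue": venue,
        })
    return records


def parse_entry(filename, content):
    """Split a data.zip file name and its list content into parts.

    Assumption: file names look like '<N>_<atomic name>.pickle' and the
    content is [<atomic name>, <index 1>, <index 2>, ...].
    """
    match = re.fullmatch(r"(\d+)_(.+)\.pickle", filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    stored_name, *indices = content
    return {
        "n_authors": int(match.group(1)),
        "name_in_filename": match.group(2),
        "name_in_content": stored_name,
        "author_indices": indices,
    }


if __name__ == "__main__":
    paper_authors = [
        ("Zeyd Boukhers", "Zeyd", "Boukhers"),
        ("Nagaraj Bahubali Asundi001", "Nagaraj Bahubali", "Asundi"),
    ]
    for record in expand_pairs(
        paper_authors,
        "Deep Author Name Disambiguation Using Bibliographic Data",
        2022,
        "International Conference on Theory and Practice of Digital Libraries",
    ):
        print(record["record_id"], record["target_author"],
              "|", record["coauthor_fname"], record["coauthor_lname"])
```

With the two authors of the example reference, `expand_pairs` yields four records in the same order as the example records above. To read the real files, the content passed to `parse_entry` would come from `pickle.load`; since pickle can execute arbitrary code on load, only open files obtained from a trusted source.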
Created:
2023-02-08