Ground truth data for "Identifying publications of cumulative dissertation theses by bilingual text similarity"
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4733849
Description:
This dataset contains the data used in the publication "Identifying publications of cumulative dissertation theses by bilingual text similarity. Evaluation of similarity methods on a new short text task". It includes bibliographical data for German PhD theses (dissertations) and, for cumulative dissertations, their associated publications. Content from Elsevier's Scopus database used in the study is not included, except for item identifiers. Users with access to Scopus can use these identifiers for matching.
File diss_data.csv contains bibliographic data of dissertation theses obtained from the German National Library, then cleaned and postprocessed.
The columns are:
REQUIZ_NORM_ID: Identifier for the thesis
TITLE: Cleaned thesis title
HEADING: Descriptor terms (German)
AUTO_LANG: Language, either from original record or automatically derived from title
File ground_truth_pub_metadata.csv contains bibliographic data for the identified constitutive publications of theses. If columns 2 to 7 are empty, the thesis did not include any publications ("stand-alone" or monograph thesis).
The columns are:
REQUIZ_NORM_ID: Identifier for the thesis, for matching with the data in file diss_data.csv
SCOPUS_ID: Scopus ID for the identified publication
AUTORS: Author names of the publication as in the original thesis citation
YEAR: Publication year of the publication as in the original thesis citation
TITLE: Publication title as in the original thesis citation
SOURCETITLE: Source title as in the original thesis citation
PAGES: Page information of the publication as in the original thesis citation
Scopus identifiers are published with permission from Elsevier.
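The two files can be linked on REQUIZ_NORM_ID. The following is a minimal sketch of that join using only the Python standard library. It assumes comma-separated files with a header row matching the column names listed above; the actual delimiter and encoding of the published files are not stated here and may differ. The sample rows and the helper names (read_rows, group_publications, join_theses) are hypothetical and serve only as illustration.

```python
import csv
import io

# Columns 2 to 7 of ground_truth_pub_metadata.csv, as listed in the description.
PUB_COLUMNS = ["SCOPUS_ID", "AUTORS", "YEAR", "TITLE", "SOURCETITLE", "PAGES"]


def read_rows(handle):
    """Read a CSV file object into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def group_publications(pub_rows):
    """Map each thesis ID to the list of its publication rows.

    A row whose publication columns are all empty marks a monograph thesis,
    so that thesis is kept with an empty list.
    """
    by_thesis = {}
    for row in pub_rows:
        pubs = by_thesis.setdefault(row["REQUIZ_NORM_ID"], [])
        if any((row.get(col) or "").strip() for col in PUB_COLUMNS):
            pubs.append(row)
    return by_thesis


def join_theses(diss_rows, pubs_by_thesis):
    """Attach publications to every thesis that appears in the ground truth."""
    joined = {}
    for row in diss_rows:
        thesis_id = row["REQUIZ_NORM_ID"]
        if thesis_id not in pubs_by_thesis:
            continue  # thesis has no ground-truth entry
        pubs = pubs_by_thesis[thesis_id]
        joined[thesis_id] = {
            "title": row["TITLE"],
            "language": row["AUTO_LANG"],
            "publications": pubs,
            "is_cumulative": len(pubs) > 0,
        }
    return joined


# Hypothetical sample data standing in for the real files. With the real data,
# pass open("diss_data.csv", newline="", encoding="utf-8") instead
# (the encoding is an assumption).
DISS_SAMPLE = (
    "REQUIZ_NORM_ID,TITLE,HEADING,AUTO_LANG\n"
    "T1,Example cumulative thesis,Beispiel,eng\n"
    "T2,Beispiel Monographie,Beispiel,ger\n"
)
PUB_SAMPLE = (
    "REQUIZ_NORM_ID,SCOPUS_ID,AUTORS,YEAR,TITLE,SOURCETITLE,PAGES\n"
    'T1,111,"Doe, J.",2015,First paper,Journal A,1-10\n'
    'T1,222,"Doe, J.",2016,Second paper,Journal B,11-20\n'
    "T2,,,,,,\n"
)

diss_rows = read_rows(io.StringIO(DISS_SAMPLE))
pub_rows = read_rows(io.StringIO(PUB_SAMPLE))
theses = join_theses(diss_rows, group_publications(pub_rows))

for thesis_id, info in theses.items():
    kind = "cumulative" if info["is_cumulative"] else "monograph"
    print(thesis_id, kind, len(info["publications"]))
```

Grouping the publication rows first keeps the one-to-many relationship explicit: a cumulative thesis maps to several rows, while a monograph thesis maps to an empty list.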
Created:
2021-05-03



