
Ground truth data for "Identifying publications of cumulative dissertation theses by bilingual text similarity"

NIAID Data Ecosystem, indexed 2026-03-12
Download link:
https://zenodo.org/record/4733849
Description:
This dataset contains data used in the publication "Identifying publications of cumulative dissertation theses by bilingual text similarity. Evaluation of similarity methods on a new short text task". It includes bibliographical data for German PhD theses (dissertations) and associated publications for cumulative dissertations. Content from Elsevier's Scopus database used in the study is not included, except for item identifiers. Users with access to the Scopus data can use these identifiers for matching.

File diss_data.csv contains bibliographic data of dissertation theses obtained from the German National Library, cleaned and postprocessed. The columns are:
REQUIZ_NORM_ID: Identifier for the thesis
TITLE: Cleaned thesis title
HEADING: Descriptor terms (German)
AUTO_LANG: Language, either from the original record or automatically derived from the title

File ground_truth_pub_metadata.csv contains bibliographic data for the identified constitutive publications of theses. If columns 2 to 7 are empty, the thesis did not include any publications ("stand-alone" or monograph thesis). The columns are:
REQUIZ_NORM_ID: Identifier for the thesis, for matching with the data in file
SCOPUS_ID: Scopus ID for the identified publication
AUTORS: Author names of the publication as in the original thesis citation
YEAR: Publication year of the publication as in the original thesis citation
TITLE: Publication title as in the original thesis citation
SOURCETITLE: Source title as in the original thesis citation
PAGES: Page information of the publication as in the original thesis citation

Scopus identifiers are published with permission by Elsevier.
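The two files can be linked through REQUIZ_NORM_ID. The following is a minimal Python sketch of that join, not code from the dataset authors. It assumes comma-delimited files with a header row, which the description does not state. The function names, the label strings, and all sample rows (IDs, titles, language codes) are invented placeholders for illustration. A thesis whose publication columns are all empty is labeled stand-alone, following the rule given in the description.

```python
import csv
import io

# Columns 2 to 7 of ground_truth_pub_metadata.csv, as named in the description.
PUB_COLUMNS = ["SCOPUS_ID", "AUTORS", "YEAR", "TITLE", "SOURCETITLE", "PAGES"]


def read_rows(handle):
    """Read CSV rows as dicts. Assumes a header row and comma delimiter."""
    return list(csv.DictReader(handle))


def group_publications(pub_rows):
    """Map thesis ID to its publication rows.

    A thesis that only has rows with empty publication columns maps to an
    empty list, which marks it as a stand-alone (monograph) thesis.
    """
    by_thesis = {}
    for row in pub_rows:
        pubs = by_thesis.setdefault(row["REQUIZ_NORM_ID"], [])
        if any((row.get(col) or "").strip() for col in PUB_COLUMNS):
            pubs.append(row)
    return by_thesis


def classify(diss_rows, pubs_by_thesis):
    """Label each thesis as cumulative, stand-alone, or unlabeled.

    "unlabeled" (a hypothetical label) covers theses that do not appear in
    the ground-truth file at all.
    """
    labels = {}
    for row in diss_rows:
        thesis_id = row["REQUIZ_NORM_ID"]
        if thesis_id not in pubs_by_thesis:
            labels[thesis_id] = "unlabeled"
        elif pubs_by_thesis[thesis_id]:
            labels[thesis_id] = "cumulative"
        else:
            labels[thesis_id] = "stand-alone"
    return labels


# Invented sample data standing in for the real files.
SAMPLE_DISS = (
    "REQUIZ_NORM_ID,TITLE,HEADING,AUTO_LANG\n"
    "T1,Example thesis one,Beispiel,eng\n"
    "T2,Beispielarbeit zwei,Beispiel,ger\n"
    "T3,Example thesis three,Beispiel,eng\n"
)
SAMPLE_PUBS = (
    "REQUIZ_NORM_ID,SCOPUS_ID,AUTORS,YEAR,TITLE,SOURCETITLE,PAGES\n"
    'T1,S100,"Doe, J.",2015,Example paper A,Example Journal,1-10\n'
    'T1,S101,"Doe, J.",2016,Example paper B,Example Journal,11-20\n'
    "T2,,,,,,\n"
)

diss_rows = read_rows(io.StringIO(SAMPLE_DISS))
pubs_by_thesis = group_publications(read_rows(io.StringIO(SAMPLE_PUBS)))
for thesis_id, label in classify(diss_rows, pubs_by_thesis).items():
    print(thesis_id, label, len(pubs_by_thesis.get(thesis_id, [])))
```

To run this against the real files, replace the `io.StringIO` objects with file handles such as `open("diss_data.csv", newline="", encoding="utf-8")`. The encoding is also an assumption and may need adjusting.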
Created:
2021-05-03