CEX Project - Dataset and Gold Standard
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10524829
Description:
The files represent the new version of Cioffi, A. (2022), Data for Testing and Evaluating References Extraction and Parsing Tools (1.0), https://doi.org/10.5281/zenodo.6182066. The 56 files of Cioffi (2022) were corrected, and an additional 56 PDFs were manually annotated and aligned to the project guidelines. The TEI files of the whole dataset (112 documents) are in the GoldStandard_TEI zip folder, and the PDFs are in the GoldStandard_PDF folder. GoldStandard.txt lists the bibliographic references of each article. An entry marked "Restricted" means the corresponding PDF is not Open Access, so it is not shared in this publication and is absent from the GoldStandard_PDF folder; all other papers are Open Access and are included there.
The code is available at: Pagnotta, O. (2024). olgagolgan/CEX-Project: CEX Project Code (software). Zenodo. https://doi.org/10.5281/zenodo.10638757.
The output dataset of Anystyle, GROBID and OUTCITE is available at: Pagnotta, O. (2024). CEX Project - Output Dataset (Anystyle, GROBID, OUTCITE) (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10524898.
The annotated training dataset for GROBID is available at: Pagnotta, O. (2024). CEX Project - GROBID annotation aligned Gold Standard (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10529646.
The trained GROBID citation models are available at: Pagnotta, O. (2024). CEX Project - trained GROBID citation models. Zenodo. https://doi.org/10.5281/zenodo.10529709.
Some results are available at: Pagnotta, O. (2023). Investigating the performance of GROBID and OUTCITE (Version v1). Zenodo. https://doi.org/10.5281/zenodo.10036455.
The final service is available at: Pagnotta, O. and Paolini, L. (2024). opencitations/cec: alpha version (service). Zenodo. https://doi.org/10.5281/zenodo.10635630.
This work is part of my thesis research for the Digital Humanities and Digital Knowledge Master's Course at the University of Bologna.
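
As a rough illustration of how the TEI files in GoldStandard_TEI might be inspected, the following minimal Python sketch counts the bibliographic entries in one TEI document. It assumes that references are encoded as `biblStruct` elements in the standard TEI namespace, which is the usual convention but is not confirmed by this record; the function name `count_references` and the inline sample are hypothetical and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace; assumed (not confirmed) to be used by the gold-standard files.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def count_references(tei_xml: str) -> int:
    """Count <biblStruct> elements in a TEI document passed as a string."""
    root = ET.fromstring(tei_xml)
    return len(root.findall(".//tei:biblStruct", TEI_NS))


# Made-up sample for illustration only; it is not an excerpt from the dataset.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><back><div type="references"><listBibl>
    <biblStruct><monogr><title>First</title></monogr></biblStruct>
    <biblStruct><monogr><title>Second</title></monogr></biblStruct>
  </listBibl></div></back></text>
</TEI>"""

if __name__ == "__main__":
    print(count_references(SAMPLE))
```

For a real file, the same function could be applied to the text read from an extracted TEI document; if the files use a different element for references, the XPath expression would need to be adjusted.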
Created:
2024-02-09



