
CEX Project - Dataset and Gold Standard

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10524829
Description:
These files are a new version of Cioffi, A. (2022). Data for Testing and Evaluating References Extraction and Parsing Tools (1.0). https://doi.org/10.5281/zenodo.6182066. The 56 files of Cioffi (2022) were corrected, and an additional 56 PDFs were manually annotated and aligned to the project guidelines. The TEI files of the whole dataset (112 documents) are in the GoldStandard_TEI zip folder, and the PDFs are in the GoldStandard_PDF folder. GoldStandard.txt lists the bibliographic references of each article. An entry marked "Restricted" means that the corresponding PDF is not Open Access, so it is not shared in this publication and is not in the GoldStandard_PDF folder; all the other papers are Open Access and are included in that folder.

The code is available at: Pagnotta, O. (2024). olgagolgan/CEX-Project: CEX Project Code (software). Zenodo. https://doi.org/10.5281/zenodo.10638757.
The output dataset of Anystyle, GROBID and OUTCITE is available at: Pagnotta, O. (2024). CEX Project - Output Dataset (Anystyle, GROBID, OUTCITE) (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10524898.
The annotated training dataset for GROBID is available at: Pagnotta, O. (2024). CEX Project - GROBID annotation aligned Gold Standard (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10529646.
The trained GROBID citation models are available at: Pagnotta, O. (2024). CEX Project - trained GROBID citation models. Zenodo. https://doi.org/10.5281/zenodo.10529709.
Some results are reported in: Pagnotta, O. (2023). Investigating the performance of GROBID and OUTCITE (Version v1). Zenodo. https://doi.org/10.5281/zenodo.10036455.
The final service is available at: Pagnotta, O. and Paolini, L. (2024). opencitations/cec: alpha version (service). Zenodo. https://doi.org/10.5281/zenodo.10635630.

The work is part of my thesis research for the Digital Humanities and Digital Knowledge Master's Course at the University of Bologna.
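
As a rough illustration of how the TEI gold standard might be read, the sketch below pulls a few fields out of each bibliographic reference in a TEI document. It rests on assumptions that the description does not confirm: that references are encoded GROBID-style as `biblStruct` elements inside a `listBibl`, in the standard TEI namespace. The function name `extract_references` and the `SAMPLE_TEI` string are hypothetical and invented for this example; they are not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Standard TEI namespace (assumption: the gold standard files use it).
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample, NOT taken from the dataset; it only mimics a
# GROBID-style reference list so the function can be demonstrated.
SAMPLE_TEI = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><back><div type="references"><listBibl>
    <biblStruct xml:id="b0">
      <analytic>
        <title level="a">A sample article title</title>
        <author><persName><forename>Ada</forename><surname>Example</surname></persName></author>
      </analytic>
      <monogr>
        <title level="j">Journal of Samples</title>
        <imprint><date when="2020"/></imprint>
      </monogr>
    </biblStruct>
    <biblStruct xml:id="b1">
      <monogr>
        <title level="m">A sample book</title>
        <author><persName><surname>Placeholder</surname></persName></author>
        <imprint><date when="2018"/></imprint>
      </monogr>
    </biblStruct>
  </listBibl></div></back></text>
</TEI>
"""


def extract_references(tei_xml):
    """Return one dict per biblStruct found under a listBibl element."""
    root = ET.fromstring(tei_xml)
    references = []
    for bibl in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI_NS):
        # First title in document order (the article title when present).
        title_el = bibl.find(".//tei:title", TEI_NS)
        date_el = bibl.find(".//tei:date", TEI_NS)
        surnames = [
            s.text.strip()
            for s in bibl.iterfind(".//tei:author//tei:surname", TEI_NS)
            if s.text
        ]
        references.append({
            "id": bibl.get(XML_ID),
            "title": title_el.text.strip() if title_el is not None and title_el.text else None,
            "authors": surnames,
            "year": date_el.get("when") if date_el is not None else None,
        })
    return references


if __name__ == "__main__":
    for ref in extract_references(SAMPLE_TEI):
        print(ref)
```

To try this on a real file, the bytes of one document from the GoldStandard_TEI folder could be passed in place of `SAMPLE_TEI`; the element paths may need adjusting if the actual encoding differs from the assumed one.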
Created:
2024-02-09