CEX Project - Output Dataset (Anystyle, GROBID, OUTCITE)
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10524897
Description:
The published files are the outputs of the extraction experiments performed within the research project.
The Anystyle folder contains four folders:
Anystyle_1.4.2_Autumn with the output files of the extraction experiments run with Anystyle v1.4.2 during Autumn 2023
Anystyle_1.4.2_Spring with the output files of the extraction experiments run with Anystyle v1.4.2 during Spring 2023
Anystyle_teiConversion_Autumn with the TEI conversions (produced with the code cited below) of the Anystyle v1.4.2 output files from Autumn 2023
Anystyle_teiConversion_Spring with the TEI conversions (produced with the code cited below) of the Anystyle v1.4.2 output files from Spring 2023
The Grobid folder contains eight folders:
Grobid_Test_0.7.2 with the output files of the extraction experiments run with GROBID v0.7.2
Grobid_Test_0.7.3 with the output files of the extraction experiments run with GROBID v0.7.3
GrobidOutput_Train1 with the output files of the extraction experiments run with the first GROBID training configuration (using the dataset mentioned below and GROBID documents)
GrobidOutput_Train2 with the output files of the extraction experiments run with the second GROBID training configuration (using the dataset mentioned below and GROBID documents)
GrobidOutput_Train3 with the output files of the extraction experiments run with the third GROBID training configuration (using the dataset mentioned below and GROBID documents)
GrobidOutput_Train4 with the output files of the extraction experiments run with the fourth GROBID training configuration (using the dataset mentioned below and GROBID documents)
GrobidOutput_Train5 with the output files of the extraction experiments run with the fifth GROBID training configuration (using the dataset mentioned below and GROBID documents)
GrobidOutput_Train6 with the output files of the extraction experiments run with the sixth GROBID training configuration (using the dataset mentioned below and GROBID documents)
The Outcite folder contains two folders:
outcite_canonical_refs with the canonical representation of the processed references
outcite_canonical_refs_conversion with the TEI conversion of the canonical representation of the processed references
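The folder layout described above can be checked after downloading and extracting the archive. The following is a minimal sketch, not part of the published code: the function name `missing_folders` is hypothetical, and the top-level folder names "Anystyle", "Grobid" and "Outcite", as well as the exact on-disk spelling of the subfolders, are assumptions taken from this description rather than verified against the archive.

```python
from pathlib import Path

# Folder names copied from the description; the top-level names and the
# exact on-disk spelling are assumptions, not verified against the archive.
EXPECTED_LAYOUT = {
    "Anystyle": [
        "Anystyle_1.4.2_Autumn",
        "Anystyle_1.4.2_Spring",
        "Anystyle_teiConversion_Autumn",
        "Anystyle_teiConversion_Spring",
    ],
    "Grobid": [
        "Grobid_Test_0.7.2",
        "Grobid_Test_0.7.3",
        *[f"GrobidOutput_Train{i}" for i in range(1, 7)],
    ],
    "Outcite": [
        "outcite_canonical_refs",
        "outcite_canonical_refs_conversion",
    ],
}


def missing_folders(root):
    """Return the expected subfolders (as relative paths) not found under root."""
    root = Path(root)
    missing = []
    for parent, children in EXPECTED_LAYOUT.items():
        for child in children:
            # A folder counts as present only if it exists as a directory.
            if not (root / parent / child).is_dir():
                missing.append(f"{parent}/{child}")
    return missing
```

Calling `missing_folders("path/to/extracted/archive")` returns an empty list when every described folder is present, and otherwise lists the ones that could not be found.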
This publication is part of my thesis research for the Digital Humanities and Digital Knowledge Master's course at the University of Bologna.
The gold standard can be found here: Pagnotta, O. (2024). CEX Project - Dataset and Gold Standard [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10535653.
The code can be found here: Pagnotta, O. (2024). olgagolgan/CEX-Project: CEX Project Code (software). Zenodo. https://doi.org/10.5281/zenodo.10638757.
The GROBID training dataset can be found here: Pagnotta, O. (2024). CEX Project - GROBID annotation aligned Gold Standard (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10529646.
The trained GROBID citation models can be found here: Pagnotta, O. (2024). CEX Project - trained GROBID citation models. Zenodo. https://doi.org/10.5281/zenodo.10529709.
Some results can be found here: Pagnotta, O. (2023). Investigating the performance of GROBID and OUTCITE (Version v1). Zenodo. https://doi.org/10.5281/zenodo.10036455.
The final service can be found here: Pagnotta, O. and Paolini, L. (2024). opencitations/cec: alpha version (service). Zenodo. https://doi.org/10.5281/zenodo.10635630.
Created:
2024-02-09



