Synthetic Dataset of Citation Strings in 12 Styles
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10839502
Description:
This dataset was produced to test different tools for citation string parsing, as part of the experiment reported in the following paper:
Iana Atanassova and Marc Bertin, 2024. "Breaking Boundaries in Citation Parsing: A Comparative Study of Generative LLMs and Traditional Out-of-the-box Citation Parsers", Bibliometric-enhanced Information Retrieval workshop (BIR), collocated with ECIR 2024, Glasgow, Scotland.
Data
The data provided here is organised as follows:
the file citation-strings.zip contains the raw citation strings, in txt format, generated for each of the 12 citation styles
the file parsers-output.csv contains the output produced by the parsers: ChatGPT, Llama, and Neural ParsCit
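As a minimal sketch of how the two files described above might be inspected, the Python below lists the txt members of the archive and reads the CSV. The record does not document the internal layout, so several points are assumptions: that the txt files are UTF-8 with one citation string per line, and that the CSV has a header row. The function names read_citation_strings and read_parser_output are hypothetical helpers, not part of the dataset or its repository.

```python
import csv
import zipfile
from pathlib import Path


def read_citation_strings(zip_path):
    """Return {member name: list of non-empty lines} for each .txt file in the archive."""
    strings_by_file = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".txt"):
                continue
            # Assumption: UTF-8 text with one citation string per line.
            text = archive.read(name).decode("utf-8")
            strings_by_file[name] = [
                line.strip() for line in text.splitlines() if line.strip()
            ]
    return strings_by_file


def read_parser_output(csv_path):
    """Return (column names, rows) without assuming any particular column layout."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # Assumption: the first row of the CSV is a header.
        reader = csv.DictReader(handle)
        rows = list(reader)
        return reader.fieldnames, rows


if __name__ == "__main__":
    # File names are taken from the description above; both are skipped if absent.
    if Path("citation-strings.zip").exists():
        for name, strings in read_citation_strings("citation-strings.zip").items():
            print(name, len(strings))
    if Path("parsers-output.csv").exists():
        columns, rows = read_parser_output("parsers-output.csv")
        print(columns, len(rows))
```

Printing the column names first is a deliberate choice: it lets you check the actual structure of parsers-output.csv before writing any code that depends on specific fields.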
To cite this work
If you use this dataset and/or the results produced in the experiment, please cite the following article:
@inproceedings{atanassova2024citparse,
  title = {{Breaking Boundaries in Citation Parsing: A Comparative Study of Generative LLMs and Traditional Out-of-the-box Citation Parsers}},
  author = {Iana Atanassova and Marc Bertin},
  year = {2024},
  booktitle = {{International Workshop on Bibliometric-enhanced Information Retrieval (BIR 2024) co-located with the 46\textsuperscript{th} European Conference on Information Retrieval (ECIR 2024)}},
  address = {Glasgow, Scotland}
}
Authors information
Iana Atanassova, ORCID https://orcid.org/0000-0003-3571-4006 URL https://iana-atanassova.github.io/
Marc Bertin, ORCID https://orcid.org/0000-0003-1803-6952 URL https://elico-recherche.msh-lse.fr/membres/marc-bertin
Related GitHub repository
https://github.com/iana-atanassova/citation-parsers-bir2024.git
Created:
2024-03-19



