
Synthetic Dataset of Citation Strings in 12 Styles

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10839502
Resource description:
This dataset was produced to test different tools for citation string parsing, as part of the experiment reported in the paper: Iana Atanassova and Marc Bertin, 2024. "Breaking Boundaries in Citation Parsing: A Comparative Study of Generative LLMs and Traditional Out-of-the-box Citation Parsers", Bibliometric-enhanced Information Retrieval workshop (BIR), co-located with ECIR 2024, Glasgow, Scotland.

Data

The data provided here is organised as follows:

- the file citation-strings.zip contains the raw citation strings generated for each of the 12 citation styles, in txt format
- the file parsers-output.csv contains the output produced by the parsers: ChatGPT, Llama, and Neural ParsCit

To cite this work

To use this dataset and/or the results produced in the experiment, please cite the following article:

@inproceedings{atanassova2024citparse,
    title = {{Breaking Boundaries in Citation Parsing: A Comparative Study of Generative LLMs and Traditional Out-of-the-box Citation Parsers}},
    author = {Iana Atanassova and Marc Bertin},
    year = {2024},
    booktitle = {{International Workshop on Bibliometric-enhanced Information Retrieval (BIR 2024) co-located with the 46\textsuperscript{th} European Conference on Information Retrieval (ECIR 2024)}},
    address = {Glasgow, Scotland}
}

Authors information

Iana Atanassova, ORCID https://orcid.org/0000-0003-3571-4006, URL https://iana-atanassova.github.io/
Marc Bertin, ORCID https://orcid.org/0000-0003-1803-6952, URL https://elico-recherche.msh-lse.fr/membres/marc-bertin

Related GitHub repository

https://github.com/iana-atanassova/citation-parsers-bir2024.git
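As a rough illustration of how the two files described above might be read, here is a minimal Python sketch using only the standard library. It rests on assumptions that the description does not confirm: that the txt files in the archive are UTF-8 encoded with one citation string per line, and that parsers-output.csv has a header row. The function names read_citation_strings and read_parser_output are hypothetical helpers, not part of the dataset or its repository, and no particular CSV column names are assumed.

```python
import csv
import io
import zipfile


def read_citation_strings(zip_source):
    """Return {archive member name: [non-empty lines]} for every .txt member.

    Assumption: each txt file is UTF-8 and holds one citation string per line.
    """
    strings_by_file = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".txt"):
                continue  # skip directories and any non-txt members
            text = archive.read(name).decode("utf-8")
            strings_by_file[name] = [
                line.strip() for line in text.splitlines() if line.strip()
            ]
    return strings_by_file


def read_parser_output(csv_file):
    """Return (column names, rows as dicts) from an open text-mode CSV file.

    Assumption: the first row is a header; the columns are read from the
    file rather than hard-coded.
    """
    reader = csv.DictReader(csv_file)
    rows = list(reader)
    return reader.fieldnames, rows
```

With the files downloaded locally, read_citation_strings("citation-strings.zip") would give the strings grouped by archive member, and read_parser_output(open("parsers-output.csv", newline="", encoding="utf-8")) would expose whatever columns the CSV actually contains, which is a sensible first step before relying on any specific field.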
Created: 2024-03-19