
Data for Training and Evaluating Metadata Extraction Models based on 15 Thousand Cyrillic Script Publications

NIAID Data Ecosystem, indexed 2026-03-12
Download link:
https://zenodo.org/record/4708695
Description:
Data for training and evaluating sequence labeling models for metadata extraction, based on 15,553 papers in Cyrillic script languages spanning 27 years and three languages. For each paper, ground truth sequence labeling output is provided in TEI format and as annotated plain text.

The code used for creating and evaluating the data set can be found on GitHub. To cite the data set, refer to the paper introducing it:

@inproceedings{kssf-2021-cyrillic,
  title = {{Bootstrapping Multilingual Metadata Extraction: A Showcase in Cyrillic}},
  author = {Krause, Johan and Shapiro, Igor and Saier, Tarek and F{\"a}rber, Michael},
  booktitle = {Proceedings of the Second Workshop on Scholarly Document Processing},
  year = {2021}
}
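As a rough illustration of how TEI ground truth of this kind might be turned into (token, label) pairs for a sequence labeling model, here is a minimal Python sketch. The sample document, the element layout, and the helper names `local_name` and `labeled_tokens` are assumptions made for illustration. The actual files in the data set may be structured differently, so check them before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI snippet for illustration only; the real files in the
# data set may use a different element layout.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Пример заголовка статьи</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author><persName><forename>Иван</forename> <surname>Иванов</surname></persName></author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def labeled_tokens(tei_xml, labels=("title", "forename", "surname")):
    """Return (token, label) pairs for the direct text of selected elements.

    The default label set is an assumption, not the data set's label scheme.
    """
    root = ET.fromstring(tei_xml)
    pairs = []
    # iter() walks the tree in document order.
    for elem in root.iter():
        name = local_name(elem.tag)
        if name in labels and elem.text:
            for token in elem.text.split():
                pairs.append((token, name))
    return pairs


if __name__ == "__main__":
    for token, label in labeled_tokens(SAMPLE_TEI):
        print(token, label)
```

Whitespace tokenization is used here only to keep the sketch short. A real pipeline would need the same tokenizer as the model being trained or evaluated.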
Created:
2021-04-22