
The REVERINO Collection of Regesta

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14971612
Resource description:
Overview

The REVERINO Dataset is a collection of 4,533 pairs of Latin regesta (summaries) and their corresponding full-text medieval pontifical documents. The dataset is derived from two primary collections:

MGH: Epistolae saeculi XIII e regestis pontificum Romanorum selectae (1216-1268)
Auvray: Les Registres de Gregoire IX (1227/41)

The dataset is designed to support research in Latin text summarization and the development of tools for automatic regesta generation using Large Language Models (LLMs). It serves as a benchmark for evaluating the performance of LLMs in summarizing medieval Latin texts.

Dataset Structure

The dataset is organized into nine JSON files, each corresponding to a volume of the collections. Each JSON file contains an array of objects, where each object represents a single document with the following fields:

numero: A unique identifier for the document.
header: The header or title of the document, often including the date and location.
regesto: An array of strings representing the regestum (summary) of the document.
testo esteso: An array of strings representing the full text of the document.
apparato: An array of strings containing the apparatus (metadata or references) for the document.

Data Curation Process

The dataset was created through a four-step pipeline:

1. Annotation: Manual annotation of selected pages using the eScriptorium platform to train segmentation models.
2. Training: Adaptation of segmentation models to the specific layout of the manuscripts.
3. Extraction: Automated extraction of text lines from the annotated pages.
4. Post-processing: Separation of regesta, full texts, and apparatus using heuristics based on content and position.

License

This dataset is licensed under the Creative Commons Attribution 4.0 International License (CC BY 4.0). You are free to share and adapt the material for any purpose, provided you give appropriate credit to the original authors.
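
Example

The record layout described under Dataset Structure can be read with the Python standard library alone. The following is a minimal sketch, not the authors' own tooling: it assumes each volume file is a top-level JSON array using the field names listed above, and the function names (join_lines, load_pairs, load_collection) and the local directory layout are hypothetical choices made for illustration. The actual file names in the Zenodo record are not stated here, so the sketch simply globs for *.json.

```python
import json
from pathlib import Path


def join_lines(lines):
    """Join an array of text lines into one string, skipping empty entries."""
    return " ".join(line.strip() for line in lines if line and line.strip())


def load_pairs(json_path):
    """Return (numero, regesto, full text) tuples from one volume file."""
    with open(json_path, encoding="utf-8") as handle:
        documents = json.load(handle)  # assumed: a top-level JSON array
    pairs = []
    for doc in documents:
        summary = join_lines(doc.get("regesto", []))
        full_text = join_lines(doc.get("testo esteso", []))
        # Keep only documents that have both a summary and a full text.
        if summary and full_text:
            pairs.append((doc.get("numero"), summary, full_text))
    return pairs


def load_collection(directory):
    """Load every *.json file in a directory (hypothetical local layout)."""
    pairs = []
    for path in sorted(Path(directory).glob("*.json")):
        pairs.extend(load_pairs(path))
    return pairs
```

Calling load_collection on the folder that holds the downloaded volume files would return one tuple per document, suitable as (source, target) material for a summarization benchmark. Joining lines with a single space is a simplification; words hyphenated across line breaks in the source would need extra handling.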

References

Puccetti, Giovanni, Laura Righi, Ilaria Sabbatini, and Andrea Esuli. "REVERINO: REgesta generation VERsus latIN summarizatiOn." IRCDL, 2025.

Acknowledgments

This work was supported by the Italian Strengthening of ESFRI RI RESILIENCE (ITSERR) project, funded by the European Union under the NextGenerationEU funding scheme (CUP: B53C22001770006).

Contact

Giovanni Puccetti [giovanni.puccetti@isti.cnr.it]
Created: 2025-03-05