Replication data for "The Specchieri MarVen Dataset: an Abbreviation-rich Dataset in Venetian Idiom"
Source: DataCite Commons; updated 2023-09-08; indexed 2024-07-13
Download link:
https://dataverse.iit.it/citation?persistentId=doi:10.48557/GJYJTW
Description:
Despite the release of numerous datasets for training historical handwritten text recognition models, there is still a significant need for more diverse and extensive data. This dataset release aims to help bridge that gap. It comprises 159 pages from an Early Modern volume that is part of the Venetian 'Marigold' collection. The text contains various abbreviations, and transcribing them is key to a complete understanding of the content. To accommodate different research needs, the dataset is released in two versions: one with 'expanded' abbreviations and another in which the abbreviations are removed, in line with the choices made for other released datasets. Additionally, the dataset encompasses two distinct writing styles. For this reason, three separate splits for training and evaluating machine learning models are released: one that combines both styles, and one for each individual style.
Provider:
IIT Dataverse
Created:
2023-07-12



