Pelerinage: Dataset
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6507980
Resource description:
The "Pelerinage" dataset contains a fine-grained edition of excerpts from 20 medieval manuscripts of Guillaume de Digulleville's Pelerinage de vie humaine.
Files in this directory were created as part of the ECMEN (Ecritures médiévales et outils numériques) research project, funded by the City of Paris within the Emergence(s) framework.
If you use any of the following files, please cite:
Stutzmann, Dominique. « Les 'manuscrits datés', base de données sur l’écriture ». In Catalogazione, storia della scrittura, storia del libro. I Manoscritti datati d’Italia vent’anni dopo, ed. Teresa De Robertis and Nicoletta Giovè Marchioli, Firenze: SISMEL - Edizioni del Galluzzo, 2017, p. 155-207.
@incollection{stutzmann_les_2017,
address = {Firenze},
title = {Les « manuscrits datés », base de données sur l’écriture},
language = {fre},
booktitle = {Catalogazione, storia della scrittura, storia del libro. {I} {Manoscritti} datati d’{Italia} vent’anni dopo},
publisher = {SISMEL - Edizioni del Galluzzo},
author = {Stutzmann, Dominique},
editor = {De Robertis, Teresa and Giovè Marchioli, Nicoletta},
year = {2017},
pages = {155--207}
}
SOURCE DESCRIPTION
The files derive from imitative transcriptions of selected passages by Géraldine Veysseyre (ORCID 0000-0002-3737-2137), based on 20 medieval manuscripts containing the "Pelerinage de vie humaine" of Guillaume de Digulleville. The transcriptions were produced during the OPVS research project (the acronym stands for « Old Pious Vernacular Successes » in English and « Œuvres Pieuses Vernaculaires à Succès » in French), funded by the ERC under grant agreement n° 263274.
The transcriptions were edited and enhanced in TEI format by Dominique Stutzmann (ORCID 0000-0003-3705-5825) and Floriana Ceresato (IRHT-CNRS), as part of the ORIFLAMMS and ECMEN (Ecritures médiévales et outils numériques) research projects, with the following features:
semantic encoding (, , , )
, , etc. and entities to describe abbreviations
minimal description of manuscripts
Links to images and, for each line, pixel coordinates within the image were added by Floriana Ceresato and Alexandre Gaudin (IRHT-CNRS).
Files in ALTO format were produced in 2021 by Jean-Baptiste Camps (ORCID 0000-0003-0385-7037) and Chahan Vidal-Gorène (ORCID 0000-0003-1567-6508) as part of joint research presented at the DH2022 conference (Tokyo, 25-29 July 2022); cf. the abstract:
Camps, Jean-Baptiste, Chahan Vidal-Gorène, Dominique Stutzmann, Marguerite Vernet, and Ariane Pinche. « Data Diversity in Handwritten Text Recognition: Challenge or Opportunity? » In Digital Humanities 2022. Conference Abstracts (The University of Tokyo, Japan, 25-29 July 2022), published by DH2022 Local Organizing Committee, 160‑65. Tokyo, 2022. https://dh2022.dhii.asia/dh2022bookofabsts.pdf#page=162 and https://dh2022.dhii.asia/abstracts/files/CAMPS_Jean_Baptiste_Data_Diversity_in_handwritten_text_recog.html.
The dataset comprises the following folders:
/texts/ : main TEI file containing all textual, semantic and graphic information.
/img/ : 49 images on which the transcriptions are based and to which they are aligned at line level through the element in the TEI file.
/alto/ : one file per image in ALTO format, in several flavours according to the needs of users during the experiments of the above-mentioned paper. The text is flat, line by line, either with expansions or with abbreviations, and either standardised or following the original encoding. Given the current implementation of Kraken/eScriptorium, coordinates may be indicated as "x1,y1 x2,y2..." in "/alto/without-norm-coord-commas/" but as "x1 y1 x2 y2..." in the other folders.
/img-masks/ : smaller images with a view of all coordinates of lines as present in the TEI > facsimile elements, including lines which were not selected for the transcription.
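Because the /alto/ folders use two coordinate notations ("x1,y1 x2,y2..." and "x1 y1 x2 y2..."), a reader script may want to accept both. The following is a minimal Python sketch, not part of the dataset: the function names parse_points and read_lines are hypothetical, and it assumes that each ALTO file stores the text of a line in the CONTENT attribute of String elements and the line outline in the POINTS attribute of a Polygon element, as is common in ALTO; this should be checked against the actual files.

```python
import re
import xml.etree.ElementTree as ET


def parse_points(points: str) -> list[tuple[int, int]]:
    """Parse 'x1,y1 x2,y2...' or 'x1 y1 x2 y2...' into (x, y) pairs."""
    # Commas and whitespace are both treated as separators, so the two
    # notations reduce to the same flat sequence of numbers.
    values = [int(float(v)) for v in re.split(r"[,\s]+", points.strip()) if v]
    if len(values) % 2:
        raise ValueError(f"odd number of coordinate values: {points!r}")
    return list(zip(values[0::2], values[1::2]))


def _local(tag: str) -> str:
    """Drop the XML namespace so the code does not depend on the ALTO version."""
    return tag.rsplit("}", 1)[-1]


def read_lines(alto_xml: str) -> list[tuple[str, list[tuple[int, int]]]]:
    """Return (text, polygon) for every TextLine in an ALTO document.

    Assumption: text sits in String/@CONTENT and the outline in
    Polygon/@POINTS; lines without a polygon get an empty list.
    """
    root = ET.fromstring(alto_xml)
    lines = []
    for line in root.iter():
        if _local(line.tag) != "TextLine":
            continue
        text = " ".join(
            s.get("CONTENT", "") for s in line.iter() if _local(s.tag) == "String"
        )
        polygon = next((p for p in line.iter() if _local(p.tag) == "Polygon"), None)
        raw = polygon.get("POINTS") if polygon is not None else None
        lines.append((text, parse_points(raw) if raw else []))
    return lines
```

To use it, read one file from /alto/ as a string and pass it to read_lines; both coordinate notations then yield the same list of (x, y) pairs.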
Created:
2022-08-10



