
Portuguese Handwriting 16th-19th c.

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/13986217
Resource description:
All data were imported from the platform Transkribus, on which the AI model for automatic transcription “Portuguese Handwriting 16th-19th c.” was last trained in July 2023 with the recognition engine PyLaia; the model is now available for use. The data are divided into ten folders: one for each of the nine trainings, from the initial to the definitive one, plus one set for final validation. The eight previous trainings were carried out between June 2022 and May 2023. The history of all trainings can be read on e-Inquisition.

Each of these folders corresponds to one collection in the platform; every collection has a number of documents; every document has a number of images, or pages, as indicated below. The ten uploaded folders (zip) are distributed as follows:
- nine Training Sets (TS) (ca. 92% of the whole data; status of the transcriptions from the TS: Ground Truth);
- the final Validation Set (VS) (ca. 8% of the whole data; status of the transcriptions from the VS: Ground Truth).

Each TS folder contains only the data newly added for its training, on top of the data from the previous sets. Only the last VS, which is complete (505 p.), is provided.

One document = images / transcribed pages (Ground Truth: transcription made by the members of the TraPrInq project (Transcrever os processos da Inquisição portuguesa, 1536-1821 | Transcribing the court records of the Portuguese Inquisition, 1536-1821), which lasted from January 2023 to July 2024). The majority of the documents are titled as follows: IL_number = document extracted from a trial record (processo) of the Inquisition of Lisbon, followed by the number of the processo. Other title prefixes: IC_ = Inquisition of Coimbra; IE_ = Inquisition of Évora.

Total of transcribed pages: 6,417 (the sum of the nine Training Sets listed below; the Validation Set adds a further 505). The quality of the images in the data (jpg) is equal to that of the images used for automatic transcription. All digitized images can be found in the catalog of the Portuguese National Archives (Arquivo Nacional da Torre do Tombo, ANTT).
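The title convention described above (IL_, IC_, IE_ followed by the processo number) could be decoded with a small helper. This is a minimal sketch, not part of the dataset: the function name `parse_title` and the sample title are hypothetical, and it assumes the prefix and number appear at the start of the title. Since only the majority of documents follow the convention, titles that do not match return `None`.

```python
import re

# Prefixes as given in the dataset description.
TRIBUNALS = {
    "IL": "Inquisition of Lisbon",
    "IC": "Inquisition of Coimbra",
    "IE": "Inquisition of Évora",
}

# Assumption: the title starts with the prefix, an underscore, then digits.
TITLE_RE = re.compile(r"^(IL|IC|IE)_(\d+)")


def parse_title(title):
    """Return (tribunal, processo_number) or None if the title does not match.

    The processo number is kept as a string so that any leading zeros survive.
    """
    match = TITLE_RE.match(title)
    if match is None:
        return None
    return TRIBUNALS[match.group(1)], match.group(2)


# Hypothetical title, for illustration only.
print(parse_title("IL_00123"))
print(parse_title("untitled document"))
```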
Available data (10 zip files, total size 6.7 GB):
Training Set1: 698 pages/images
Training Set2: 984 pages/images
Training Set3: 869 pages/images
Training Set4: 926 pages/images
Training Set5: 631 pages/images
Training Set6: 665 pages/images
Training Set7: 564 pages/images
Training Set8: 549 pages/images
Training Set9: 531 pages/images
Validation Set_Final: 505 pages/images

One pdf file: Paleographical criteria used by the team for the transcription of the documents; list of characters (in Portuguese).
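The page counts listed above can be cross-checked against the stated total and the approximate training/validation split. The sketch below only re-adds the published figures; the variable names are illustrative.

```python
# Page counts per folder, copied from the listing above.
training_pages = {
    "Training Set1": 698,
    "Training Set2": 984,
    "Training Set3": 869,
    "Training Set4": 926,
    "Training Set5": 631,
    "Training Set6": 665,
    "Training Set7": 564,
    "Training Set8": 549,
    "Training Set9": 531,
}
validation_pages = 505

training_total = sum(training_pages.values())
grand_total = training_total + validation_pages

print(training_total)  # 6417, matching the stated total of transcribed pages
print(grand_total)     # 6922
print(round(100 * training_total / grand_total, 1))    # 92.7
print(round(100 * validation_pages / grand_total, 1))  # 7.3
```

The nine Training Sets add up to exactly 6,417 pages, and the resulting split of about 92.7% to 7.3% is in line with the approximate 92% and 8% shares given in the description.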
Created:
2025-02-20