Annotated Corpora of Historical Catalan (HisCat) - Llibre dels Fets
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/5615758
Description:
This repository is part of the Annotated Corpora of Historical Catalan (HisCat). It contains the first POS-tagged text that is partially manually corrected and used to train Old Catalan POS taggers, described in the following paper:
Meelen, Marieke & Pujol i Campeny, Afra (2021) 'Old Catalan Morphosyntax: developing an annotated corpus' in Journal of Open Humanities Data.
This POS-tagged text is the 13th-century Llibre dels Fets, a historical chronicle. The version of the text used for this project is
Bruguera, J. (1991). El Llibre dels Fets del Rei en Jaume. Barcelona: Barcino.
as prepared for the Corpus Informatitzat del Català Antic
Torruella, J., Pérez Saldanya, M., & Martines, J. (2009). Corpus Informatitzat del Català Antic. URL: http://cica.cat/.
The subcorpus contains 164,096 POS-annotated tokens (165,538 tokens including punctuation and folio markers), of which 60,000 have been manually corrected. This subcorpus contains a total of […] and 4,506 main clauses. POS tagging of this text was done with the Memory-Based Tagger (MBT), which builds on TiMBL (https://languagemachines.github.io/mbt/). The code accompanying the paper can be found on GitHub (https://github.com/lothelanor/catalancorpora). In addition to memory-based tagging, we tried neural tagging with TARGER (https://github.com/achernodub/targer), for which we created word embeddings that can be found on Zenodo. Results for memory-based tagging were better, however, which is why this version is uploaded here.
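The token counts quoted above imply two derived figures that may help when sizing the corpus: the number of punctuation and folio-marker tokens, and the share of tokens that were corrected by hand. The short sketch below only redoes that arithmetic. It assumes the 60,000 corrected tokens are counted against the 164,096 POS-annotated tokens, which the description does not state explicitly. The variable names are illustrative and do not come from the repository.

```python
# Figures quoted in the dataset description.
pos_annotated_tokens = 164_096   # excludes punctuation and folio markers
all_tokens = 165_538             # includes punctuation and folio markers
manually_corrected = 60_000

# Tokens that are punctuation or folio markers.
non_word_tokens = all_tokens - pos_annotated_tokens

# Assumption: the corrected tokens are a subset of the POS-annotated tokens.
corrected_share = manually_corrected / pos_annotated_tokens

print(f"Punctuation and folio markers: {non_word_tokens:,}")
print(f"Manually corrected share: {corrected_share:.1%}")
```

Under that assumption, roughly a third of the annotated tokens carry hand-checked tags, and the remainder carry tagger output only.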
Created:
2021-10-29



