Paraly: Replication package for exploring the concept of paralysis (fr. ‘paralysie’) in a digital corpus of French Literature
Source: DataCite Commons. Updated: 2025-02-20. Indexed: 2025-04-16.
Download link:
https://madata.bib.uni-mannheim.de/471
Resource description:
This replication package provides all the resources needed to reproduce the dataset and methodological approach described in the Paraly data paper. The dataset consists of three corpora (full texts and metadata) of French literature from the 18th, 19th, and 20th centuries, containing both figurative and concrete linguistic references to the concept of paralysis, recorded as annotations. The texts originate from the “Les classiques de la littérature” collection on Gallica, the digital library of the Bibliothèque nationale de France (BnF). The package includes scripts and documentation for data collection, extraction, processing, annotation, and model training. It contains: scripts for data and metadata collection; the original OCR-ed texts with metadata from Gallica; text excerpts containing the character sequence “paraly” together with their manual annotations; annotation guidelines detailing the methodology; a multilabel classifier pre-trained on the annotated data with the flair library; a graphical user interface application for automatic annotation; and code and workflows for processing text corpora. With these resources, researchers can reproduce the dataset creation process, refine the annotation workflow, and extend the methodological approach to other literary corpora.
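As a rough illustration of the excerpt-extraction step mentioned in the description, the following minimal sketch collects text windows around each occurrence of the character sequence “paraly”. It is not taken from the package: the function name `extract_excerpts`, the case-insensitive matching, and the fixed character window are all assumptions made for this example, and the package's own scripts may delimit excerpts differently.

```python
import re


def extract_excerpts(text, needle="paraly", window=60):
    """Return a context window around each occurrence of `needle`.

    Hypothetical helper: matching is case-insensitive and the window is
    measured in characters on each side of the match (both assumptions).
    """
    excerpts = []
    for match in re.finditer(re.escape(needle), text, flags=re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        excerpts.append(text[start:end].strip())
    return excerpts


# Invented sample sentence, not drawn from the corpora.
sample = (
    "La peur le paralysait. Plus tard, une paralysie du bras gauche "
    "fut constatée."
)
excerpts = extract_excerpts(sample, window=15)
for excerpt in excerpts:
    print(excerpt)
```

Matching on the bare character sequence catches inflected and derived forms such as “paralysait” and “paralysie” alike, which is presumably why the description refers to a character sequence rather than a single word.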
Provider:
Mannheim University Library
Created:
2025-02-19



