CO.PRE.PAN Full Corpus (Restricted)
Source: Zenodo. Updated: 2026-02-23. Indexed: 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.18740954
Description:
This record contains the complete CO.PRE.PAN (Corpus de Prensa Panhispánico) press corpus, organized into country-specific ZIP archives with linguistically annotated JSON files. Due to copyright restrictions, all texts and annotations are distributed under restricted access and cannot be shared openly. Users may request access directly through Zenodo.
Contents of this record
Each {COUNTRYCODE}.zip archive contains:
Press texts in plain text format (txt-files/)
Annotated JSON files (json-annotated/)
All ZIP archives were generated using the internal script "zenodo_corpus_zip.py", which automatically tracks timestamps and file changes to ensure reproducible versioning.
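Once access has been granted and an archive downloaded, the layout described above can be checked with a short script. The following is a minimal sketch using only the Python standard library; the function name summarize_archive is hypothetical, and the code assumes only that the two folder names listed above appear somewhere in each member path. It is unrelated to the internal "zenodo_corpus_zip.py" script, whose contents are not published in this record.

```python
import zipfile
from pathlib import PurePosixPath

# Folder names taken from the record description; anything else is
# collected under "other" rather than assumed away.
EXPECTED_FOLDERS = ("txt-files", "json-annotated")


def summarize_archive(zip_source):
    """Group the files of one country archive by the folder they sit in.

    zip_source may be a path to a ZIP file or a binary file-like object.
    """
    groups = {folder: [] for folder in EXPECTED_FOLDERS}
    groups["other"] = []
    with zipfile.ZipFile(zip_source) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip pure directory entries
            # Look at the directory components only, not the file name.
            parents = PurePosixPath(info.filename).parts[:-1]
            folder = next(
                (f for f in EXPECTED_FOLDERS if f in parents), "other"
            )
            groups[folder].append(info.filename)
    return groups
```

Calling summarize_archive on a downloaded archive returns a dictionary with the keys "txt-files", "json-annotated", and "other". Comparing the lengths of the first two lists is a quick way to see whether every plain text file has an annotated counterpart, assuming a one-to-one pairing, which the record does not state explicitly.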
Corpus description
CO.PRE.PAN is a cross-national corpus of written press Spanish from 18 Spanish-speaking countries, comprising over 14 million words. It is structurally aligned with the spoken broadcast corpus CO.RA.PAN (Corpus Radiofónico Panhispánico) and serves as a scripted register baseline for comparative analyses of national standard varieties of Spanish. All texts are drawn from comparable press genres and produced under broadly equivalent publication conditions across countries, ensuring cross-national comparability.
Versioning
Each version of this record represents a coherent snapshot of the full corpus at a specific point in time. Updates may include newly added texts, corrected or extended annotations, and improvements to preprocessing and linguistic annotation.
Annotation details
Each JSON file contains:
tokenization, sentence segmentation
POS tags, lemmas, and morphological features
dependency relations
automatic categorization of verbal tense and related features
All annotations are generated using spaCy (model: es_dep_news_trf), followed by project-specific quality control steps, using the same annotation pipeline applied to CO.RA.PAN.
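The JSON schema is not documented in this record. As an illustration only, the sketch below assumes a layout similar to the output of spaCy's Doc.to_json(), with a "text" string, a "sents" list of character offsets, and a "tokens" list carrying "pos" and "lemma" fields. The actual CO.PRE.PAN files may use different keys, and the project-specific tense categorization is not represented here. The function names are hypothetical.

```python
import json
from collections import Counter


def load_annotation(path):
    """Read one annotated JSON file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def pos_counts(doc):
    """Count coarse POS tags, assuming each token has a "pos" key."""
    return Counter(token["pos"] for token in doc["tokens"])


def sentence_texts(doc):
    """Slice the raw text by the assumed sentence character offsets."""
    text = doc["text"]
    return [text[s["start"]:s["end"]] for s in doc["sents"]]


def lemmas_for_pos(doc, pos):
    """Return the lemmas of all tokens with the given coarse POS tag."""
    return [t["lemma"] for t in doc["tokens"] if t["pos"] == pos]
```

If the real files turn out to use other key names, only the dictionary lookups in these helpers need to change; the overall pattern of loading one file and aggregating over its tokens stays the same.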
Legal and access information
The restricted status of this record is due to copyright limitations. Only short text extracts may be displayed publicly under scientific quotation rules and text-and-data-mining provisions of EU Directive 2019/790 and the German UrhG (§51, §60d, §44b). Redistribution or reuse of the full texts and annotations is not permitted.
Access requests can be submitted directly through Zenodo. For scientific inquiries or technical questions, please contact the CO.PRE.PAN project team.
Provider:
Zenodo
Created:
2026-02-23



