DEplain-APA
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/7674559
Description:
DEplain: A corpus for German Text Simplification
This repository contains the DEplain-APA corpus for German text simplification (document and sentence simplification). The corpus contains Austrian news text provided by the APA - Austria Presse Agentur eG. All sentence-level aligned pairs (complex-simple) were aligned manually. The following table summarizes the most important metadata of the corpus.
metadata                                     value
language                                     DE-AT (Austrian German)
domain                                       news
source language level                        B1
target language level                        A2
# document pairs (total, train/dev/test)     483 (387/48/48)
# sentence pairs (total, train/dev/test)     13,122 (10,660/1,231/1,231)
# complex sentences                          25,607
# simple sentences                           26,471
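As a quick sanity check on the split sizes listed above, the following minimal Python sketch confirms that the train/dev/test counts add up to the reported totals and prints each split's share. The numbers are copied from the metadata; the names `SPLITS` and `split_shares` are illustrative only and are not part of the corpus or its tooling.

```python
# Split sizes copied from the metadata above: (total, train, dev, test).
# SPLITS and split_shares are hypothetical names used only for this sketch.
SPLITS = {
    "document pairs": (483, 387, 48, 48),
    "sentence pairs": (13122, 10660, 1231, 1231),
}


def split_shares(total, train, dev, test):
    """Return the train/dev/test shares of the total, rounded to 3 decimals."""
    if train + dev + test != total:
        raise ValueError("split sizes do not add up to the reported total")
    return tuple(round(n / total, 3) for n in (train, dev, test))


for name, counts in SPLITS.items():
    print(name, split_shares(*counts))
```

Both rows work out to roughly an 80/10/10 split, slightly heavier on the training side for sentence pairs.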
Updates:
Version 1.2: More system outputs are added. For comparisons of your models with existing models, please have a look at ./DEPlain/G__Automatic_Text_Simplification_Experiments/generated_outputs/sentence-level.
Version 1.1: Alignment Labels in Simplification Plans are repaired. For more info see https://github.com/rstodden/DEPlain/issues/2#issue-1875006089
For more information, please have a look at our paper. If you use this corpus, please also cite our paper and name APA - Austria Presse Agentur eG as data provider:
Regina Stodden, Omar Momen, and Laura Kallmeyer. 2023. DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16441–16463, Toronto, Canada. Association for Computational Linguistics.
For more information regarding available system outputs and comparisons between these models, please have a look at the following paper:
Regina Stodden. 2024. Reproduction & Benchmarking of German Text Simplification Systems. In Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024, pages 1–15, Torino, Italia. ELRA and ICCL.
Created:
2024-10-04



