
DEplain-APA

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/7674559
Resource description:
DEplain: A corpus for German Text Simplification

This repository contains DEplain-APA, a corpus for German text simplification at both the document and the sentence level. The corpus consists of Austrian news texts provided by APA - Austria Presse Agentur eG. All sentence-level (complex-simple) pairs were aligned manually.

The following table summarizes the most important metadata of the corpus.

Metadata                                   Value
language                                   DE-AT (Austrian German)
domain                                     news
source language level                      B1
target language level                      A2
# document pairs (total, train/dev/test)   483 (387/48/48)
# sentence pairs (total, train/dev/test)   13,122 (10,660/1,231/1,231)
# complex sentences                        25,607
# simple sentences                         26,471

Updates:
Version 1.2: More system outputs were added. To compare your models with existing ones, see ./DEPlain/G__Automatic_Text_Simplification_Experiments/generated_outputs/sentence-level.
Version 1.1: The alignment labels in the simplification plans were repaired. For details, see https://github.com/rstodden/DEPlain/issues/2#issue-1875006089

For more information, see our paper. If you use this corpus, please cite the paper and name APA - Austria Presse Agentur eG as the data provider:

Regina Stodden, Omar Momen, and Laura Kallmeyer. 2023. DEplain: A German Parallel Corpus with Intralingual Translations into Plain Language for Sentence and Document Simplification. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 16441–16463, Toronto, Canada. Association for Computational Linguistics.

For more information on the available system outputs and comparisons between these models, see the following paper:

Regina Stodden. 2024. Reproduction & Benchmarking of German Text Simplification Systems. In Proceedings of the Workshop on DeTermIt! Evaluating Text Difficulty in a Multilingual Context @ LREC-COLING 2024, pages 1–15, Torino, Italia. ELRA and ICCL.
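As a quick consistency check on the metadata table, this minimal sketch recomputes each split's share of the total from the train/dev/test counts the table lists. The counts are copied from the table; the names SPLITS, REPORTED_TOTALS, and split_shares are hypothetical and are not part of the DEplain repository.

```python
# Split sizes copied from the DEplain-APA metadata table: (train, dev, test).
SPLITS = {
    "document pairs": (387, 48, 48),
    "sentence pairs": (10_660, 1_231, 1_231),
}

# Totals as reported in the same table.
REPORTED_TOTALS = {"document pairs": 483, "sentence pairs": 13_122}


def split_shares(counts):
    """Return the total and each split's share of it, as percentages rounded to one decimal."""
    total = sum(counts)
    return total, tuple(round(100 * c / total, 1) for c in counts)


for name, counts in SPLITS.items():
    total, shares = split_shares(counts)
    # Check that train + dev + test adds up to the reported total.
    assert total == REPORTED_TOTALS[name]
    print(f"{name}: total={total}, train/dev/test % = {shares}")
```

Both splits come out at roughly 80/10/10, with dev and test of equal size at both the document and the sentence level.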
Created:
2024-10-04