
DEPLAIN

arXiv | Updated 2023-05-30 | Indexed 2024-06-21
Download link:
https://github.com/rstodden/DEPlain
Resource overview:
DEPLAIN is a dataset created by a research team at Heinrich Heine University Düsseldorf to advance research on German sentence and document simplification. It comprises a news-domain corpus of approximately 500 document pairs and 13k sentence pairs, and a web-domain corpus of approximately 150 aligned documents and 2k aligned sentence pairs. The simplifications are written mainly in plain German ('Einfache Sprache') and include both 'strong' and 'moderate' simplifications. The team also developed a web harvester and experimental automatic alignment methods to help integrate unaligned and forthcoming parallel documents and to expand the web-domain corpus dynamically. The dataset can be used to train and evaluate automatic text simplification systems and to explore related tasks such as text style transfer.
Provider:
Heinrich Heine University Düsseldorf
Created:
2023-05-30