German Readability and Simplification Corpus

Source: arXiv · Updated: 2019-09-20 · Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/1909.09067v1
Description:
The German Readability and Simplification Corpus was created by the Institute of Computational Linguistics at the University of Zurich. It contains approximately 211,000 sentences, drawn primarily from web sources. A distinctive feature of the corpus is that it also records text structure, layout, and image information, which machine learning approaches to readability assessment and text simplification can use.

The corpus is intended mainly for automatic readability assessment and automatic text simplification. Simplified language reduces lexical and syntactic complexity, adds explanations of difficult concepts, and uses a clearly structured layout. It is meant to help people with cognitive impairments or learning disabilities, prelingually deaf people, people with functional illiteracy, and foreign language learners understand texts more easily.
Provider:
Institute of Computational Linguistics, University of Zurich
Created:
2019-09-20