UA-GEC
Source: arXiv · Updated 2022-11-09 · Indexed 2024-06-21
Download link:
https://github.com/grammarly/ua-gec
Description:
UA-GEC is the first professionally annotated dataset for grammatical error correction (GEC) and fluency editing in Ukrainian. Collected by Grammarly, it spans writing domains from text chats to formal writing and contains 20,715 sentences. The texts come from a diverse group of authors, including both native and non-native speakers of Ukrainian. Professional proofreaders corrected and annotated the texts for grammar, spelling, punctuation, and fluency. The dataset is suitable for developing and evaluating Ukrainian GEC systems, and for research on multilingual and low-resource NLP, morphologically rich languages, document-level GEC, and fluency editing.
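To make the idea of "corrections and annotations" concrete, here is a minimal sketch of how an annotated sentence can be split into its original (erroneous) text, its corrected text, and the list of error types. It assumes inline edits written as `{source=>target:::error_type=Tag}`; treat that markup, the helper name `split_annotated`, and the example tags as illustrative assumptions and check the repository's own documentation for the authoritative format and tag set.

```python
import re

# Assumed inline edit markup: {source=>target:::error_type=Tag}
# (illustrative; verify against the UA-GEC repository documentation).
EDIT_RE = re.compile(r"\{([^{}]*?)=>([^{}]*?):::error_type=([^{}]*?)\}")


def split_annotated(text: str) -> tuple[str, str, list[str]]:
    """Hypothetical helper: return (source, target, error_types) for one annotated string."""
    # Keep the left side of every edit to rebuild the original text.
    source = EDIT_RE.sub(lambda m: m.group(1), text)
    # Keep the right side of every edit to rebuild the corrected text.
    target = EDIT_RE.sub(lambda m: m.group(2), text)
    # Collect the error-type tag of each edit, in order of appearance.
    error_types = [m.group(3) for m in EDIT_RE.finditer(text)]
    return source, target, error_types


if __name__ == "__main__":
    annotated = "Я {лублю=>люблю:::error_type=Spelling} каву{=>.:::error_type=Punctuation}"
    print(split_annotated(annotated))
```

An empty left side (as in `{=>.:::...}`) represents an insertion, so the same pattern covers insertions, deletions, and replacements without special cases.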
Provided by:
Grammarly
Created:
2021-03-31



