
UA-GEC

Source: arXiv · Updated: 2022-11-09 · Indexed: 2024-06-21
Download link:
https://github.com/grammarly/ua-gec
Overview:
UA-GEC is the first professionally annotated dataset for Ukrainian grammatical error correction (GEC) and fluency editing. Collected by Grammarly, it contains 20,715 sentences spanning writing domains from text chats to formal writing. The texts come from a diverse group of authors, including both native and non-native Ukrainian speakers. Professional proofreaders corrected and annotated the data for grammar, spelling, punctuation, and fluency. The dataset is suited to developing and evaluating Ukrainian GEC systems, and to research on multilingual and low-resource NLP, morphologically rich languages, document-level GEC, and fluency correction.
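To make the idea of proofreader annotations concrete, the sketch below recovers the original sentence, the corrected sentence, and the list of typed edits from a single annotated line. It assumes an inline markup of the form `{error=>correction:::error_type=Label}`; that markup, the `Edit` class, the `parse_annotated` helper, and the English sample sentence are illustrative assumptions and are not taken from this page. Check the repository linked above for the dataset's actual file format and error categories.

```python
import re
from dataclasses import dataclass

# ASSUMPTION: annotations are written inline as
#   {source=>target:::error_type=Label}
# This markup is hypothetical here; verify it against the dataset itself.
ANNOTATION = re.compile(
    r"\{(?P<src>[^{}]*?)=>(?P<tgt>[^{}]*?):::error_type=(?P<type>[^{}]*)\}"
)


@dataclass
class Edit:
    """One correction: the erroneous span, its fix, and the error label."""
    source: str
    target: str
    error_type: str


def parse_annotated(text):
    """Return (original, corrected, edits) for one annotated sentence."""
    edits = [Edit(m["src"], m["tgt"], m["type"]) for m in ANNOTATION.finditer(text)]
    # Keeping the left-hand side of every annotation restores the author's text.
    original = ANNOTATION.sub(lambda m: m["src"], text)
    # Keeping the right-hand side yields the proofreader's corrected text.
    corrected = ANNOTATION.sub(lambda m: m["tgt"], text)
    return original, corrected, edits


if __name__ == "__main__":
    # Made-up English sample, used only to exercise the parser.
    sample = "I {likes=>like:::error_type=Grammar} turtles{=>.:::error_type=Punctuation}"
    original, corrected, edits = parse_annotated(sample)
    print(original)
    print(corrected)
    for edit in edits:
        print(edit)
```

Pairs of original and corrected sentences produced this way are the usual input for training or scoring a GEC system, while the per-edit labels allow results to be broken down by grammar, spelling, punctuation, and fluency.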
Provider:
Grammarly
Created:
2021-03-31