Arabic Parallel Gender Corpus 2.0 (APGC v2.0)
Source: arXiv | Updated: 2021-10-18 | Indexed: 2024-06-21
Download link:
http://resources.camel-lab.com/
Resource description:
The Arabic Parallel Gender Corpus 2.0 (APGC v2.0) is a large dataset for gender identification and rewriting in Arabic, created by the Computational Approaches to Modeling Language (CAMeL) Lab at New York University Abu Dhabi. It contains over 63,000 sentence pairs covering first- and second-person expressions in different grammatical genders. The corpus was built by selecting sentences from the OpenSubtitles 2018 dataset, then manually annotating and rewriting them so that the gendered expressions are accurate. Its applications include gender identification, controlled text generation, and post-editing rewriting systems, with the goal of personalizing natural language processing applications to users' grammatical gender preferences.
Provided by:
Computational Approaches to Modeling Language (CAMeL) Lab, New York University Abu Dhabi
Created:
2021-10-18



