NLI-PT
Source: arXiv | Updated: 2018-05-01 | Indexed: 2024-06-21
Download link:
http://www.clul.ulisboa.pt/en/resources-en/11resources/894-nli-pt-a-portuguese-native-language-identification-dataset
Resource overview:
NLI-PT is a Portuguese native language identification (NLI) dataset created by the Centre for Linguistics at the University of Lisbon (CLUL). It contains 1,868 texts written by students learning European Portuguese, covering 15 different first languages (L1). Alongside the original student texts, the dataset provides four types of annotation: part-of-speech (POS) tags, fine-grained POS tags, constituency parses, and dependency parses. Beyond NLI itself, NLI-PT supports research in second language acquisition and educational natural language processing (NLP). The dataset was built by collecting data from several learner corpora, converting it to a unified format, and adding multi-level linguistic annotation with NLP tools. Its applications include computer-assisted language learning, grammatical error detection and correction, spell checking, and the study of first language interference, all aimed at specific problems in second language learning.
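To make the NLI task concrete, the sketch below treats it as text classification: given a learner text, predict the writer's L1. It is only an illustration under stated assumptions. This page does not describe the NLI-PT file layout, so the samples are placeholder strings with placeholder labels (`L1_A`, `L1_B`), not data from the dataset. The character n-gram profile approach is a common simple baseline chosen here for brevity, not necessarily the method used by the dataset's authors.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples, n=3):
    """Build one summed n-gram profile per L1 label."""
    profiles = {}
    for label, text in samples:
        profiles.setdefault(label, Counter()).update(char_ngrams(text, n))
    return profiles


def predict(profiles, text, n=3):
    """Return the label whose profile is most similar to the text."""
    grams = char_ngrams(text, n)
    return max(profiles, key=lambda label: cosine(profiles[label], grams))


# Placeholder samples (hypothetical, NOT taken from NLI-PT):
# each pair is (L1 label, learner text).
toy_samples = [
    ("L1_A", "aaa bbb aaa bbb"),
    ("L1_B", "xyz xyz xyz"),
]
profiles = train(toy_samples)
predicted = predict(profiles, "aaa bbb")
```

With real data, each `(label, text)` pair would come from one learner text and its recorded L1. The same profile idea could also be applied to the POS tag sequences instead of raw characters, which is one reason the annotation layers are useful for this task.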
Provided by:
Centre for Linguistics (CLUL), University of Lisbon, Portugal
Created:
2018-05-01
Collected summary
Dataset introduction

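Background and challenges
Background overview
NLI-PT is a Portuguese native language identification dataset containing 1,868 texts written by students learning European Portuguese. It covers 15 first languages and provides four types of linguistic annotation, including POS tags and dependency parses. It is suited to research on native language identification, second language acquisition, and educational NLP, with applications such as computer-assisted language learning and grammatical error detection.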
The content above was collected and summarized by 遇见数据集.



