Indonesian-GEC-Corpus
Dataset Overview
Dataset Name
Indonesian-GEC-Corpus
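Purpose
This dataset was built specifically for the Indonesian grammatical error correction (GEC) task and is intended to support research focused on this area.
Dataset Size
Contains 13,709 sentences covering 10 part-of-speech (POS) tags.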
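Version Updates
Version v1.1
- Removed 2 entries from the "preposition" category.
- Removed 3 entries from the "indefinite pronoun" category.
- After re-testing, the results match those reported in the original paper to three decimal places.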
Citation
If you use this dataset, please cite the following paper:
@article{10.1145/3440993,
  author = {Lin, Nankai and Chen, Boyu and Lin, Xiaotian and Wattanachote, Kanoksak and Jiang, Shengyi},
  title = {A Framework for Indonesian Grammar Error Correction},
  year = {2021},
  issue_date = {June 2021},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {20},
  number = {4},
  issn = {2375-4699},
  url = {https://doi.org/10.1145/3440993},
  doi = {10.1145/3440993},
  journal = {ACM Trans. Asian Low-Resour. Lang. Inf. Process.},
  month = may,
  articleno = {57},
  numpages = {12},
  keywords = {Grammatical error correction, word-embedding, indonesian language processing, low-resource language}
}




