INDOLEM
Source: arXiv | Updated: 2020-11-02 | Indexed: 2024-06-21
Download link:
https://indolem.github.io
Resource description:
INDOLEM is a comprehensive dataset created by the University of Melbourne to advance research in Indonesian natural language processing (NLP). It covers seven NLP tasks spanning morphosyntax, semantics, and discourse analysis, and comprises eight sub-datasets: five are based on prior work and three are newly created. The data splits and evaluation metrics are standardized to improve reproducibility and support robust benchmarking. INDOLEM is intended to address the underrepresentation of Indonesian in NLP research by providing a broad set of resources and standardized tasks.
Provider:
University of Melbourne
Created:
2020-11-02



