KaWAT
Source: arXiv | Updated: 2019-06-17 | Indexed: 2024-06-21
Download link:
https://github.com/kata-ai/kawat
Description:
KaWAT is a word analogy task dataset for Indonesian, developed by the research team at Kata.ai in Indonesia. It contains 15,000 syntactic and 19,000 semantic analogy queries, 34,000 in total. Its construction follows the approach of English word analogy datasets, with queries split into syntactic and semantic categories. The dataset is primarily used to evaluate pre-trained Indonesian word embeddings, and it aims to give research on Indonesian word embeddings a publicly available benchmark for measuring how well embeddings capture syntactic and semantic information.
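As a rough illustration of how analogy queries of this kind are typically used to score word embeddings, the sketch below answers a query "a is to b as c is to ?" with the common vector-offset method and reports accuracy. Everything in it is an assumption made for illustration: the function names, the tuple layout of a query, and the toy 2-d vectors are hypothetical and do not reflect KaWAT's actual file format or the scoring procedure used by its authors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def solve_analogy(embeddings, a, b, c):
    """Return the word whose vector is closest (by cosine) to b - a + c.

    The three query words are excluded from the candidates, as is
    conventional for this kind of evaluation.
    """
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(embeddings[w], target))

def analogy_accuracy(embeddings, queries):
    """Fraction of (a, b, c, d) queries where the predicted word equals d.

    Queries containing out-of-vocabulary words are skipped (an assumption;
    other evaluations count them as errors instead).
    """
    answered = [q for q in queries if all(w in embeddings for w in q)]
    if not answered:
        return 0.0
    correct = sum(solve_analogy(embeddings, a, b, c) == d
                  for a, b, c, d in answered)
    return correct / len(answered)

# Toy, hand-made vectors (NOT real embeddings), chosen so the offset works.
toy = {
    "pria": [0.2, 1.0],     # man
    "wanita": [0.2, -1.0],  # woman
    "raja": [1.0, 1.0],     # king
    "ratu": [1.0, -1.0],    # queen
    "kota": [-1.0, 0.1],    # city (distractor)
}
queries = [("pria", "wanita", "raja", "ratu")]

print(solve_analogy(toy, "pria", "wanita", "raja"))
print(analogy_accuracy(toy, queries))
```

With real pre-trained embeddings, the same loop would be run separately over the syntactic and the semantic queries so that the two accuracies can be compared.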
Provider:
Kata.ai, Jakarta, Indonesia
Created:
2019-06-17



