SART - Similarity, Analogies, and Relatedness for Tatar Language
Source: arXiv (2019-03-31). Updated: 2024-06-21.
Download link:
https://github.com/tat-nlp/SART
Resource description:
The SART dataset was created by Innopolis University and consists of three sub-datasets designed specifically for the Tatar language: Similarity, Analogies, and Relatedness. They are based on existing English datasets and were extended to account for cultural differences and linguistic features specific to Tatar. The Similarity sub-dataset contains 202 word pairs and the Relatedness sub-dataset contains 252 word pairs; the Analogies sub-dataset covers semantic, syntactic, and morphological relations. Beyond evaluating word embedding models, the datasets can support tasks such as automatic dictionary construction, machine translation, and semantic parsing. The dataset aims to address the resource imbalance that research on low-resource languages faces.
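Word-embedding benchmarks of this kind are commonly scored as follows: for similarity and relatedness pairs, compute the cosine similarity of each pair's vectors and report Spearman's rank correlation with the human scores; for analogies a:b :: c:d, predict d with the 3CosAdd rule (the vocabulary word closest to b - a + c) and report accuracy. The sketch below illustrates that protocol under stated assumptions: the file layout of SART is not described on this page, so it uses in-memory toy data (English placeholder words with made-up 3-dimensional vectors), and all function names are hypothetical. Whether SART's own evaluation follows exactly this protocol is also an assumption.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy data: English placeholder words with made-up 3-d vectors.
# Real use would load Tatar embeddings and parse the SART files instead.
EMB: Dict[str, List[float]] = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.1, -1.0],
}
# (word1, word2, human score); "tulip" is deliberately out of vocabulary.
PAIRS: List[Tuple[str, str, float]] = [
    ("man", "woman", 8.0),
    ("king", "queen", 8.5),
    ("king", "apple", 1.0),
    ("man", "apple", 0.5),
    ("king", "tulip", 2.0),
]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values get the mean of the positions they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho = Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def evaluate_pairs(emb, pairs) -> Tuple[float, int]:
    """Rank-correlate cosine similarity with human scores; skip OOV pairs."""
    gold, pred, skipped = [], [], 0
    for w1, w2, score in pairs:
        if w1 in emb and w2 in emb:
            gold.append(score)
            pred.append(cosine(emb[w1], emb[w2]))
        else:
            skipped += 1
    return spearman(gold, pred), skipped


def solve_analogy(emb, a: str, b: str, c: str) -> str:
    """3CosAdd: the word d (not a, b or c) maximizing cos(d, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = [w for w in emb if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions) -> float:
    """Share of a:b :: c:d questions (all words in vocabulary) answered correctly."""
    answerable = [q for q in questions if all(w in emb for w in q)]
    correct = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in answerable)
    return correct / len(answerable) if answerable else 0.0


if __name__ == "__main__":
    rho, skipped = evaluate_pairs(EMB, PAIRS)
    print(f"Spearman rho = {rho:.3f} ({skipped} pair skipped)")  # Spearman rho = 0.800 (1 pair skipped)
    print(solve_analogy(EMB, "man", "woman", "king"))  # queen
```

Reporting the number of skipped out-of-vocabulary pairs next to the correlation makes scores comparable across models, which may matter for a morphologically rich language like Tatar, where many inflected forms can be missing from an embedding vocabulary. 3CosAdd is only one analogy-solving rule; alternatives such as 3CosMul can be swapped into `solve_analogy`.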
Provided by:
Innopolis University
Created:
2019-03-31



