Japanese Word Similarity Dataset
arXiv | Updated 2018-02-22 | Indexed 2024-06-21
Download link:
https://github.com/tmu-nlp/JapaneseWordSimilarityDataset
Description:
The Japanese Word Similarity Dataset was created at Tokyo Metropolitan University to evaluate distributed word representations in Japanese. It covers several parts of speech and includes rare words as well as common ones, for a total of 4,851 entries. To control the variability of the similarity ratings, the dataset was built with an example-based method, and the word pairs were extracted from an evaluation dataset for Japanese lexical simplification. The dataset is used mainly in natural language processing, especially for evaluating word embeddings and other distributed representations, and it addresses the shortage of such resources for Japanese.
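A word similarity dataset of this kind is commonly used by comparing a model's cosine similarities against the human ratings with Spearman's rank correlation. The following is a minimal sketch of that general protocol, not the dataset authors' own evaluation script. The embeddings, the word pairs, and the rating values below are invented toy data, and the `(word1, word2, score)` tuple layout is an assumption rather than the repository's actual file format.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(pairs, embeddings):
    """Return (rho, number of pairs used); pairs with unknown words are skipped."""
    gold, predicted = [], []
    for w1, w2, score in pairs:
        if w1 in embeddings and w2 in embeddings:
            gold.append(score)
            predicted.append(cosine(embeddings[w1], embeddings[w2]))
    return spearman(gold, predicted), len(gold)


# Toy 2-dimensional embeddings (hypothetical values, not from any real model).
embeddings = {
    "犬": [1.0, 0.1],
    "猫": [0.9, 0.2],
    "車": [0.1, 1.0],
    "自動車": [0.1, 0.9],
}

# Toy human ratings (hypothetical values and scale, not taken from the dataset).
pairs = [
    ("犬", "猫", 8.0),
    ("車", "自動車", 9.5),
    ("犬", "車", 1.0),
    ("猫", "自動車", 2.0),
    ("犬", "未知語", 5.0),  # skipped: "未知語" has no embedding
]

rho, used = evaluate(pairs, embeddings)
print(f"Spearman rho = {rho:.3f} over {used} pairs")
```

Because the dataset includes rare words, the number of pairs actually scored is worth reporting alongside the correlation: a model with a small vocabulary may silently skip many pairs.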
Provider:
Tokyo Metropolitan University
Created:
2017-03-17



