Multi-Sense embeddings through a word sense disambiguation process
Source: DataCite Commons; updated 2022-04-10; indexed 2025-04-17
Download link:
http://deepblue.lib.umich.edu/data/concern/data_sets/jh343s29p
Description:
This dataset collects seven word similarity benchmarks (RG65, MEN3K, WordSim353, SimLex999, SCWS, YP130, and SimVerb3500), each provided both in its original format and with its similarity scores converted to a cosine similarity scale.
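The readme.txt documents how the benchmark scores were actually converted. As a minimal sketch, assuming a plain linear (min-max) rescaling from a benchmark's rating range onto the [-1, 1] range of cosine similarity, the conversion could look like this. The function name `to_cosine_scale` and the default target interval are illustrative assumptions, not taken from the dataset.

```python
def to_cosine_scale(score: float, lo: float, hi: float,
                    new_lo: float = -1.0, new_hi: float = 1.0) -> float:
    """Linearly map a human similarity rating from [lo, hi] onto [new_lo, new_hi].

    Assumption: a simple min-max mapping; the dataset's own conversion
    (see readme.txt) may differ.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return new_lo + (score - lo) * (new_hi - new_lo) / (hi - lo)


# WordSim353 ratings lie on a 0-10 scale.
print(to_cosine_scale(7.5, 0.0, 10.0))  # 0.5
```

With the defaults, the bottom of a benchmark's scale maps to -1 and the top to 1; passing `new_lo=0.0` targets [0, 1] instead, if non-negative cosines are the intended range.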
In addition, we provide two Wikipedia dumps, from April 2010 and January 2018. Each is available in its original format (raw words) and in versions converted with the techniques described in the paper this repository is named after (MSSA, MSSA-D, and MSSA-NR), along with word embedding models of 300 and 1,000 dimensions trained with a word2vec implementation. A readme.txt file describes each file in more detail.
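Word similarity benchmarks like these are commonly used to evaluate embedding models by computing the cosine similarity of each word pair and reporting Spearman's rank correlation with the human ratings. The sketch below shows that loop in plain Python. The toy vectors and word pairs are hypothetical stand-ins, and loading the actual 300d or 1000d models is left out: their file format, and how benchmark words map onto the vocabulary of the MSSA-based models, are described in readme.txt and not assumed here.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def evaluate(embeddings, benchmark):
    """Return (rho, pairs used); pairs with an out-of-vocabulary word are skipped."""
    model_scores, human_scores = [], []
    for w1, w2, rating in benchmark:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(rating)
    return spearman(model_scores, human_scores), len(model_scores)


# Hypothetical toy data standing in for a real model and benchmark file.
embeddings = {
    "car": [1.0, 0.0],
    "auto": [0.9, 0.1],
    "tree": [0.0, 1.0],
    "fruit": [0.2, 0.9],
}
benchmark = [
    ("car", "auto", 9.0),
    ("car", "tree", 1.0),
    ("tree", "fruit", 6.0),
    ("car", "zebra", 2.0),  # "zebra" is out of vocabulary and gets skipped
]

rho, n_pairs = evaluate(embeddings, benchmark)
print(f"Spearman rho = {rho:.3f} over {n_pairs} pairs")  # Spearman rho = 1.000 over 3 pairs
```

Skipping out-of-vocabulary pairs and reporting how many were used is one common convention. Because Spearman's rho depends only on ranks, a linear rescaling of the human ratings leaves it unchanged.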
Provided by:
University of Michigan
Created:
2019-05-15