Assessing Word Similarity Metrics for Traceability Link Recovery - Evaluation Dataset
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6580279
Description:
This dataset includes all data that was used for the evaluation of my bachelor's thesis:
Assessing Word Similarity Metrics for Traceability Link Recovery
The files in this dataset correspond to the data sets from the evaluation as follows:
cc-en-300.tar.gz corresponds to fastText's cc.en.300.bin embedding
crawl-300d-2M-subword.tar.gz corresponds to fastText's crawl-300d-2M-subword.bin embedding
wiki-news-300d-1M-subword.tar.gz corresponds to fastText's wiki-news-300d-1M-subword.bin embedding
wordnet.tar.gz corresponds to the WordNet 3.1 semantic network
sewordsim.tar.gz corresponds to SEWordSimDB's vector similarity database
glove_cc_840B_300d.tar.gz corresponds to GloVe's CC vector embedding
glove_wikigiga_300d.tar.gz corresponds to GloVe's 300-dimensional WIGI vector embedding
glove_wikigiga_200d.tar.gz corresponds to GloVe's 200-dimensional WIGI vector embedding
glove_wikigiga_100d.tar.gz corresponds to GloVe's 100-dimensional WIGI vector embedding
glove_wikigiga_50d.tar.gz corresponds to GloVe's 50-dimensional WIGI vector embedding
glove_twitter_200d.tar.gz corresponds to GloVe's 200-dimensional TWTR vector embedding
glove_twitter_100d.tar.gz corresponds to GloVe's 100-dimensional TWTR vector embedding
glove_twitter_50d.tar.gz corresponds to GloVe's 50-dimensional TWTR vector embedding
glove_twitter_25d.tar.gz corresponds to GloVe's 25-dimensional TWTR vector embedding
eval_results.tar.gz contains the detailed evaluation results for each configuration of all measures
The licenses of all data sets are included in their respective files.
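Since every data set above ships as a .tar.gz archive, unpacking is the first step before any of them can be used. The following is a minimal Python sketch of that step, not part of the original dataset. The function name extract_archive and the target directory are hypothetical, and the internal layout of each archive is not documented here, so the returned member names should be inspected before relying on any particular path.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path: str, target_dir: str) -> list[str]:
    """Unpack a .tar.gz archive into target_dir and return its member names.

    Hypothetical helper: the layout inside each archive is an unknown here,
    so the caller should inspect the returned names.
    """
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        names = archive.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data"
            # rejects members that would be written outside target_dir.
            archive.extractall(path=target_dir, filter="data")
        else:
            archive.extractall(path=target_dir)
    return names
```

For example, extract_archive("wordnet.tar.gz", "data/wordnet") would unpack the WordNet archive into a data/wordnet directory (a directory name chosen here only for illustration) and return the list of extracted paths.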
Some of these data sets are provided as .sql files. To reproduce the evaluation, these files must first be imported into a sqlite3 database, because the version of ArDoCo used for the evaluation can only work with sqlite3 databases and not with .sql files directly.
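One way to perform this import is with Python's built-in sqlite3 module, sketched below. This is an illustration under assumptions, not the procedure used in the thesis: it assumes the .sql files contain statements SQLite accepts as-is and are UTF-8 encoded, and the function name import_sql_dump and any file names are hypothetical.

```python
import sqlite3
from pathlib import Path


def import_sql_dump(sql_path: str, db_path: str) -> None:
    """Run every statement of a .sql file against a sqlite3 database file.

    Assumptions: the dump is UTF-8 text and uses SQL that SQLite understands.
    The database file is created if it does not exist yet.
    """
    script = Path(sql_path).read_text(encoding="utf-8")
    connection = sqlite3.connect(db_path)
    try:
        # executescript runs all statements in the script in order.
        connection.executescript(script)
        connection.commit()
    finally:
        connection.close()
```

A call such as import_sql_dump("some_dump.sql", "some_database.sqlite") would then produce a database file that can be handed to ArDoCo. Both file names are placeholders, and the script is read fully into memory, which may be impractical for very large dumps.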
Created:
2022-06-04



