akkp69000
Source: Mendeley Data. Indexed 2026-04-18.
Download link:
https://data.mendeley.com/datasets/rjhh6cm55z
Description:
Context
Scientific papers, as well as other types of documents, can be identified by a set of keywords.
Typically, authors are free to choose their own keywords. Keywords chosen by the authors are called Authors' Keywords (AK).
Sometimes keywords are imposed, restricted, or inferred by algorithms.
KeyWordsPlus © (KP) uses information from an article's bibliographic references to infer keywords.
Content
This dataset contains the AK and KP of 69,000 articles. All the articles were retrieved from Web of Science (WoS): https://www.webofknowledge.com
The data is split into three collections:
Raw: The raw data, as exported from WoS. The records are distributed over several CSV files. Columns "DE" and "ID" refer to AK and KP, respectively.
filtered: We have removed every article that does not have both AK and KP.
pre_processed: We have cleaned the keywords to remove special characters, and we have lowercased and stemmed all the keywords.
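The step from the raw collection to the filtered one can be sketched as below. This is a minimal illustration, not the script used to build the dataset: it assumes comma-delimited files with a header row, and the "TI" column and the sample rows are hypothetical stand-ins for one raw file. Only the "DE" and "ID" column names come from the description above.

```python
import csv
import io


def has_both_keywords(row):
    """Return True when a row has non-empty AK ("DE") and KP ("ID") fields."""
    ak = (row.get("DE") or "").strip()
    kp = (row.get("ID") or "").strip()
    return bool(ak) and bool(kp)


def filter_rows(rows):
    """Keep only the rows that carry both AK and KP."""
    return [row for row in rows if has_both_keywords(row)]


# Hypothetical in-memory sample standing in for one raw CSV file.
sample = io.StringIO(
    "TI,DE,ID\n"
    "Paper A,concept map; semantic analysis,SCIENCE\n"
    "Paper B,,SCIENCE\n"
    "Paper C,linking phrase,\n"
)

kept = filter_rows(csv.DictReader(sample))
kept_titles = [row["TI"] for row in kept]
print(kept_titles)
```

To run this on a real file, pass an opened file to csv.DictReader instead of the sample, and set the delimiter argument if the export is not comma-delimited.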
In filtered and pre_processed, you will find two text files, "ak.txt" and "kp.txt". The same line number in both files refers to the same article.
For example, article number 8 has the following keywords:
AK: Automated knowledge assessment; concept map; linking phrase; semantic analysis
KP: SCIENCE
After pre-processing, article number 8 has the following keywords:
- AK: automknowledgassess;conceptmap;linkphrase;semant_analysi
- KP: scienc
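Because "ak.txt" and "kp.txt" are aligned line by line, the two files can be read in parallel. The sketch below assumes that keywords within a line are separated by ";" (as in the example above) and that the files are UTF-8 encoded. The helper names are hypothetical, and whether article numbers start at 0 or 1 is not stated here, so check that against the files.

```python
def parse_keywords(line):
    """Split one line into keywords, assuming ';' as the separator."""
    return [kw.strip() for kw in line.split(";") if kw.strip()]


def pair_keywords(ak_lines, kp_lines):
    """Pair the AK and KP keyword lists found on the same line number."""
    return [
        (parse_keywords(ak), parse_keywords(kp))
        for ak, kp in zip(ak_lines, kp_lines)
    ]


def load_pairs(ak_path, kp_path, encoding="utf-8"):
    """Read both files and pair their lines (the encoding is an assumption)."""
    with open(ak_path, encoding=encoding) as ak_file, \
            open(kp_path, encoding=encoding) as kp_file:
        return pair_keywords(ak_file, kp_file)


# In-memory lines taken from the article 8 example in the description.
ak_lines = [
    "Automated knowledge assessment; concept map; linking phrase; semantic analysis\n"
]
kp_lines = ["SCIENCE\n"]

pairs = pair_keywords(ak_lines, kp_lines)
print(pairs[0])
```

With the real data, a call such as load_pairs("filtered/ak.txt", "filtered/kp.txt") would return one (AK, KP) pair per article; the directory layout in that call is an assumption.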
Acknowledgements
We want to thank Web of Science for giving access to its database.
Created:
2020-11-23



