Krapivin

Name: Krapivin
Creator: IEEE Dataport
License: No description available

Indexed from ieee-dataport.org on 2025-01-21

Download link:

https://ieee-dataport.org/documents/krapivin

Description:

In this paper we use Natural Language Processing techniques to improve different machine learning approaches (Support Vector Machines (SVM), Local SVM, Random Forests) to the problem of automatic keyphrase extraction from scientific papers. For the evaluation we propose a large, high-quality dataset: 2000 ACM papers from the Computer Science domain. We evaluate by comparison with expert-assigned keyphrases. Evaluation shows promising results that outperform the state-of-the-art Bayesian learning system KEA, improving the average F-Measure from 22% (KEA) to 30% (Random Forest) on the same dataset without the use of controlled vocabularies. Finally, we report a detailed analysis of the effect of the individual NLP features and dataset size on the overall quality of extracted keyphrases.
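
The description reports an average F-Measure computed against expert-assigned keyphrases. As a rough illustration of how such a score can be computed, here is a minimal Python sketch. It assumes exact matching after lowercasing and whitespace normalization, and a plain per-document average; the page does not state the matching or averaging rules actually used, and the function names below are hypothetical.

```python
def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(phrase.lower().split())


def keyphrase_f_measure(extracted, gold):
    """Return (precision, recall, F1) for one document.

    Assumption: a predicted keyphrase counts as correct only if it exactly
    matches an expert-assigned keyphrase after normalization.
    """
    extracted_set = {normalize(p) for p in extracted}
    gold_set = {normalize(p) for p in gold}
    matches = len(extracted_set & gold_set)
    precision = matches / len(extracted_set) if extracted_set else 0.0
    recall = matches / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_f_measure(documents):
    """Average the per-document F1 over (extracted, gold) pairs."""
    scores = [keyphrase_f_measure(e, g)[2] for e, g in documents]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, if a system extracts four phrases for a paper and two of them appear among three expert-assigned keyphrases, precision is 2/4, recall is 2/3, and the F-Measure is their harmonic mean, 4/7. Stemming or partial matching would change these numbers, so scores are only comparable under the same matching rule.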

Provided by:

IEEE Dataport
