SM03: Evaluation of Feature Selection and Weighting methods for topical Website Multi-class Classification
Indexed by NIAID Data Ecosystem on 2026-03-10
Download link:
https://data.mendeley.com/datasets/zzmp7t8msn
Summary:
The repository accompanies the website classification study titled "Evaluation of Feature Selection and Weighting methods for topical Website Multi-class Classification".
The study's main focus is a comprehensive evaluation of state-of-the-art term weighting models in the context of business website classification. The models are decomposed into their local and global components and recombined into 32 hybrid models, representing all viable variations, including combinations beyond those originally considered by the models' authors. The results show that multi-class classification performance can be significantly improved when the recently proposed global weighting components Inverse Gravity Moment and Inverse Class Space Density Frequency are combined with less studied but highly effective local functions, such as square root Term Frequency and Glasgow. In addition, filter-model feature selection functions based on information theory are evaluated empirically, together with the web page selection functions used to construct the website representation.
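The local/global decomposition described above can be illustrated with a short Python sketch. It is a minimal, non-authoritative illustration, not the study's imbWBI implementation: the formulas follow the commonly published definitions as understood here (square root TF, the local part of the Glasgow weight, IGM with an assumed coefficient λ = 7, and ICSDF), and all function names are hypothetical. The full set of 32 hybrid models is not reproduced.

```python
import math

# --- Local (per-document) components ---------------------------------------

def sqrt_tf(tf: int) -> float:
    """Square root term frequency."""
    return math.sqrt(tf)

def glasgow_local(tf: int, doc_len: int) -> float:
    """Local part of the Glasgow weight: log(tf + 1) / log(doc_len).

    doc_len is clamped to 2 so a one-token document does not divide by log(1) = 0.
    """
    return math.log(tf + 1) / math.log(max(doc_len, 2))

# --- Global (per-term, class-aware) components ------------------------------

def igm_global(class_dfs: list[int], lam: float = 7.0) -> float:
    """1 + lam * igm, where igm = f_1 / sum_r (f_r * r).

    class_dfs[c] is the number of documents of class c containing the term;
    f_r is the r-th largest of those counts. lam = 7.0 is an assumed default.
    """
    f = sorted(class_dfs, reverse=True)
    denom = sum(fr * r for r, fr in enumerate(f, start=1))
    if denom == 0:  # all counts are zero: no class-discriminating evidence
        return 1.0
    return 1.0 + lam * f[0] / denom

def icsdf_global(class_dfs: list[int], class_sizes: list[int]) -> float:
    """log(1 + C / sum_c (n_c(t) / N_c)) over C classes."""
    density = sum(n / size for n, size in zip(class_dfs, class_sizes) if size > 0)
    if density == 0:
        return 0.0
    return math.log(1 + len(class_sizes) / density)

# --- Hybrid model: a local component multiplied by a global component -------

def hybrid_weight(local_value: float, global_value: float) -> float:
    return local_value * global_value

if __name__ == "__main__":
    # Toy term: 4 occurrences in a 100-token page; present in 5, 0 and 1
    # pages of three classes with 10 pages each.
    dfs, sizes = [5, 0, 1], [10, 10, 10]
    print("sqrtTF-IGM:", hybrid_weight(sqrt_tf(4), igm_global(dfs)))
    print("Glasgow-ICSDF:", hybrid_weight(glasgow_local(4, 100), icsdf_global(dfs, sizes)))
```

Because each local function uses only per-document statistics and each global function only per-term class statistics, any local function can be paired with any global one, which is what makes a systematic recombination into hybrid models possible.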
The repository provides:
+ Content analysis and other statistics on the datasets used: WebKB's 7-Sector 1997 and WebKB 7-Sector 2018
+ Reports generated during the three stages of experiments:
  + Feature selection function evaluation
  + 32 hybrid term weighting models evaluation
  + Web page selection functions evaluation
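The feature selection stage evaluates filter-model functions grounded in information theory. The specific functions are not named in this description, so the sketch below uses information gain (the mutual information between term presence and class label) as an assumed, representative example. It is a minimal Python illustration with hypothetical function names, not the study's code.

```python
import math
from collections import Counter

def entropy(counts) -> float:
    """Shannon entropy (bits) of a distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(docs: list[set[str]], labels: list[str], term: str) -> float:
    """H(class) - H(class | term present/absent), over documents as term sets."""
    h_class = entropy(Counter(labels).values())
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    h_cond = 0.0
    for subset in (with_t, without_t):
        if subset:
            h_cond += len(subset) / n * entropy(Counter(subset).values())
    return h_class - h_cond

def select_top_k(docs: list[set[str]], labels: list[str], k: int) -> list[str]:
    """Filter model: score every term independently, keep the k best.

    Ties are broken alphabetically so the result is deterministic.
    """
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]

if __name__ == "__main__":
    docs = [{"loan", "bank"}, {"bank", "rate"}, {"car", "engine"}, {"engine", "rate"}]
    labels = ["fin", "fin", "auto", "auto"]
    print(select_top_k(docs, labels, 2))
```

A filter model scores terms independently of the downstream classifier, which keeps selection cheap; the trade-off is that interactions between terms are ignored.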
Note: the content snippets have been removed from the experiment reports to comply with the copyrights of the source websites. As a result, many folders in the reports are empty.
An experiment report directory normally contains the following:
+ Subdirectories for each cross-validation fold: 5-fold[0-5]
+ directory_readme.txt -- description of the contained files
+ dt_test_results.xlsx -- classification results, aggregated across the k folds
+ log.txt -- log output generated by the imbWBI Console Tool
+ note.txt -- notes on the experiment
In the fold subdirectories:
+ Corpus -- subdirectory containing reports on the selected features and the processed corpus
+ note.txt -- description of the experimental setup
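Since many report folders are empty, it can help to inventory a downloaded report directory before analysing it. The Python sketch below is a hypothetical helper, assuming only the file and folder names listed above. Because the exact fold naming scheme is not spelled out beyond "5-fold[0-5]", it treats every subdirectory as a candidate fold rather than guessing names.

```python
from pathlib import Path

# Top-level files listed in the repository description.
EXPECTED_FILES = ("directory_readme.txt", "dt_test_results.xlsx", "log.txt", "note.txt")

def inventory_report(report_dir) -> dict:
    """Summarize which expected files exist and what each subdirectory holds."""
    root = Path(report_dir)
    summary = {
        "missing_files": [f for f in EXPECTED_FILES if not (root / f).is_file()],
        "folds": {},
    }
    # Every subdirectory is treated as a candidate fold directory (assumption).
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        summary["folds"][sub.name] = {
            "has_corpus": (sub / "Corpus").is_dir(),
            "has_note": (sub / "note.txt").is_file(),
            # Folders may be empty because content snippets were removed.
            "is_empty": not any(sub.iterdir()),
        }
    return summary

if __name__ == "__main__":
    import sys
    print(inventory_report(sys.argv[1] if len(sys.argv) > 1 else "."))
```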
Created: 2018-09-25



