SM03: Evaluation of Feature Selection and Weighting methods for topical Website Multi-class Classification

Indexed by NIAID Data Ecosystem on 2026-03-10
Download link:
https://data.mendeley.com/datasets/zzmp7t8msn
Description:
The repository accompanies the website classification study "Evaluation of Feature Selection and Weighting methods for topical Website Multi-class Classification". The study's main focus is a comprehensive evaluation of state-of-the-art term weighting models in the context of business website classification. The models are decomposed into their local and global components and recombined into 32 hybrid models, covering all viable combinations, including ones not considered by the original authors of the models. The results show that multi-class classification performance can be significantly improved when the recently proposed global weighting components Inverse Gravity Moment (IGM) and Inverse Class Space Density Frequency (ICSDF) are combined with less commonly used but highly effective local functions, such as square root Term Frequency and Glasgow. In addition, filter-model feature selection functions based on information theory are empirically evaluated together with web page selection functions used to construct the website representation.

The repository provides:
+ Content analysis and other statistics on the datasets used: WebKB's 7-Sector 1997 and WebKB 7-Sector 2018
+ Reports generated during the three stages of experiments:
  + Feature selection function evaluation
  + 32 hybrid term weighting models evaluation
  + Web page selection functions evaluation

Note: the content snippets have been removed from the experiment reports to comply with the copyrights of the source websites. Hence, many folders in the reports are empty.
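The local/global decomposition and the information-theoretic feature filtering described above can be illustrated with a minimal Python sketch. It is not the study's implementation, and several assumptions apply: information gain stands in for the evaluated filter functions; only raw TF and square root TF are implemented as local functions (Glasgow is omitted); IGM is taken as 1 + λ·f₁ / Σ r·f_r over per-class document frequencies sorted in descending order, with λ = 7 as an assumed default; and ICSDF is taken as log(1 + C / Σ_k n_k(t)/N_k). All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def class_doc_freq(docs):
    """Map each term to a Counter of class -> documents in that class containing the term."""
    df = defaultdict(Counter)
    for label, tokens in docs:
        for term in set(tokens):
            df[term][label] += 1
    return df


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum(c / total * math.log2(c / total) for c in counts) if total else 0.0


def information_gain(docs, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_t = [label for label, tokens in docs if term in tokens]
    without_t = [label for label, tokens in docs if term not in tokens]
    n = len(docs)
    return (entropy([label for label, _ in docs])
            - len(with_t) / n * entropy(with_t)
            - len(without_t) / n * entropy(without_t))


def select_features(docs, k):
    """Filter-model selection (assumed criterion): keep the k terms with the highest IG."""
    df = class_doc_freq(docs)
    ranked = sorted(df, key=lambda t: (-information_gain(docs, t), t))
    return set(ranked[:k]), df


# Local components: functions of the raw in-document term count.
LOCAL = {"tf": float, "sqrt_tf": math.sqrt}


def igm(class_counts, lam=7.0):
    """Global IGM factor 1 + lam * f_1 / sum_r(f_r * r), class frequencies sorted descending."""
    f = sorted(class_counts.values(), reverse=True)
    denom = sum(fr * r for r, fr in enumerate(f, start=1))
    return 1.0 + lam * f[0] / denom if denom else 1.0


def icsdf(class_counts, class_sizes):
    """Global ICSDF factor log(1 + C / sum_k n_k(t) / N_k)."""
    density = sum(class_counts.get(c, 0) / size for c, size in class_sizes.items())
    return math.log(1 + len(class_sizes) / density) if density else 0.0


def weight_document(tokens, vocab, df, class_sizes, local="sqrt_tf", glob="igm"):
    """Hybrid weight = local(tf) * global(term) for each selected term in the document."""
    counts = Counter(t for t in tokens if t in vocab)
    weights = {}
    for term, tf in counts.items():
        g = igm(df[term]) if glob == "igm" else icsdf(df[term], class_sizes)
        weights[term] = LOCAL[local](tf) * g
    return weights


# Hypothetical toy corpus of (class label, tokens) pairs.
docs = [
    ("retail", ["shop", "price", "cart", "price"]),
    ("retail", ["shop", "cart", "delivery"]),
    ("energy", ["power", "grid", "price"]),
    ("energy", ["power", "solar", "grid"]),
]
class_sizes = Counter(label for label, _ in docs)
vocab, df = select_features(docs, k=4)
print(sorted(vocab))  # ['cart', 'grid', 'power', 'shop']
print(weight_document(docs[0][1], vocab, df, class_sizes))  # {'shop': 8.0, 'cart': 8.0}
print(weight_document(docs[0][1], vocab, df, class_sizes, glob="icsdf"))
```

Varying the `local` and `glob` arguments yields the 2 × 2 subset of hybrid combinations this toy covers; the study's 32 hybrid models span a larger set of local and global components. Note that "price" is dropped by the filter because it occurs equally in both classes and therefore carries zero information gain.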
An experiment report directory normally contains the following:
+ 5-fold[0-5] -- subdirectories, one for each fold of cross-validation
+ directory_readme.txt -- description of the contained files
+ dt_test_results.xlsx -- classification results, aggregated across the k folds
+ log.txt -- log output generated by the imbWBI Console Tool
+ note.txt -- notes on the experiment

Each fold subdirectory contains:
+ Corpus -- subdirectory containing reports of the selected features and the processed corpus
+ note.txt -- description of the experiment setup
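To check that a downloaded report directory matches the layout listed above (bearing in mind that many folders are expected to be empty), a small Python sketch can look for the listed files and fold contents. `check_report_dir` is a hypothetical helper, not part of the repository; it assumes every subdirectory of a report directory is a fold directory, since the exact fold naming is not spelled out, and the "fold0"/"fold1" names in the demo are made up.

```python
import tempfile
from pathlib import Path

# Files the description lists at the top level of an experiment report directory.
REPORT_FILES = ["directory_readme.txt", "dt_test_results.xlsx", "log.txt", "note.txt"]
# Items the description lists inside each fold subdirectory, with the check to apply.
FOLD_ITEMS = {"Corpus": Path.is_dir, "note.txt": Path.is_file}


def check_report_dir(report_dir):
    """Return (missing top-level files, {fold name: missing fold items}).

    Assumption: every subdirectory of the report directory is a fold directory.
    """
    root = Path(report_dir)
    missing = [name for name in REPORT_FILES if not (root / name).is_file()]
    fold_gaps = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        gaps = [item for item, exists in FOLD_ITEMS.items() if not exists(sub / item)]
        if gaps:
            fold_gaps[sub.name] = gaps
    return missing, fold_gaps


# Demo on a mock layout with hypothetical fold names.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in REPORT_FILES:
        (root / name).touch()
    (root / "fold0" / "Corpus").mkdir(parents=True)
    (root / "fold0" / "note.txt").touch()
    (root / "fold1").mkdir()
    result = check_report_dir(root)
    print(result)  # ([], {'fold1': ['Corpus', 'note.txt']})
```

The check only tests for presence, not content, so an empty Corpus folder (as expected after the snippet removal) still passes.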
Created: 2018-09-25