Data for: Machine Learning based Heterogeneous Web Advertisements Detection Using a Diverse Feature Set
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/5bzh52txpn
Description:
Identifying and filtering advertisements in web pages has gained importance for reasons of accessibility, security, privacy, and obtrusiveness. Current practice relies on filter lists: sets of URL-based regular expressions against which each URL found on a web page is matched. While effective, this approach scales poorly because the filter list requires continual maintenance. To counter this limitation, we devise a machine learning based advertisement detection system that uses a diverse feature set to distinguish advertisement blocks from non-advertisement blocks. The method can serve as a basis for accessibility-related features, such as smoother browsing and text summarization, for persons with visual impairments, cognitive impairments, or photosensitive epilepsy. A classifier trained on the proposed feature set achieves 93.4% accuracy in identifying advertisements.
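The filter-list baseline described above can be illustrated with a minimal Python sketch. It is not taken from the dataset or the associated paper: the patterns, the example URLs, and the helper names `is_ad_url` and `split_urls` are all hypothetical, chosen only to show how URL-based regular expressions are matched against each URL on a page.

```python
import re

# Hypothetical filter-list entries, invented for illustration only.
# Real filter lists contain many thousands of rules in their own syntax.
FILTER_PATTERNS = [
    r"^https?://ads\.",   # hosts whose name starts with "ads."
    r"/banners?/",        # paths containing /banner/ or /banners/
    r"[?&]ad_id=",        # query strings carrying an ad_id parameter
]
COMPILED_FILTERS = [re.compile(p) for p in FILTER_PATTERNS]


def is_ad_url(url: str) -> bool:
    """Return True if any filter-list pattern matches the URL."""
    return any(rx.search(url) for rx in COMPILED_FILTERS)


def split_urls(urls):
    """Partition URLs into (matched by the filter list, not matched)."""
    ads = [u for u in urls if is_ad_url(u)]
    other = [u for u in urls if not is_ad_url(u)]
    return ads, other
```

A URL served from a host or path that no rule anticipates is not matched until someone adds a rule for it, which is the maintenance burden the description refers to.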
Created:
2018-06-29



