five

A study on real-time low-quality content detection on Twitter from the users’ perspective

收藏
NIAID Data Ecosystem2026-03-10 收录
下载链接:
https://figshare.com/articles/dataset/A_study_on_real-time_low-quality_content_detection_on_Twitter_from_the_users_perspective/5295067
下载链接
链接失效反馈
官方服务:
资源简介:
Detection techniques of malicious content such as spam and phishing on Online Social Networks (OSN) are common with little attention paid to other types of low-quality content which actually impacts users’ content browsing experience most. The aim of our work is to detect low-quality content from the users’ perspective in real time. To define low-quality content comprehensibly, Expectation Maximization (EM) algorithm is first used to coarsely classify low-quality tweets into four categories. Based on this preliminary study, a survey is carefully designed to gather users’ opinions on different categories of low-quality content. Both direct and indirect features including newly proposed features are identified to characterize all types of low-quality content. We then further combine word level analysis with the identified features and build a keyword blacklist dictionary to improve the detection performance. We manually label an extensive Twitter dataset of 100,000 tweets and perform low-quality content detection in real time based on the characterized significant features and word level analysis. The results of our research show that our method has a high accuracy of 0.9711 and a good F1 of 0.8379 based on a random forest classifier with real time performance in the detection of low-quality content in tweets. Our work therefore achieves a positive impact in improving user experience in browsing social media content.
创建时间:
2017-08-10
二维码
社区交流群
二维码
科研交流群
商业服务