YouTube Spam Collection Dataset
Payititi (帕依提提), indexed 2024-03-04
Download link:
https://www.payititi.com/opendatasets/show-26328.html
Description:
This corpus was collected using the YouTube Data API v3.

Data Set Information:
The table below lists the datasets, the YouTube video ID, the number of samples in each class, and the total number of samples per dataset.

Dataset      YouTube ID     # Spam   # Ham   Total
Psy          9bZkp7q19f0    175      175     350
KatyPerry    CevxZvSJLk8    175      175     350
LMFAO        KQ6zr6kCPj8    236      202     438
Eminem       uelHwf8o7_U    245      203     448
Shakira      pRpeEdMmmQ0    174      196     370

Note: the chronological order of the comments was kept.

Attribute Information:
The collection consists of one CSV file per dataset, where each line has the following attributes:
COMMENT_ID,AUTHOR,DATE,CONTENT,TAG

An example line:
z12oglnpoq3gjh4om04cfdlbgp2uepyytpw0k,Francisco Nora,2013-11-28T19:52:35,please like :D [Web link],1

Relevant Papers:
Alberto, T.C., Lochter, J.V., Almeida, T.A. TubeSpam: Comment Spam Filtering on YouTube. Proceedings of the 14th IEEE International Conference on Machine Learning and Applications (ICMLA'15), 1-6, Miami, FL, USA, December 2015.
Almeida, T.A., Silva, T.P., Santos, I., Gomez Hidalgo, J.M. Text Normalization and Semantic Indexing to Enhance Instant Messaging and SMS Spam Filtering. Knowledge-Based Systems, Elsevier, 108, 25-32, 2016.

Citation Request:
We would appreciate it if you would:
1. Cite the relevant paper and the web page, [Web link], if you find this collection useful.
2. Send us a message at either talmeida < AT > ufscar.br or tuliocasagrande < AT > acm.org if you make use of the corpus.
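
The attribute layout described above can be read with Python's standard `csv` module. The following is a minimal sketch, not the authors' tooling. It assumes that a TAG of 1 marks spam and any other value marks ham (the description does not spell out the encoding), and that a file may or may not begin with a header row. The function names `parse_comments` and `class_counts` are hypothetical, introduced only for illustration.

```python
import csv
import io
from collections import Counter

# Column order as given in the attribute description.
FIELDS = ["COMMENT_ID", "AUTHOR", "DATE", "CONTENT", "TAG"]


def parse_comments(csv_text):
    """Parse CSV text into a list of dicts keyed by FIELDS.

    A leading header row (first cell equal to "COMMENT_ID") is skipped
    if present; this is an assumption about the files, not a documented fact.
    """
    records = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row[0] == "COMMENT_ID":
            continue  # skip blank lines and an optional header
        if len(row) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}")
        records.append(dict(zip(FIELDS, row)))
    return records


def class_counts(records):
    """Count spam and ham records, ASSUMING TAG == '1' means spam."""
    return Counter(
        "spam" if rec["TAG"].strip() == "1" else "ham" for rec in records
    )


# The single example line quoted in the dataset description.
sample = (
    "z12oglnpoq3gjh4om04cfdlbgp2uepyytpw0k,Francisco Nora,"
    "2013-11-28T19:52:35,please like :D [Web link],1\n"
)

records = parse_comments(sample)
print(records[0]["AUTHOR"], class_counts(records))
```

To process a downloaded file, one could pass its contents, for example `parse_comments(open(path, encoding="utf-8", newline="").read())`, where `path` is whatever local filename the CSV was saved under. Using `csv.reader` rather than splitting on commas matters because comment text may itself contain commas inside quoted fields.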
Provided by:
Payititi (帕依提提)



