
Deleted Wikipedia articles (spam/vandalism/attack)

Source: DataCite Commons · Updated: 2020-09-03 · Indexed: 2024-07-25
Download link:
https://figshare.com/articles/dataset/Deleted_Wikipedia_articles_spam_vandalism_attack_/4245035/1
Description:
This dataset contains a random sample of deleted articles from English Wikipedia where the deletion reason was explicitly spam, vandalism, or attack. 25 articles were sampled for each deletion reason, for a total of 75 articles. The text of the articles was censored to remove identifying information.

The dataset contains the following columns:

· page_title -- The title of the deleted page
· rev_id -- The rev_id of the first revision
· creation_timestamp -- The time that the page was created
· archived -- 1 if the page was deleted, 0 if not (always 1)
· draft_quality -- The deletion reason (spam|vandalism|attack)
· censored_text -- The censored text of the deleted page

Censored blocks are noted with a comment block in the censored_text column of the form "Censored: [reason]([explanation])", e.g. "Censored: PII(phone number)".
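The censored-block format described above lends itself to a simple pattern match. The following is a minimal sketch, not part of the dataset's own tooling: the function names `find_censored_blocks` and `count_reasons` are hypothetical, and the sample string assumes HTML-style comment delimiters around each marker, which the description does not specify. The pattern matches only the core "Censored: reason(explanation)" text, so it should work regardless of the surrounding comment syntax.

```python
import re
from collections import Counter

# Matches the core marker "Censored: reason(explanation)".
# Assumption: neither the reason nor the explanation contains parentheses.
CENSORED_RE = re.compile(r"Censored:\s*([^()\s][^()]*?)\s*\(([^()]*)\)")


def find_censored_blocks(text):
    """Return a list of (reason, explanation) pairs found in the text."""
    return [
        (reason.strip(), explanation.strip())
        for reason, explanation in CENSORED_RE.findall(text)
    ]


def count_reasons(text):
    """Count how often each censoring reason appears in the text."""
    return Counter(reason for reason, _ in find_censored_blocks(text))


# Hypothetical sample; the comment delimiters are an assumption.
sample = (
    "Call me at <!-- Censored: PII(phone number) --> "
    "or visit <!-- Censored: PII(address) -->."
)

if __name__ == "__main__":
    print(find_censored_blocks(sample))
    print(count_reasons(sample))
```

Applied to each value of the censored_text column, a helper like this could summarize which kinds of identifying information were removed from each article.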
Provider:
figshare
Created:
2016-11-21