
Known Undisclosed Paid Editors (English Wikipedia)

Source: DataCite Commons. Updated: 2025-05-01. Indexed: 2024-07-27.
Download link:
https://figshare.com/articles/Known_Undisclosed_Paid_Editors_English_Wikipedia_/6176927/1
Description:
This dataset contains a manually curated set of known undisclosed paid editor (UPE) accounts from Wikipedia. It is not a complete set of known editors: an editor's absence from this set does not guarantee that they are not a paid editor.
See also https://en.wikipedia.org/wiki/Wikipedia:Paid-contribution_disclosure

The dataset contains four columns:
 - user_name: The username of the UPE.
 - case_page_name: The page name (title) of a page describing the case through which paid editing was discovered.
 - type: One of three types of UPEs (described below).
 - notes: Any notes that a dataset curator chose to include with the example.

Type 1
User makes just over 10 minor edits. Is quiet for a few days while waiting for autoconfirm (user right) to kick in (takes 4 days). Then creates a promotional article in one big edit, after which the account goes silent. This is the main priority. These are present in the largest numbers and are the clearest pattern. They also cause the most damage to our shared brand.

Type 2
User is an obvious newbie. Makes lots of mistakes. Often turns out to be internal staff. Not a key priority. We already manage these cases fairly well as they are often so obvious.

Type 3
Undisclosed paid editor, but one who only moves on to a new account once their current account gets detected. A serious problem: these will be harder to detect as we will have smaller numbers of these cases. A long time will also need to pass before a pattern becomes apparent.
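As a rough illustration of how the four columns might be read and tallied by type, here is a minimal Python sketch. The file layout is an assumption: the description does not state the delimiter, whether a header row is present, or how the type values are encoded, so the sketch assumes tab-separated text with a header and types written as "1", "2", "3". The sample rows and the helper names `load_upe_rows` and `count_by_type` are hypothetical placeholders, not entries or tooling from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical placeholder rows. The real file's delimiter, header row,
# and type encoding are assumptions, not documented facts.
SAMPLE = (
    "user_name\tcase_page_name\ttype\tnotes\n"
    "ExampleUserA\tExample case page 1\t1\t\n"
    "ExampleUserB\tExample case page 1\t1\tplaceholder note\n"
    "ExampleUserC\tExample case page 2\t3\t\n"
)

EXPECTED_COLUMNS = {"user_name", "case_page_name", "type", "notes"}


def load_upe_rows(handle):
    """Read rows from a tab-separated file object with the four described columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def count_by_type(rows):
    """Count how many accounts fall under each UPE type."""
    return Counter(row["type"].strip() for row in rows)


rows = load_upe_rows(io.StringIO(SAMPLE))
counts = count_by_type(rows)
for upe_type, n in sorted(counts.items()):
    print(f"Type {upe_type}: {n} account(s)")
```

To run this against the downloaded file, replace `io.StringIO(SAMPLE)` with an opened file handle and adjust the delimiter if the actual layout differs.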

Provider:
figshare
Created:
2018-04-24