
Untitled Item

DataCite Commons, updated 2020-09-01, indexed 2024-07-25
Download link:
https://figshare.com/articles/dataset/Untitled_Item/5513449/1
Description:
This project contains data on most English-language Wikipedia articles within the category "Category:Politicians by nationality" and its subcategories, along with the code used to generate that data. Both are released under the CC-BY-SA 4.0 license.

Data

The data was extracted via the Wikimedia API using the associated code. It is formatted as a CSV file and saved as page_data.csv in the "data" directory. The columns are:

1. "country", containing the sanitised country name, extracted from the category name;
2. "page", containing the unsanitised page title.

Country names are inconsistent. Where possible, they have been modified to match the country names found in http://www.prb.org/DataFinder/Topic/Rankings.aspx?ind=14, but the PRB dataset contains nations not found in Wikipedia, and vice versa.

The recursion only went two levels deep into the category tree: someone listed as an Antiguan politician, say, is included, but someone listed exclusively as an Antiguan politician who was assassinated is not.

Code

The code is written in R and is heavily commented. It can be found in the "code" directory and is split into three files:

1. utils.R, which contains utilities used by the code in the other files;
2. retrieve.R, which contains functions for retrieving the category and page data from Wikipedia;
3. main.R, which executes the data retrieval code and performs sanitisation before writing the data to file.
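The effect of the two-level limit on the category traversal can be illustrated with a small sketch. This is not the project's R code: it is a hypothetical Python illustration over an in-memory toy tree, with no API calls. The names `SUBCATEGORIES`, `PAGES`, `collect_pages` and `max_depth`, the placeholder page titles, and the convention that the root category counts as level 1 are all assumptions made for the example.

```python
# Hypothetical toy category tree; the real data comes from the Wikimedia API.
SUBCATEGORIES = {
    "Politicians by nationality": ["Antiguan politicians"],
    "Antiguan politicians": ["Assassinated Antiguan politicians"],
    "Assassinated Antiguan politicians": [],
}

# Placeholder page titles, not real article names.
PAGES = {
    "Politicians by nationality": [],
    "Antiguan politicians": ["Politician A"],
    "Assassinated Antiguan politicians": ["Politician B"],
}


def collect_pages(category, max_depth, depth=1):
    """Return (category, page) pairs, visiting categories down to max_depth levels.

    Assumption: the starting category is level 1.
    """
    rows = [(category, page) for page in PAGES.get(category, [])]
    # Only descend into subcategories while we are above the depth limit.
    if depth < max_depth:
        for sub in SUBCATEGORIES.get(category, []):
            rows.extend(collect_pages(sub, max_depth, depth + 1))
    return rows


rows = collect_pages("Politicians by nationality", max_depth=2)
print(rows)
```

Under these assumptions, a limit of two levels returns only the page listed directly under "Antiguan politicians". The page that appears only in the deeper "Assassinated Antiguan politicians" subcategory is picked up only if `max_depth` is raised to 3, which mirrors the exclusion described above.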
Provider:
figshare
Created:
2017-10-19