
Malawi - Integrated Household Living Conditions Survey 2010-2011 ; Subset for Machine Learning Comparative Assessment Project

WORLD BANK GROUP | Updated 2018-06-06 | Indexed 2026-03-28
Download link:
https://datacatalog.worldbank.org/search/dataset/0048927
Description:
This dataset contains the data files used as input for a World Bank research project: an empirical comparative assessment of machine learning algorithms applied to poverty prediction. The objective of the project was to compare the performance of a series of classification algorithms. The dataset contains variables at the household, individual, and community levels. All variables selected as potential predictors in the machine learning models are qualitative, except for household size. Information on household consumption is included, but only as dummy variables indicating whether or not the household consumed each specific product or service listed in the survey questionnaire. The household-level data file contains the variable "Poor / Non poor", which served as the predicted variable ("label") in the models.

One of the data files in the dataset contains household consumption amounts by main categories of products and services. This data file was not used in the prediction models. It is used only to analyze the models' misclassifications, in particular to identify how far the misclassified households are from the national poverty line.

These datasets are provided so that interested users can replicate the analysis done for the project using Python 3. A collection of Jupyter Notebooks containing the documented scripts is openly available on GitHub.
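The misclassification analysis described above can be illustrated with a minimal sketch. It is not taken from the project's notebooks: the `Household` fields, the `misclassification_gaps` helper, the poverty line value, and the four sample households are all hypothetical and chosen only to show the idea of measuring how far each misclassified household sits from the poverty line.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # All field names are hypothetical, not the dataset's actual column names.
    household_id: str
    consumption: float      # total consumption, in arbitrary illustrative units
    is_poor: bool           # true "Poor / Non poor" label
    predicted_poor: bool    # label predicted by some classifier


def misclassification_gaps(households, poverty_line):
    """Return {household_id: relative gap} for misclassified households only.

    The relative gap is (consumption - poverty_line) / poverty_line, so a
    negative value means the household is below the line.
    """
    gaps = {}
    for h in households:
        if h.is_poor != h.predicted_poor:
            gaps[h.household_id] = (h.consumption - poverty_line) / poverty_line
    return gaps


# Made-up poverty line and households, for illustration only.
POVERTY_LINE = 100.0
sample = [
    Household("A", 80.0, True, True),      # correctly classified as poor
    Household("B", 95.0, True, False),     # poor, predicted non-poor
    Household("C", 110.0, False, True),    # non-poor, predicted poor
    Household("D", 200.0, False, False),   # correctly classified as non-poor
]

gaps = misclassification_gaps(sample, POVERTY_LINE)
for household_id, gap in gaps.items():
    print(f"{household_id}: {gap:+.1%} relative to the poverty line")
```

Small gaps in absolute value would suggest that a model's errors are concentrated among households close to the poverty line, which is the kind of question the consumption data file is meant to help answer.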