
Domestic Electrical Load Survey - Key Variables 1994-2014 - South Africa

Source: www.datafirsttest.uct.ac.za, updated 2020-04-29, indexed 2025-03-22
Download link:
https://www.datafirsttest.uct.ac.za/dataportal/index.php/catalog/758
Description:
Abstract
---------------------------
This dataset is a harmonisation of the Domestic Electrical Load Survey (DELS) 1994-2014 dataset. The DELS 1994-2014 questionnaires were changed in 2000. Consequently, the survey questions differ between 1994-1999 and 2000-2014. This makes data processing complex, as survey responses first need to be associated with their year of collection and the corresponding questionnaire before they can be correctly interpreted.

This Key Variables dataset is a user-friendly version of the original dataset. It contains household responses to the most important survey questions, as well as geographic and linking information that allows households to be matched to their respective electricity metering data. This dataset and similar custom datasets can be produced from the DELS 1994-2014 dataset with the Python package delprocess. The data processing section includes a description of how this dataset was created. The development of the tools to create this dataset was funded by the South African National Energy Development Initiative (SANEDI).

Geographic coverage
---------------------------
The study had national coverage.

Analysis unit
---------------------------
Households

Universe
---------------------------
The dataset covers South African households in the DELS 1994-2014 dataset. These are electrified households that received electricity either directly from Eskom or from their local municipality.

Kind of data
---------------------------
Administrative records

Sampling procedure
---------------------------
The dataset includes all households for which survey responses have been captured in the DELS 1994-2014 dataset.

Mode of data collection
---------------------------
Face-to-face [f2f]

Cleaning operations
---------------------------
This dataset has been constructed from the DELS 1994-2014 dataset using the data processing functions in the delprocess Python package (www.github.com/wiebket/delprocess: release v1.0).

The delprocess package takes the complexities of the original DELS 1994-2014 dataset into account and uses 'spec files' to specify the processing steps that must be performed. To retrieve data for all survey years, two separate spec files are required: one to process survey responses from 1994-1999 and one for 2000-2014. The spec files used to produce this dataset are included in the program files and can be used as templates for new custom datasets. Full instructions on how to use them to process the data are in the README file contained in the delprocess package.

SPEC FILES specify the following processing steps:
1. List of search terms used to search the survey questions, and the variables returned
2. Transformations (addition, subtraction, multiplication) of variables retrieved from the search output
3. Bin intervals for variables (requires numeric data)
4. Labels for bins (requires binned data)
5. Details of bin segments
6. Replacement (encoding) of coded variable values
7. Higher level geography detail

In particular, the DELSKV 1994-2014 dataset has been produced by specifying the following processing steps:

TRANSFORMATIONS
* monthly_income from 1994-1999 is the variable returned by the 'income' search term
* monthly_income from 2000-2014 is calculated as the sum of the variables returned by the 'earn per month', 'money from small business' and 'external' search terms
* Appliance numbers from 1994-1999 are the count of appliances (no data was collected on broken appliances)
* Appliance numbers from 2000-2014 are the count of appliances [minus] the count of broken appliances (except for TV, for which no information on broken appliances was collected)
* A new total_adults variable was created by summing the number of all occupants (male and female) over 16 years old
* A new total_children variable was created by summing the number of all occupants (male and female) under 16 years old
* A new total_pensioners variable was created by summing the number of pensioners (male and female) over 16 years old
* A new total_unemployed variable was created by summing the number of unemployed occupants (male and female) over 16 years old
* A new total_part_time variable was created by summing the number of part time employed occupants (male and female) over 16 years old
* roof_material and wall_material values for 1994-1999 were incremented by 1
* water_access was transformed for 1994-1999 to be 4 [minus] the 'watersource' value

REPLACEMENTS
* Appliance usage values have been replaced with: 0=never 1=monthly 2=weekly 3=daily
* water_access values have been replaced with: 1=nearby river/dam/borehole 2=block/street taps 3=tap in yard 4=tap inside house
* roof_material and wall_material values have been replaced with: 1=IBR/Corr.Iron/Zinc 2=Thatch/Grass 3=Wood/Masonite board 4=Brick 5=Block 6=Plaster 7=Concrete 8=Tiles 9=Plastic 10=Asbestos 11=Daub/Mud/Clay

OTHER NOTES
Appliance usage information was only collected after 2000. No binning was done to segment survey responses for this dataset.

Data appraisal
---------------------------
* The 2000-2014 survey questions contain no variable for 'number of females: 50+', which breaks the pattern of the other occupant age categories.
* Spacing in the original questions is irregular and can cause challenges when specifying transformations (e.g. 'number of males: 16-24' and 'number of males: 25 - 34', 'part time' and 'parttime').
* Spelling mistakes in the original questions can cause challenges when specifying transformations (e.g. 'head emploed part time').

MISSING VALUES
Missing values have not been replaced and are represented as blanks, except for imputed columns (total_adults, total_children, ...) and appliances after 2000, where missing values have been replaced with a 0.
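Illustrative sketch
---------------------------
The transformation and replacement rules described above can be sketched in plain Python. This is a minimal illustration only, not the delprocess API: the function names, argument names and the `WATER_ACCESS_LABELS` dictionary are hypothetical, and the actual processing is driven by the spec files shipped with the package. The sketch assumes missing responses are represented as `None`.

```python
from typing import Iterable, Optional

# Replacement labels for water_access, as listed under REPLACEMENTS.
WATER_ACCESS_LABELS = {
    1: "nearby river/dam/borehole",
    2: "block/street taps",
    3: "tap in yard",
    4: "tap inside house",
}


def is_early_questionnaire(year: int) -> bool:
    """True for the 1994-1999 questionnaire, False for 2000-2014."""
    if not 1994 <= year <= 2014:
        raise ValueError(f"year {year} is outside the 1994-2014 survey range")
    return year <= 1999


def water_access_code(watersource: int) -> int:
    """1994-1999 only: water_access is 4 minus the raw 'watersource' value."""
    return 4 - watersource


def material_code(raw_value: int) -> int:
    """1994-1999 only: roof/wall material codes are incremented by 1."""
    return raw_value + 1


def appliance_number(year: int, count: Optional[int],
                     broken: Optional[int] = None) -> Optional[int]:
    """Appliance number for one household and one appliance type.

    1994-1999: the raw count; a missing value stays blank (None).
    2000-2014: count minus broken count; missing values count as 0.
    For TV, pass broken=None, since no broken count was collected.
    """
    if is_early_questionnaire(year):
        return count
    return (count or 0) - (broken or 0)


def total_occupants(counts: Iterable[Optional[int]]) -> int:
    """Sum occupant counts (e.g. all male and female age groups over 16).

    Missing values are treated as 0, as for the imputed total_* columns.
    """
    return sum(c or 0 for c in counts)


if __name__ == "__main__":
    print(WATER_ACCESS_LABELS[water_access_code(1)])  # tap in yard
    print(appliance_number(2005, count=3, broken=1))  # 2
    print(appliance_number(1996, count=None))         # None
    print(total_occupants([2, None, 1]))              # 3
```

Keeping the year check in one helper mirrors the need to associate each response with its questionnaire before interpreting it; the two-spec-file design in delprocess serves the same purpose.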

Provider:
www.datafirsttest.uct.ac.za