Project for Statistics on Living Standards and Development 1993 - South Africa
Source: microdata.worldbank.org · Updated 2017-09-08 · Indexed 2025-01-21
Download link:
https://microdata.worldbank.org/index.php/catalog/902
Abstract
---------------------------
The Project for Statistics on Living Standards and Development was a countrywide World Bank Living Standards Measurement Survey. It covered approximately 9,000 households, drawn from a representative sample of South African households. The fieldwork was undertaken during the nine months leading up to the country's first democratic elections at the end of April 1994. The purpose of the survey was to collect statistical information about the conditions under which South Africans live, in order to provide policymakers with the data necessary for planning strategies. This data would aid the implementation of goals such as those outlined in the Government of National Unity's Reconstruction and Development Programme.
Geographic coverage
---------------------------
National coverage
Analysis unit
---------------------------
- Households
- Individuals
- Community
Universe
---------------------------
All household members.
Individuals in hospitals, old age homes, hotels and hostels of educational institutions were not included in the sample. Migrant labour hostels were included. In addition to those that turned up in the selected ESDs, a sample of three hostels was chosen from a national list provided by the Human Sciences Research Council, and within each of these hostels a representative sample was drawn on a similar basis to the households in ESDs (see Sampling procedure).
Kind of data
---------------------------
Sample survey data [ssd]
Sampling procedure
---------------------------
Sample size: 9,000 households.
The sample design adopted for the study was a two-stage self-weighting design in which the first-stage units were Census Enumerator Subdistricts (ESDs, or their equivalent) and the second-stage units were households.
The advantage of such a design is that it yields a representative sample without relying on an accurate census population distribution. In the case of South Africa, the sample automatically includes many poor people, so there is no need to go further and oversample the poor. Proportionate sampling in a self-weighting design also gives the simplest possible data files for further analysis, because no weights need to be added. In the end, however, this advantage could not be retained and weights had to be added.
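Why such a design is self-weighting can be shown from the selection probabilities. In notation introduced here for illustration: let $M_i$ be the census population of ESD $i$, $I$ the first-stage sampling interval (persons between selection points), and let the second-stage interval in ESD $i$ be $k_i = M_i / c$ for some constant $c$. Assuming $M_i \le I$, so that no ESD can be hit twice:

$$
\Pr(\text{household } j \text{ in ESD } i \text{ selected})
= \underbrace{\frac{M_i}{I}}_{\text{ESD selected}} \times \underbrace{\frac{1}{k_i}}_{\text{household selected}}
= \frac{M_i}{I}\cdot\frac{c}{M_i} = \frac{c}{I}.
$$

The ESD size cancels, so every household has the same chance of selection and, in principle, no weights are needed. Departures from these conditions, such as replaced ESDs or non-response, break the equality and call for weights.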
The sampling frame was drawn up on the basis of small, clearly demarcated area units, each with a population estimate. Because of the self-weighting procedure adopted, however, this population estimate was not critical in determining the final sample. For most of the country, census ESDs were used. Where ESDs comprised relatively large populations, as in some black townships such as Soweto, aerial photographs were used to divide the areas into blocks of approximately equal population size. In other instances, particularly in some of the former homelands, the area units were villages or village groups rather than ESDs.
In the sample design chosen, the area stage units (generally ESDs) were selected with probability proportional to size, based on the census population. Systematic sampling was used throughout, that is, sampling at a fixed interval in a list of ESDs, starting from a randomly selected point. Given that sampling was self-weighting, the impact of stratification was expected to be modest. The main objective was to ensure that the racial and geographic breakdown approximated the national population distribution. This was done by listing the area stage units (ESDs) by statistical region and then, within each statistical region, by urban or rural. Within these sub-statistical regions, the ESDs were then listed in order of percentage African. The sampling interval for the selection of the ESDs was obtained by dividing the 1991 census population of 38,120,853 by the 300 clusters to be selected. This yielded 105,800. Starting at a randomly selected point, every 105,800th person down the cumulative population list of ESDs was selected, and the ESD containing that person entered the sample. This ensured both geographic and racial diversity (ESDs were ordered by statistical sub-region and proportion of the population African). In three or four instances, the ESD chosen was judged inaccessible and replaced with a similar one.
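The first-stage selection described above is systematic sampling with probability proportional to size (PPS). A minimal sketch, assuming the ESD list is already sorted as described and each entry carries its 1991 census population; the function name `pps_systematic` and the toy sizes are illustrative, not taken from the survey. On the stated figures: 38,120,853 / 300 is about 127,070, whereas an interval of 105,800 corresponds to about 360 selection points (38,120,853 / 105,800 ≈ 360.3), which would also match 9,000 households at 25 per cluster. The figures in the text are reproduced as stated.

```python
import random
from itertools import accumulate


def pps_systematic(sizes, interval, start=None, rng=random):
    """Select area units by systematic sampling with probability proportional to size.

    sizes:    census population of each unit, in list order (assumed already sorted,
              e.g. by region, urban/rural and percentage African)
    interval: persons between selection points (total population / selections)
    start:    first selection point in [0, interval); drawn at random if omitted
    Returns the list indices of the selected units. A unit whose population
    exceeds the interval can be selected more than once.
    """
    if start is None:
        start = rng.random() * interval
    cumulative = list(accumulate(sizes))  # running population totals
    total = cumulative[-1]
    selected = []
    unit = 0
    point = start
    while point < total:
        # Advance to the unit whose cumulative population range contains this point.
        while cumulative[unit] <= point:
            unit += 1
        selected.append(unit)
        point += interval
    return selected


# Toy list of five ESDs (hypothetical sizes), interval 100,000, fixed start.
print(pps_systematic([50_000, 120_000, 30_000, 90_000, 60_000], 100_000, start=25_000))  # [0, 1, 3, 4]
```

Because selection points fall on persons, an ESD's chance of containing one is proportional to its population, which is what makes the first stage PPS.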
In the second sampling stage, the sampling unit was the household. In each selected ESD, households were listed (enumerated) in a field operation, and a sample of households was then selected from this listing by systematic sampling. Although the ultimate enumeration unit was the household, in most cases "stands" (residential plots) were used as enumeration units; when a stand was chosen, all households on that stand had to be interviewed.
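The second stage can be sketched as a systematic sample over the field listing. This assumes, for illustration only, that the interval is applied to the list of stands rather than to individual households; the data layout and the function name `select_stands` are hypothetical.

```python
import random


def select_stands(stands, interval, start=None, rng=random):
    """Systematic sample of listed stands, keeping every household on a chosen stand.

    stands:   list of (stand_id, [household_id, ...]) in listing order
    interval: stands between selection points (may be fractional)
    start:    first selection point in [0, interval); random if omitted
    """
    if start is None:
        start = rng.random() * interval
    households = []
    point = start
    while point < len(stands):
        _stand_id, on_stand = stands[int(point)]
        households.extend(on_stand)  # all households on a selected stand are interviewed
        point += interval
    return households


listing = [("s1", ["h1"]), ("s2", ["h2", "h3"]), ("s3", ["h4"]),
           ("s4", ["h5", "h6"]), ("s5", ["h7"])]
print(select_stands(listing, 2, start=1.0))  # ['h2', 'h3', 'h5', 'h6']
```

Taking whole stands means the number of households per cluster varies with how many households share a stand.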
Census population data, however, was available only for 1991. An assumption on population growth was therefore made to approximate the population size in 1993, the year of the survey. The sampling interval at the household level was determined as follows. Based on the decision to have a take of 125 individuals per cluster on average (i.e. assuming 5 members per household, an average cluster size of 25 households), the household sampling interval was set to the census population of the ESD divided by 118.1 rather than 125, to allow for population growth since the census. It was later found that population growth had been slightly overestimated, but this had little effect on the findings of the survey.
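The household interval can be checked numerically. A minimal sketch, assuming the 118.1 divisor is applied to each ESD's own 1991 census population and that households average five persons; the constant and function names are hypothetical. Since 125 / 118.1 ≈ 1.058, the divisor builds in roughly 5.8% growth between 1991 and 1993, and the expected take works out to 25 households in every ESD. Combined with the first-stage interval of 105,800, each household's overall selection probability is 118.1 / 105,800, about 1 in 896, whatever the size of its ESD.

```python
# Figures quoted in the text; the names are illustrative.
PERSONS_PER_HOUSEHOLD = 5
TARGET_PERSONS_PER_CLUSTER = 125
HOUSEHOLD_DIVISOR = 118.1       # household interval = ESD census population / 118.1
FIRST_STAGE_INTERVAL = 105_800  # persons between ESD selection points


def household_interval(census_pop):
    """Household sampling interval for an ESD with the given 1991 census population."""
    return census_pop / HOUSEHOLD_DIVISOR


def implied_growth_factor():
    """1991-to-1993 growth implied by dividing by 118.1 instead of 125 (about 1.058)."""
    return TARGET_PERSONS_PER_CLUSTER / HOUSEHOLD_DIVISOR


def expected_take(census_pop):
    """Expected number of households selected in an ESD (25 for any ESD size)."""
    households_1993 = census_pop * implied_growth_factor() / PERSONS_PER_HOUSEHOLD
    return households_1993 / household_interval(census_pop)


def overall_probability(census_pop):
    """P(ESD selected) * P(household selected | ESD); assumes census_pop <= 105,800."""
    p_esd = census_pop / FIRST_STAGE_INTERVAL
    p_household = 1 / household_interval(census_pop)
    return p_esd * p_household


for pop in (1_500, 4_000):
    print(pop, round(household_interval(pop), 1), round(expected_take(pop), 6),
          round(1 / overall_probability(pop), 1))
```

Applied to the full 1991 census population of 38,120,853, the same probability gives an expected sample of about 9,000 households (38,120,853 / 105,800 × 25 ≈ 9,008).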
Mode of data collection
---------------------------
Face-to-face [f2f]
Research instrument
---------------------------
The main instrument used in the survey was a comprehensive household questionnaire. This questionnaire covered a wide range of topics but was not intended to provide exhaustive coverage of any single subject; in other words, it was an integrated questionnaire aimed at capturing different aspects of living standards. The topics covered included demography, household services, household expenditure, educational status and expenditure, remittances and marital maintenance, land access and use, employment and income, health status and expenditure, and anthropometry (children under the age of six were weighed and their heights measured). The questionnaire was available to households in two languages, English and Afrikaans. In addition, interviewers carried a translation into the dominant African language(s) of the region.
In addition to the detailed household questionnaire, a community questionnaire was administered in each cluster of the sample to elicit information on the facilities available to the community. Questions related primarily to the provision of education, health and recreational facilities. There was also a detailed section recording the prices of a range of commodities from two retail sources in or near the cluster: a formal source such as a supermarket and a less formal one such as a "corner cafe" or a "spaza" shop. The purpose of this section was to measure price variation both by region and by retail source. These prices were obtained by the interviewer. For the questions on the provision of facilities, respondents were "prominent" members of the community such as school principals, priests and chiefs.
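One way the price section could be used to measure variation by retail source is to compare informal and formal prices for the same item in the same cluster and average the ratios by region. A minimal sketch; the tuple layout, the "formal"/"informal" labels and the use of a simple mean of price ratios are assumptions for illustration, not the survey's documented method, and the prices in the usage line are made up.

```python
from collections import defaultdict
from statistics import mean


def informal_premium_by_region(prices):
    """Mean ratio of informal to formal price, per region.

    prices: iterable of (region, cluster, item, source, price) tuples, where
    source is "formal" (e.g. supermarket) or "informal" (e.g. spaza shop).
    Only items priced at both sources in the same cluster are compared.
    """
    quotes = defaultdict(dict)
    for region, cluster, item, source, price in prices:
        quotes[(region, cluster, item)][source] = price
    ratios = defaultdict(list)
    for (region, _cluster, _item), by_source in quotes.items():
        if "formal" in by_source and "informal" in by_source:
            ratios[region].append(by_source["informal"] / by_source["formal"])
    return {region: mean(values) for region, values in ratios.items()}


sample = [("A", 1, "bread", "formal", 2.0), ("A", 1, "bread", "informal", 2.5),
          ("A", 1, "milk", "formal", 1.0), ("A", 1, "milk", "informal", 1.5),
          ("B", 2, "bread", "formal", 2.0)]
print(informal_premium_by_region(sample))  # {'A': 1.375}
```

Region B drops out because its bread was priced at only one source, which keeps every ratio a like-for-like comparison.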
Cleaning operations
---------------------------
All the questionnaires were checked when received. Where information was incomplete or appeared contradictory, the questionnaire was sent back to the relevant survey organization. As soon as the data was available, it was captured using the local development platform ADE; this was completed in February 1994. A series of exploratory programs was then written to highlight inconsistencies and outliers. For example, all person-level files were linked together to ensure that the same person code reported in different sections of the questionnaire corresponded to the same person. The error reports from these programs were compared with the questionnaires and the necessary corrections made. This was a lengthy process, as several files were checked more than once, and it was completed at the beginning of August 1994. In some cases, questionnaires contained missing values or notes that the respondent did not know the answer or refused to answer a question.
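The person-level linkage check can be sketched as follows. This assumes each section record carries the household identifier, the person code, and the person's sex and age as reported in that section; the record layout and the use of sex and age as the consistency test are assumptions for illustration, not the survey's actual programs.

```python
def check_person_links(roster, section_records):
    """Flag section records whose person code does not match the household roster.

    roster:          dict mapping (household_id, person_code) -> (sex, age)
    section_records: iterable of (section, household_id, person_code, sex, age)
    Returns a list of (section, household_id, person_code, problem) tuples,
    i.e. an error report to be compared against the paper questionnaires.
    """
    problems = []
    for section, hh, pcode, sex, age in section_records:
        expected = roster.get((hh, pcode))
        if expected is None:
            problems.append((section, hh, pcode, "person code not on roster"))
        elif expected != (sex, age):
            problems.append((section, hh, pcode, "sex/age differ from roster"))
    return problems


roster = {(1, 1): ("F", 34), (1, 2): ("M", 6)}
records = [("health", 1, 1, "F", 34), ("anthropometry", 1, 2, "M", 7),
           ("employment", 1, 3, "M", 40)]
for problem in check_person_links(roster, records):
    print(problem)
```

In practice an exact age match may be too strict across sections, so a tolerance could be added before the report is reviewed.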
These responses are coded in the data files with the following values (value : meaning):
-1 : The data was not available on the questionnaire or form
-2 : The field is not applicable
-3 : Respondent refused to answer
-4 : Respondent did not know answer to question
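Before computing statistics, these codes need to be separated from real responses. A minimal sketch; the dictionary and function names are hypothetical, and it should only be applied to fields where -1 to -4 cannot be legitimate values.

```python
# Codes from the table above; the short reason labels are illustrative.
MISSING_CODES = {
    -1: "not available on questionnaire or form",
    -2: "not applicable",
    -3: "refused",
    -4: "did not know",
}


def split_missing(value):
    """Return (value, None) for a real response, or (None, reason) for a missing-value code."""
    if value in MISSING_CODES:
        return None, MISSING_CODES[value]
    return value, None


responses = [120, -1, 85, -3, -2]  # hypothetical values of one numeric field
valid = [v for v, reason in map(split_missing, responses) if reason is None]
print(valid)  # [120, 85]
```

Keeping the reason alongside the cleared value lets refusals and "don't know" answers be tabulated separately from structurally inapplicable fields.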
Data appraisal
---------------------------
The data collected in clusters 217 and 218 should be viewed as highly unreliable and has therefore been removed from the data set. The data currently available on the website has been revised to exclude these clusters, and researchers who downloaded the data in the past should revise their data sets to drop them. For information on the data in those clusters, contact SALDRU http://www.saldru.uct.ac.za/.
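For an older copy of the data, dropping the two clusters can be sketched as below, assuming each record is a dict with a cluster identifier; the field name `cluster` is a placeholder to be checked against the codebook.

```python
UNRELIABLE_CLUSTERS = {217, 218}


def drop_unreliable_clusters(records, cluster_field="cluster"):
    """Return the records whose cluster is not 217 or 218.

    records:       iterable of dicts, one per household or person
    cluster_field: name of the cluster identifier (placeholder; check the codebook)
    """
    # int() also handles identifiers stored as strings such as "217".
    return [r for r in records if int(r[cluster_field]) not in UNRELIABLE_CLUSTERS]


households = [{"cluster": 216, "hh": 1}, {"cluster": 217, "hh": 2},
              {"cluster": "218", "hh": 3}, {"cluster": 219, "hh": 4}]
print([r["hh"] for r in drop_unreliable_clusters(households)])  # [1, 4]
```

The same filter should be applied to every household-, person- and community-level file so that the linked files stay consistent.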
Provider:
microdata.worldbank.org



