October Household Survey 1996 - South Africa
Indexed from www.datafirst.uct.ac.za on 2025-01-21
Download link:
https://www.datafirst.uct.ac.za/dataportal/index.php/catalog/411
Abstract
---------------------------
During October 1996, Statistics South Africa recorded the details of people living in more than nine million households in South Africa, as well as those in hostels, hotels and prisons. Census 1996 was the first nationwide census since the country was split up under apartheid after 1970, and it sought to apply the same methodology to everyone: visiting the household and obtaining details about all its members from a representative, who was either interviewed or filled in the questionnaire in their language of choice.
Geographic coverage
---------------------------
The survey had national coverage.
Analysis unit
---------------------------
Households and individuals
Universe
---------------------------
The survey covered households and their members in the nine provinces of South Africa.
Kind of data
---------------------------
Sample survey data
Sampling procedure
---------------------------
A sample of 1600 Enumerator Areas (EAs) was produced in conjunction with the sample for the 1996 Population Census post-enumeration survey. A two-stage sampling procedure was applied in the following manner.
In the first stage, EAs were stratified by province and by type of EA (formal or informal urban areas, commercial farms, traditional authority areas or other non-urban areas). Originally, eight hundred EAs were allocated proportionately to each stratum by province. Later, some adjustments were made to ensure adequate representation of smaller provinces such as the Northern Cape. Independent systematic samples of EAs were drawn for each stratum within each province. The sampling frame was constructed from the preliminary database of EAs established during the demarcation and listing phase of the 1996 population census. In the second stage, 10 households were drawn from each EA on the western and eastern side of the EA drawn for the post-enumeration survey. This meant 10 households per EA in 1600 different EAs, that is, 16 000 households in total.
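The two-stage design described above can be illustrated with a minimal Python sketch. It is not Statistics South Africa's procedure: the frame, the stratum allocation, the EA and household identifiers, and the function names are all made up for illustration. The use of systematic selection for households within an EA is also an assumption, since the description only says that 10 households were drawn per EA. The sketch does not model the placement of EAs relative to the post-enumeration survey EAs.

```python
import random


def systematic_sample(units, n, rng):
    """Draw n units from an ordered list: random start, then a fixed interval."""
    if not 0 < n <= len(units):
        raise ValueError("n must be between 1 and len(units)")
    interval = len(units) / n            # sampling interval (may be fractional)
    start = rng.random() * interval      # random start in [0, interval)
    last = len(units) - 1
    # min() guards against floating-point rounding at the very end of the list.
    return [units[min(int(start + i * interval), last)] for i in range(n)]


def two_stage_sample(frame, eas_per_stratum, households_per_ea=10, seed=0):
    """Stage 1: systematic sample of EAs per stratum. Stage 2: households per EA."""
    rng = random.Random(seed)
    selected = {}
    for stratum, eas in frame.items():
        ordered_eas = sorted(eas.items())   # fixed ordering of the stratum's EAs
        n_eas = eas_per_stratum[stratum]
        for ea_id, households in systematic_sample(ordered_eas, n_eas, rng):
            # Assumption: households are also selected systematically.
            selected[ea_id] = systematic_sample(households, households_per_ea, rng)
    return selected


def toy_stratum(prefix, n_eas, n_households=40):
    """Build a made-up stratum: EA id -> list of household ids."""
    return {
        f"{prefix}-{i:03d}": [f"{prefix}-{i:03d}-hh{j:02d}" for j in range(n_households)]
        for i in range(n_eas)
    }


# Hypothetical frame keyed by (province, EA type); real strata are far larger.
frame = {
    ("Northern Cape", "formal urban"): toy_stratum("NC", 12),
    ("Gauteng", "informal urban"): toy_stratum("GP", 20),
}
allocation = {("Northern Cape", "formal urban"): 3, ("Gauteng", "informal urban"): 5}

sample = two_stage_sample(frame, allocation)
print(len(sample), sum(len(h) for h in sample.values()))  # 8 EAs, 80 households
```

With the survey's own figures, the same arithmetic gives 1600 EAs x 10 households = 16 000 households.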
Mode of data collection
---------------------------
Face-to-face [f2f]
Research instrument
---------------------------
The data files in the October Household Survey 1996 (OHS 1996) correspond to the following sections in the questionnaire:
House: Data from FLAP, Section 1 and Section 7
Person: Data from Section 2
Worker: Data from Section 3
Migrant: Data from Section 4
Death: Data from Section 5
Births: Data from Section 6 - This data had a considerable number of problems and will not be published.
Income: Data from Section 7 (included in House)
Domestic: Data from Section 8
Data appraisal
---------------------------
Questionnaire:
The October Household Survey 1996 questionnaire had incorrect FLAP data: no Population Group question was indicated on the FLAP. DataFirst notified Statistics SA, who supplied a corrected questionnaire, which is the one now available with the dataset.
Household IDs:
In the previous version of the 1996 October Household Survey dataset archived by DataFirst, the household IDs (HHID) were not unique. This was corrected in version 1, the first version disseminated by DataFirst. Version 1.1 keeps this correction, but data users should check any versions not obtained from DataFirst and replace them with the latest version available from DataFirst.
Linking Files:
The metadata for the OHS 1996 explains how to merge the files in the OHS 1996 dataset: "The data from different files can be linked on the basis of the record identifiers. The record identifiers are composed of the first few fields in each file. Each record contains the three fields Magisterial District, Enumeration area, and Visiting point number. These eleven digits together constitute a unique household identifier. All records with a given household identifier, no matter which file they are in, belong to the same household. For individuals, a further two digits constituting the Person number, when added to the household identifier, creates a unique individual identifier. Again, these can be used to link records from the PERSON and WORK files. The syntax needed to merge information from different files will differ according to the statistical package used" (October Household Survey 1996: Metadata: General Notes: 2).
According to the above, to generate household IDs it is necessary to use a combination of magisterial district number (mdnumber), enumeration area number (eanumber) and visiting point number (vpnumber). To generate person IDs it is necessary to use the above with the person number (personnu).
These variables are named as such in the OHS 1996 House, OHS 1996 Births, OHS 1996 Migrant, OHS 1996 Deaths, OHS 1996 Household Income Other, OHS 1996 Other, OHS 1996 Domestic and OHS 1996 Flap data files. However, in the OHS 1996 Worker and OHS 1996 Person data files, the variable for magisterial district number is "distr", the variable for enumeration area is "ea" and the variable for visiting point number is "visp". The variable for person number in these files is "respno".
The metadata provided to DataFirst with this dataset does not discuss these changes.
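The linking rules above can be sketched in plain Python. This is an illustrative outline, not DataFirst's or Statistics SA's code: the helper names, the example values, and the non-identifier fields ("rooms", "age") are hypothetical. Identifiers are kept as tuples rather than concatenated digit strings, because the notes state that the three household fields total eleven digits but do not give the width of each field. The sketch also assumes the identifier values are numeric or numeric strings.

```python
# Identifier variable names as given in the data appraisal notes.
HOUSEHOLD_KEYS = ("mdnumber", "eanumber", "vpnumber")
PERSON_KEY = "personnu"

# The Worker and Person files use different names for the same fields.
RENAME = {"distr": "mdnumber", "ea": "eanumber", "visp": "vpnumber", "respno": "personnu"}


def harmonise(record):
    """Return a copy of the record with identifier fields renamed to one convention."""
    return {RENAME.get(name, name): value for name, value in record.items()}


def household_id(record):
    """Household identifier: (magisterial district, enumeration area, visiting point)."""
    return tuple(int(record[key]) for key in HOUSEHOLD_KEYS)


def person_id(record):
    """Person identifier: the household identifier plus the person number."""
    return household_id(record) + (int(record[PERSON_KEY]),)


def link_persons_to_households(house_rows, person_rows):
    """Attach household fields to each person; also return persons with no household."""
    houses = {household_id(harmonise(row)): row for row in house_rows}
    linked, unmatched = [], []
    for row in map(harmonise, person_rows):
        house = houses.get(household_id(row))
        if house is None:
            unmatched.append(row)
        else:
            linked.append({**house, **row})
    return linked, unmatched


# Hypothetical records; the values and the "rooms"/"age" fields are made up.
house_rows = [{"mdnumber": 101, "eanumber": 2034, "vpnumber": 7, "rooms": 4}]
person_rows = [
    {"distr": 101, "ea": 2034, "visp": 7, "respno": 1, "age": 34},
    {"distr": 101, "ea": 2034, "visp": 7, "respno": 2, "age": 8},
    {"distr": 999, "ea": 1, "visp": 1, "respno": 1, "age": 50},
]

linked, unmatched = link_persons_to_households(house_rows, person_rows)
print(len(linked), len(unmatched))
```

Returning the unmatched records separately makes it easy to spot files whose identifiers do not line up, which matters here given the earlier problem with non-unique household IDs.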
October Household Survey 1996 Births file:
Births data were collected in Section 6 of the OHS 1996 questionnaire, which was completed for all women younger than 55 years who had ever given birth. The metadata for this survey from Statistics SA states that "This data had a considerable number of problems and will not be published." The dataset provided by DataFirst therefore does not include the original "births" file. Those in possession of this file from unofficial versions of the dataset should note the following problems with the data in the OHS 1996 births file:
Variable name: eegender
Question 6.2: Is/was (the child) a boy or a girl?
Valid range: 1 (boy) - 2 (girl)
Data quality issue: There is a third response value (0), which has no description.
Variable name: livinghh
Question 6.4: If alive: Is (the child) currently living with this household?
Valid range: 1 (yes) - 2 (no)
Data quality issue: This variable has an additional response value (0), which has no description.
Variable name: agealive
Question 6.5: If alive: How old is he/she?
This question was asked of all women younger than 55 years who had ever given birth, to record the ages of their living children.
Data quality issue: Responses for the child's age range from 0 to 77 (assuming 99 denotes a missing response), which is outside the plausible range for children of women younger than 55.
Variable name: agenaliv
Question 6.6: If dead: How old was (the child) when he/she died?
Data quality issue: The format of the age at death variable is not clear.
Variable name: datebirt
Question 6.7: [All children]: In what year and month was (the child) born?
Data quality issue: There are problems with the format of the date of birth variable.
Variable name: wherebor
Question 6.8: [All children]: Where was (the child) born?
Data quality issue: The questionnaire offers only three options for the place of birth (in a hospital, in a clinic and elsewhere), but the data has 10 response values (0-9), with no explanation for this in the metadata.
Variable name: regstere
Question 6.9: [All children]: Was the birth registered?
Valid range: 1 (yes) - 2 (no)
Data quality issue: There are 4 response values (0-3) for this variable.
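For anyone holding an unofficial copy of the births file, the checks listed above could be reproduced along the following lines. This is a rough sketch with made-up rows and hypothetical function names. The valid codes for eegender, livinghh and regstere come from the ranges stated above, while the codes 1-3 for wherebor are an assumption, since the notes name three questionnaire options but not their numeric codes. Treating 99 as the missing code for agealive follows the assumption made in the notes.

```python
from collections import Counter

# Documented valid codes; the wherebor codes are assumed, not documented.
VALID_CODES = {
    "eegender": {1, 2},      # 1 = boy, 2 = girl
    "livinghh": {1, 2},      # 1 = yes, 2 = no
    "regstere": {1, 2},      # 1 = yes, 2 = no
    "wherebor": {1, 2, 3},   # assumption: hospital, clinic, elsewhere
}


def undocumented_values(rows, valid_codes=VALID_CODES):
    """Count, per variable, the response values that fall outside the valid codes."""
    report = {}
    for name, valid in valid_codes.items():
        counts = Counter(
            row[name] for row in rows if name in row and row[name] not in valid
        )
        if counts:
            report[name] = dict(counts)
    return report


def implausible_child_ages(rows, mother_age_limit=55, missing_code=99):
    """List agealive values that are at or above the mothers' upper age limit."""
    return [
        row["agealive"]
        for row in rows
        if "agealive" in row
        and row["agealive"] != missing_code
        and row["agealive"] >= mother_age_limit
    ]


# Hypothetical rows for illustration only.
rows = [
    {"eegender": 1, "livinghh": 2, "regstere": 1, "wherebor": 1, "agealive": 4},
    {"eegender": 0, "livinghh": 0, "regstere": 3, "wherebor": 7, "agealive": 77},
    {"eegender": 2, "livinghh": 1, "regstere": 2, "wherebor": 3, "agealive": 99},
]

print(undocumented_values(rows))
print(implausible_child_ages(rows))
```

The age check uses a deliberately conservative bound: a living child cannot be as old as the upper age limit of the women interviewed, so any value at or above 55 (other than the assumed missing code) is flagged.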
Provided by:
www.datafirst.uct.ac.za



