Russia Longitudinal Monitoring Survey - Higher School of Economics 2003 - Russian Federation
Source: catalog.ihsn.org | Updated: 2019-03-29 | Indexed: 2025-03-21
Download link:
http://catalog.ihsn.org/catalog/6199
Resource summary:
Abstract
---------------------------
The Russia Longitudinal Monitoring Survey (RLMS) is a household-based survey designed to measure the effects of Russian reforms on the economic well-being of households and individuals. In particular, determining the impact of reforms on household consumption and individual health is essential, as most of the subsidies provided to protect food production and health care have been or will be reduced, eliminated, or at least dramatically changed. These effects are measured by a variety of means: detailed monitoring of individuals' health status and dietary intake, precise measurement of household-level expenditures and service utilization, and collection of relevant community-level data, including region-specific prices and community infrastructure data. Data have been collected since 1992.
Geographic coverage
---------------------------
National
Analysis unit
---------------------------
Households and individuals.
Kind of data
---------------------------
Sample survey data [ssd]
Sampling procedure
---------------------------
In Phase II (Rounds V - XX) of the RLMS, a multi-stage probability sample was employed. For details, see the March 1997 review of the Phase II sample. First, a list of 2,029 consolidated regions was created to serve as primary sampling units (PSUs). These were allocated to 38 strata based largely on geographical factors and level of urbanization, but also on ethnicity where there was salient variability. As in many national surveys involving face-to-face interviews, some remote areas were excluded to contain costs; Chechnya was also excluded because of armed conflict. From the remaining 1,850 regions (containing 95.6 percent of the population), three very large population units were selected with certainty: Moscow city, Moscow Oblast, and St. Petersburg city, each of which constituted a self-representing (SR) stratum. The remaining non-self-representing (NSR) regions were allocated to 35 equal-sized strata. One region was then selected from each NSR stratum using probability proportional to size (PPS) sampling; that is, the probability that a region in a given NSR stratum was selected was directly proportional to its measure of population size.
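A single PPS draw within one NSR stratum can be sketched as follows. This is a minimal illustration, not the RLMS team's actual procedure: the region names and population sizes are hypothetical, and the cumulative-size method shown is simply one standard way to implement a PPS draw of one unit.

```python
import random


def pps_inclusion_probabilities(units):
    """Selection probability of each unit when exactly one unit is drawn by PPS.

    `units` is a list of (name, size) pairs with positive size measures.
    """
    total = sum(size for _, size in units)
    return {name: size / total for name, size in units}


def select_one_pps(units, rng):
    """Draw one unit with probability proportional to its size measure."""
    total = sum(size for _, size in units)
    # A uniform point in [0, total) falls inside the cumulative-size interval
    # of exactly one unit; larger units own proportionally longer intervals.
    point = rng.random() * total
    cumulative = 0
    for name, size in units:
        cumulative += size
        if point < cumulative:
            return name
    return units[-1][0]  # guard against floating-point rounding at the top end


# Hypothetical NSR stratum: (region, population) pairs.
stratum = [("region_a", 500_000), ("region_b", 300_000), ("region_c", 200_000)]
print(pps_inclusion_probabilities(stratum))
# {'region_a': 0.5, 'region_b': 0.3, 'region_c': 0.2}
print(select_one_pps(stratum, random.Random(1997)))
```

Because only one region is drawn per stratum, each region's inclusion probability is just its share of the stratum's total size measure.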
The NSR strata were designed to have approximately equal sizes to improve the efficiency of estimates. The target population (omitting the deliberate exclusions described above) totaled over 140 million inhabitants. Ideally, one would use the population of eligible households, not the population of individuals. As is often the case, we were obliged to use figures on the population of individuals as a surrogate because of the unavailability of household figures in various regions.
Although the target sample size was set at 4,000, the number of households drawn into the sample was inflated to 4,718 to allow for a nonresponse rate of approximately 15 percent. The number of households drawn from each NSR stratum was approximately equal (averaging 108), since the strata were of approximately equal size and PPS was used to draw the PSU in each one. However, because response rates were expected to be lower in urban areas than in rural areas, the extent of oversampling varied. This variation accounts for the differences in the number of households drawn across the NSR PSUs. It also accounts for the fact that 940 households were drawn in the three SR strata, more than the 14.6 percent (i.e., 689) that strict proportional allocation would have given them.
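The allocation figures in this paragraph can be cross-checked with a few lines of arithmetic. All inputs come from the text; the only assumption is treating the "approximately 15 percent" nonresponse rate as exactly 0.15.

```python
import math

TARGET_COMPLETED = 4000      # target number of completed household interviews
EXPECTED_NONRESPONSE = 0.15  # approximate nonresponse rate allowed for
DRAWN = 4718                 # households actually drawn
SR_DRAWN = 940               # households drawn in the three SR strata
SR_POP_SHARE = 0.146         # SR share of the target population
NSR_STRATA = 35

# Smallest draw that still yields the target if exactly 15% do not respond.
min_draw = math.ceil(TARGET_COMPLETED / (1 - EXPECTED_NONRESPONSE))

# Completed interviews expected from the actual draw at that nonresponse rate.
expected_completed = DRAWN * (1 - EXPECTED_NONRESPONSE)

# Households the SR strata would have received under strict proportionality.
sr_proportional = round(SR_POP_SHARE * DRAWN)

# Average number of households drawn per NSR stratum.
nsr_average = (DRAWN - SR_DRAWN) / NSR_STRATA

print(min_draw, round(expected_completed), sr_proportional, round(nsr_average))
# 4706 4010 689 108
```

The reported 4,718 sits slightly above the 4,706 minimum, and the remaining 3,778 households spread over 35 NSR strata reproduce the stated average of 108.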
Since there was no consolidated list of households or dwellings in any of the 38 selected PSUs, an intermediate stage of selection was introduced, as is standard practice. Professional samplers will recognize that this is actually the first stage of selection in the three SR strata, since those units were selected with certainty. That is, technically, in Moscow city, St. Petersburg city, and Moscow Oblast, the census enumeration districts were the PSUs. However, because it would have been cumbersome to maintain this distinction throughout the description, researchers followed the common practice of using the terms "PSU" and "SSU" loosely. In the calculation of design effects, where the distinction is critical, the correct stage structure was used. The selection of second-stage units (SSUs) differed depending on whether the population was urban (living in cities and "villages of the city type," known as "PGTs") or rural (living in villages). Within each selected PSU, the population was stratified into urban and rural substrata, and the target sample size was allocated proportionately to the two substrata. For example, if 40 percent of the population in a given region was rural, 40 of the 100 households allotted to the stratum were drawn from villages.
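The proportional urban/rural split in the example above can be expressed as a small helper. The function name is hypothetical, the take of 10 households per SSU comes from the text, and rounding the number of SSUs is an assumption (the source's example uses allocations that are already multiples of 10).

```python
def allocate_substrata(households, rural_share, take_per_ssu=10):
    """Split a PSU's household allocation proportionally into urban and rural parts.

    Returns household counts and the number of SSUs (villages or urban
    districts) needed at `take_per_ssu` households each. Rounding to whole
    SSUs is an assumption; the source example uses multiples of 10.
    """
    rural = round(households * rural_share)
    urban = households - rural
    return {
        "rural_households": rural,
        "urban_households": urban,
        "rural_ssus": round(rural / take_per_ssu),
        "urban_ssus": round(urban / take_per_ssu),
    }


print(allocate_substrata(100, 0.40))
# {'rural_households': 40, 'urban_households': 60, 'rural_ssus': 4, 'urban_ssus': 6}
```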
In the rural areas of the selected PSUs, a list of all villages was compiled to serve as SSUs. The list was ordered by size and (where salient) by ethnic composition. PPS was used to select one village for each 10 households allocated to the rural substratum. Again, under the standard principles of PPS, once the required number of villages had been selected, the same number of sample households (10) was allocated to each village. Since villages maintain very reliable lists of households, the 10 households in each selected village were selected systematically from the household list. In a few cases, villages were judged too small to sustain independent interviews with 10 households; in such cases, three or four tiny villages were combined into a single SSU for sampling purposes.
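A minimal sketch of systematic PPS selection of villages, assuming the village list has already been ordered as described and that size is measured in households. The village sizes are hypothetical, and systematic PPS is one common way to realize "one village per 10 households"; the source does not say which PPS variant was used. The second function shows why PPS followed by a fixed take of 10 households gives every household in the substratum the same overall selection probability.

```python
import random


def systematic_pps(sizes, n_select, start):
    """Systematic PPS selection of n_select units from an ordered list.

    sizes: positive size measures (here, households per village) in list order.
    start: random start in [0, interval), where interval = sum(sizes) / n_select.
    Returns the indices of the selected units.
    """
    interval = sum(sizes) / n_select
    points = [start + k * interval for k in range(n_select)]
    selected, cumulative, i = [], 0, 0
    for idx, size in enumerate(sizes):
        cumulative += size
        # Every selection point falling inside this unit's cumulative range picks it.
        while i < n_select and points[i] < cumulative:
            selected.append(idx)
            i += 1
    return selected


def household_probability(village_size, total_size, n_villages, take=10):
    """Overall chance that a given household enters the sample.

    Stage 1: village chosen by PPS with probability n_villages * size / total
    (assumes no village is larger than the sampling interval).
    Stage 2: `take` households chosen systematically from the village list.
    """
    p_village = n_villages * village_size / total_size
    p_household = take / village_size
    return p_village * p_household


# Hypothetical rural substratum: household counts for villages in list order.
villages = [120, 80, 200, 50, 150]
n_villages = 2  # 20 rural households allotted -> one village per 10 households
interval = sum(villages) / n_villages
print(systematic_pps(villages, n_villages, random.Random(3).random() * interval))
# Village size cancels out, so every household has the same chance (20/600):
print({size: round(household_probability(size, sum(villages), n_villages), 6) for size in villages})
```

The cancellation of village size is what makes the rural substratum self-weighting, provided no village exceeds the sampling interval and each selected village has at least 10 households.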
In urban areas, SSUs were defined by the boundaries of 1989 census enumeration districts where possible. If the necessary information was not available, 1994 microcensus enumeration districts, voting districts, or residential postal zones were used, in decreasing order of preference. Since census enumeration districts were originally designed to be roughly equal in population size, one district was selected systematically, without PPS, for each 10 households required in the sample. In the few cases where postal zones were used, one zone was likewise selected systematically for each 10 households. Where voting districts were used, however, PPS was employed to compensate for their marked variation in population size, selecting one voting district for each 10 households required in the urban substratum.
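Equal-probability systematic selection, as described for enumeration districts and postal zones (and for drawing 10 households from a village list), can be sketched as below. The district count and random start are hypothetical; voting districts would instead use a PPS draw, since their sizes vary markedly.

```python
import random


def systematic_sample(n_units, n_select, start):
    """Equal-probability systematic sample of indices from range(n_units).

    interval = n_units / n_select; start must lie in [0, interval).
    Every unit has selection probability n_select / n_units.
    """
    interval = n_units / n_select
    # min() guards against a floating-point overshoot on the last point.
    return [min(int(start + k * interval), n_units - 1) for k in range(n_select)]


# Hypothetical urban substratum: 60 enumeration districts, 60 households needed,
# so one district per 10 households -> 6 districts, interval 10.
print(systematic_sample(60, 6, start=3))
# [3, 13, 23, 33, 43, 53]

# With a random start, as would be used in practice:
rng = random.Random(1989)
n_units, n_select = 60, 6
print(systematic_sample(n_units, n_select, rng.random() * (n_units / n_select)))
```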
In both urban and rural substrata, interviewers were required to visit each selected dwelling up to three times to secure an interview, and they were not allowed to make substitutions of any kind. The interviewer's first task was to identify the households at the designated dwelling. A "household" was defined as a group of people who live together in a given domicile and who share common income and expenditures. Households were also defined to include unmarried children aged 18 or younger who were temporarily living outside the domicile at the time of the survey. If the interviewer identified more than one household in the dwelling, he or she was required to select one using a procedure outlined in the technical report. The interviewer then administered the household questionnaire to the most knowledgeable and willing member of the household.
The interviewer then conducted interviews with as many adults as possible, collecting data on their individual activities and health. Data for the children's questionnaires were obtained from adults in the household. Because an attempt was made to obtain individual questionnaires for all members of each household, the sample constitutes a proper probability sample of individuals as well as of households, without any special weighting. However, because unmarried minors living temporarily outside the domicile were not interviewed, the sample of individuals in that age group is slightly less representative.
The joint distribution of the sample by sex, age, and urban-rural location matched the corresponding distribution in the 1989 census quite closely. Of course, because of random sampling error and changes in the population since the 1989 census, perfect correspondence was not expected. Nevertheless, the two distributions usually differed by only one percentage point or less.
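The comparison described above reduces to computing, for each cell, the sample share minus the census share in percentage points. The counts below are toy numbers chosen only to illustrate the calculation; they are not RLMS or census figures, and the age dimension is omitted for brevity.

```python
def percentage_point_gaps(sample_counts, census_counts):
    """Sample share minus census share, in percentage points, for each cell."""
    s_total = sum(sample_counts.values())
    c_total = sum(census_counts.values())
    cells = set(sample_counts) | set(census_counts)
    return {
        cell: 100 * (sample_counts.get(cell, 0) / s_total - census_counts.get(cell, 0) / c_total)
        for cell in cells
    }


# Toy counts keyed by (sex, location); a real comparison would add age groups.
sample = {("female", "urban"): 380, ("male", "urban"): 330,
          ("female", "rural"): 150, ("male", "rural"): 140}
census = {("female", "urban"): 3700, ("male", "urban"): 3350,
          ("female", "rural"): 1530, ("male", "rural"): 1420}

gaps = percentage_point_gaps(sample, census)
print(round(max(abs(g) for g in gaps.values()), 2))
# 1.0
```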
Another way to evaluate the adequacy (or efficiency) of the sample was to examine design effects. An important factor in determining the precision of estimates in multi-stage samples was the mean ultimate cluster (PSU) size: all else being equal, the larger the clusters, the less precise the estimates. In Rounds I through IV of the RLMS, the average cluster size approached 360, a large number dictated by constraints imposed by our collaborators. Thus, although the sample comprised around 6,000 households, precision was lower than we would have liked for a sample of that size. In Rounds I and III of the RLMS, the 95 percent confidence interval for household income was about ±13 percent.
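The link between cluster size and precision is usually summarized with Kish's approximation for the design effect. This is the standard textbook formula, not one taken from the RLMS documentation, and it assumes roughly equal cluster sizes and a common intraclass correlation $\rho$; $\bar{m}$ is the mean cluster size and $n$ the number of sampled households.

```latex
\mathrm{deff} \approx 1 + (\bar{m} - 1)\,\rho,
\qquad
n_{\mathrm{eff}} = \frac{n}{\mathrm{deff}},
\qquad
\text{CI half-width} \propto \sqrt{\frac{\mathrm{deff}}{n}}
```

For a fixed $\rho$, the design effect grows linearly with $\bar{m}$, so clusters averaging about 360 households inflate variance far more than small clusters do.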
In the Phase II (Rounds V - XX) sample, the situation was considerably better. Although there were only 4,000 households, the mean cluster size was much smaller than in Phase I. There were 35 PSUs with about 100 households each; even this was an improvement over the average of 360 in Rounds I through IV. In the three self-representing areas, however, the respondents were drawn from 61 PSUs. Recall that Moscow city, Moscow Oblast, and St. Petersburg city were not sampled but were chosen with certainty; the first stage of selection in them was therefore the selection of census enumeration districts. Thus the mean cluster size in the entire sample was about 42, i.e., 4,000/(35 + 61). Given these much smaller cluster sizes, researchers had reason to expect that precision in this survey would be as good as in Rounds I through IV despite the smaller sample size, and this expectation was in fact borne out in Rounds V through XIII.
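The trade-off between total sample size and cluster size can be illustrated with Kish's approximation, deff = 1 + (m - 1) * rho, where m is the mean cluster size and rho the intraclass correlation. The cluster sizes and sample sizes come from the text; the value of rho is purely hypothetical, since it differs from variable to variable, so the effective sample sizes printed here are illustrative, not RLMS results.

```python
def design_effect(mean_cluster_size, rho):
    """Kish approximation: deff = 1 + (m - 1) * rho."""
    return 1 + (mean_cluster_size - 1) * rho


def effective_sample_size(n, mean_cluster_size, rho):
    """Size of a simple random sample with the same variance as the clustered one."""
    return n / design_effect(mean_cluster_size, rho)


RHO = 0.05  # hypothetical intraclass correlation, for illustration only

phase1_m = 360               # approximate mean cluster size, Rounds I-IV
phase2_m = 4000 / (35 + 61)  # mean cluster size from the text, Rounds V onward

print(round(phase2_m, 1))
# 41.7
print(round(effective_sample_size(6000, phase1_m, RHO)))
# 317
print(round(effective_sample_size(4000, phase2_m, RHO)))
# 1319
```

Under any positive rho, the much smaller Phase II clusters offset the smaller number of households, which is consistent with the expectation described above.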
Mode of data collection
---------------------------
Face-to-face [f2f]
Research instrument
---------------------------
The questionnaires are English-language translations of the original Russian questionnaires. The English versions were translated as literally as possible, and the order of the questions and the layout of the pages have been preserved.
The questionnaires are also designed to function as codebooks. The variable names, as they appear in the data sets, are usually listed below or to the left of the questions. If the abbreviation (char) appears with a variable name, the responses to that question are stored in a character variable. If no variable name is associated with a particular question, the responses to that question do not appear in the data set. Some questions in the questionnaires are color-coded. Pink means that the question was newly added. Green indicates changes from the previous round (e.g., the year). Gray means that the questions were asked but the data are not available for public use; these questions were added at the request of the Pension Office and are for its use only.
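If the codebook labels are transcribed into a machine-readable list, the (char) convention can be turned into storage types automatically. A minimal sketch, assuming each label is a variable name optionally followed by "(char)"; the label format and the variable names in the example are hypothetical, not the actual RLMS layout.

```python
import re

# Hypothetical transcribed label: a variable name, optionally followed by "(char)".
_LABEL = re.compile(r"^\s*(?P<name>[A-Za-z_][A-Za-z0-9_]*)\s*(?P<char>\(char\))?\s*$")


def parse_codebook_label(label):
    """Return (variable_name, is_character), or None if the question has no variable.

    A question without a variable name is not in the data set.
    """
    if not label or not label.strip():
        return None
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"unrecognised codebook label: {label!r}")
    return match.group("name"), match.group("char") is not None


print(parse_codebook_label("occ_text (char)"))  # ('occ_text', True)
print(parse_codebook_label("age"))              # ('age', False)
print(parse_codebook_label(""))                 # None
```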
Cleaning operations
---------------------------
In Phase II (Rounds V - XX), when questionnaires were returned to local supervisors, those supervisors were required to examine them for problems that could best be remedied in the field, e.g., by returning to obtain key demographic information or by correcting ID numbers so that the roster of individuals in the household questionnaire matched the individual questionnaires from that household. The questionnaires were then transported to Moscow, where yet another ID check was performed.
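The roster-versus-questionnaire ID check amounts to comparing two sets of person IDs within each household. A minimal sketch, assuming integer person IDs within a household (hypothetical). Note that a roster member without an individual questionnaire is not necessarily an error, since interviews were obtained with as many adults as possible rather than with every member.

```python
from collections import Counter


def check_household_ids(roster_ids, individual_ids):
    """Compare a household roster with the IDs on its individual questionnaires."""
    counts = Counter(individual_ids)
    roster = set(roster_ids)
    return {
        # Questionnaires whose person ID is not on the roster: always an ID problem.
        "not_on_roster": sorted(set(counts) - roster),
        # Person IDs used on more than one questionnaire.
        "duplicated": sorted(pid for pid, n in counts.items() if n > 1),
        # Roster members without a questionnaire: may just be non-interviewed members.
        "no_questionnaire": sorted(roster - set(counts)),
    }


print(check_household_ids(roster_ids=[1, 2, 3, 4], individual_ids=[1, 2, 2, 5]))
# {'not_on_roster': [5], 'duplicated': [2], 'no_questionnaire': [3, 4]}
```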
In Moscow, coders went through all questionnaires to code the so-called "other: specify" responses. Open-ended questions (e.g., occupation questions), however, were not coded at this stage. Instead, their texts were entered in full as long string variables. Entering the open-ended answers as character variables offered several advantages. First, it allowed data entry to begin immediately, with no delay for coding. Second, it permitted the use of computer programs to assist in coding the string variables. Third, it allowed any user of the original data sets to recode the character variables to suit his or her purposes without going back to the paper questionnaires. All data entry was handled in-house using the SPSS data entry program on PCs.
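Computer-assisted coding of the stored strings can be as simple as keyword matching that proposes candidate codes for a human coder to confirm. This is a minimal sketch, not the program the RLMS team used: the keyword map, the codes, and the English answers are hypothetical (real RLMS answers are in Russian).

```python
def suggest_codes(responses, keyword_codes):
    """Suggest codes for free-text answers by keyword matching.

    responses: {respondent_id: raw text as entered}.
    keyword_codes: {lower-case keyword: code}.
    Returns {respondent_id: sorted list of candidate codes}; an empty list
    means the answer needs manual coding. The raw text is left untouched,
    so the answers can be recoded later under a different scheme.
    """
    suggestions = {}
    for rid, text in responses.items():
        normalized = text.casefold()
        suggestions[rid] = sorted({code for kw, code in keyword_codes.items() if kw in normalized})
    return suggestions


# Hypothetical keyword map and answers.
keyword_codes = {"teacher": "EDU", "nurse": "HEALTH", "doctor": "HEALTH", "driver": "TRANSPORT"}
answers = {101: "School teacher", 102: "Bus driver", 103: "Accountant"}
print(suggest_codes(answers, keyword_codes))
# {101: ['EDU'], 102: ['TRANSPORT'], 103: []}
```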
Provided by:
catalog.ihsn.org



