
Counts of Encephalitis lethargica reported in UNITED STATES OF AMERICA: 1923-1932

DataCite Commons | Updated: 2024-07-09 | Indexed: 2025-04-16
Download link:
https://zenodo.org/records/11452283
Description:
Project Tycho datasets contain case counts for reported disease conditions for countries around the world. The Project Tycho data curation team extracts these case counts from various reputable sources, typically from national or international health authorities, such as the US Centers for Disease Control or the World Health Organization. These original data sources include both open- and restricted-access sources. For restricted-access sources, the Project Tycho team has obtained permission for redistribution from data contributors. All datasets contain case count data that are identical to the counts published in the original source, and no counts have been modified in any way by the Project Tycho team. The Project Tycho team has pre-processed datasets by adding new variables, such as standard disease and location identifiers, that improve data interpretability. We also formatted the data into a standard data format.

Each Project Tycho dataset contains case counts for a specific condition (e.g. measles) and for a specific country (e.g. the United States). Case counts are reported per time interval. In addition to case counts, datasets include information about these counts (attributes), such as the location, age group, subpopulation, diagnostic certainty, place of acquisition, and the source from which we extracted case counts. One dataset can include many series of case count time intervals, such as "US measles cases as reported by CDC", or "US measles cases reported by WHO", or "US measles cases that originated abroad", etc.

Depending on the intended use of a dataset, we recommend a few data processing steps before analysis:

- Analyze missing data: Project Tycho datasets do not include time intervals for which no case count was reported (for many datasets, time series of case counts are incomplete because the source documents are incomplete), so users will need to add the time intervals for which no count value is available. Project Tycho datasets do include time intervals for which a case count value of zero was reported.
- Separate cumulative from non-cumulative time interval series: Case count time series in Project Tycho datasets can be "cumulative" or "fixed-intervals". Cumulative case count time series consist of overlapping case count intervals that start on the same date but end on different dates. For example, each interval in a cumulative count time series can start on January 1st, but end on January 7th, 14th, 21st, etc. It is common practice among public health agencies to report cases for cumulative time intervals. Case count series with fixed time intervals consist of mutually exclusive time intervals that all start and end on different dates and all have identical length (day, week, month, year). Given the different nature of these two types of case count data, we indicated the type with an attribute for each count value, named "PartOfCumulativeCountSeries".
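The two recommended processing steps can be sketched in Python as follows. This is a minimal illustration, not the Project Tycho team's own procedure. Only the attribute name "PartOfCumulativeCountSeries" comes from the description above. The field names PeriodStartDate, PeriodEndDate and CountValue, the 0/1 encoding of the flag, the weekly interval length, and the sample rows are all assumptions made for the example.

```python
from datetime import date, timedelta

# Made-up illustrative rows (not real counts). Field names other than
# PartOfCumulativeCountSeries are assumed, as is the 0/1 flag encoding.
rows = [
    {"PeriodStartDate": date(1923, 1, 1), "PeriodEndDate": date(1923, 1, 7),
     "PartOfCumulativeCountSeries": 0, "CountValue": 3},
    {"PeriodStartDate": date(1923, 1, 8), "PeriodEndDate": date(1923, 1, 14),
     "PartOfCumulativeCountSeries": 0, "CountValue": 0},
    # The week starting 1923-01-15 is absent: no count was reported.
    {"PeriodStartDate": date(1923, 1, 22), "PeriodEndDate": date(1923, 1, 28),
     "PartOfCumulativeCountSeries": 0, "CountValue": 2},
    {"PeriodStartDate": date(1923, 1, 1), "PeriodEndDate": date(1923, 1, 14),
     "PartOfCumulativeCountSeries": 1, "CountValue": 3},
]


def split_series(rows):
    """Separate cumulative rows from fixed-interval rows using the flag."""
    cumulative = [r for r in rows if int(r["PartOfCumulativeCountSeries"]) == 1]
    fixed = [r for r in rows if int(r["PartOfCumulativeCountSeries"]) == 0]
    return cumulative, fixed


def fill_missing_weeks(fixed_rows):
    """Return (start, end, count) for every 7-day interval between the first
    and last reported start date. Intervals with no reported count get None,
    which keeps them distinct from intervals with a reported count of zero.
    Assumes weekly intervals that all lie on the same 7-day grid."""
    if not fixed_rows:
        return []
    by_start = {r["PeriodStartDate"]: r["CountValue"] for r in fixed_rows}
    current, last = min(by_start), max(by_start)
    filled = []
    while current <= last:
        end = current + timedelta(days=6)
        filled.append((current, end, by_start.get(current)))
        current += timedelta(days=7)
    return filled


cumulative, fixed = split_series(rows)
filled = fill_missing_weeks(fixed)
for start, end, count in filled:
    print(start, end, "missing" if count is None else count)
```

Using None for unreported intervals, rather than 0, preserves the distinction the description draws between "no count reported" and "a count of zero reported". Series with daily, monthly or yearly intervals would need a different step size.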

Provider:
University of Pittsburgh
Created:
2017-11-02