COVID-19 dataset by Our World in Data
Source: www.kaggle.com (updated 2020-09-20; indexed 2025-01-15)
Download link:
https://www.kaggle.com/bolkonsky/covid19
# Data on COVID-19 (coronavirus) by _Our World in Data_
Our complete COVID-19 dataset is a collection of the COVID-19 data maintained by [_Our World in Data_](https://ourworldindata.org/coronavirus). It is updated daily and includes data on confirmed cases, deaths, and testing, as well as other variables of potential interest.
### 🗂️ Download our complete COVID-19 dataset: [CSV](https://covid.ourworldindata.org/data/owid-covid-data.csv) | [XLSX](https://covid.ourworldindata.org/data/owid-covid-data.xlsx) | [JSON](https://covid.ourworldindata.org/data/owid-covid-data.json)
We will continue to publish up-to-date data on confirmed cases, deaths, and testing, throughout the duration of the COVID-19 pandemic.
## Our data sources
- **Confirmed cases and deaths:** our data comes from the [European Centre for Disease Prevention and Control](https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide) (ECDC). We discuss how and when the ECDC collects and publishes this data [in a separate post](https://ourworldindata.org/coronavirus-source-data). The cases and deaths dataset is updated daily. *Note: the number of cases or deaths reported by any institution (including the ECDC, the WHO, Johns Hopkins, and others) on a given day does not necessarily represent the actual number on that date. This is because of the long reporting chain between a new case or death and its inclusion in statistics. **This also means that negative values in cases and deaths can sometimes appear when a country sends a correction to the ECDC because it had previously overestimated the number of cases/deaths. Alternatively, large changes can sometimes (although rarely) be made to a country's entire time series if the ECDC decides (and has access to the necessary data) to correct values retrospectively.***
- **Testing for COVID-19:** this data is collected by the _Our World in Data_ team from official reports; you can find further details in our post on COVID-19 testing, including our [checklist of questions to understand testing data](https://ourworldindata.org/coronavirus-testing#our-checklist-for-covid-19-testing-data), information on [geographical and temporal coverage](https://ourworldindata.org/coronavirus-testing#which-countries-do-we-have-testing-data-for), and [detailed country-by-country source information](https://ourworldindata.org/coronavirus-testing#source-information-country-by-country). The testing dataset is updated around twice a week.
- **Other variables:** this data is collected from a variety of sources (United Nations, World Bank, Global Burden of Disease, Blavatnik School of Government, etc.). More information is available in [our codebook](https://github.com/owid/covid-19-data/tree/master/public/data/owid-covid-codebook.csv).
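Because corrections can show up as negative daily values, it can help to flag them before analysis. The following is a minimal sketch, assuming rows have already been parsed into dictionaries with numeric `new_cases`/`new_deaths` values (or `None` for blanks); `find_corrections` is a hypothetical helper, not part of any _Our World in Data_ tooling.

```python
from typing import Any, Iterable, Optional


def find_corrections(
    rows: Iterable[dict[str, Any]],
    columns: tuple[str, ...] = ("new_cases", "new_deaths"),
) -> list[tuple[str, str, str, float]]:
    """Return (location, date, column, value) for every negative daily value."""
    flagged = []
    for row in rows:
        for column in columns:
            value: Optional[float] = row.get(column)
            # Negative daily counts can appear when a country corrects an earlier overestimate.
            if value is not None and value < 0:
                flagged.append((row["location"], row["date"], column, value))
    return flagged


# Toy rows for a hypothetical location; not real data.
rows = [
    {"location": "Testland", "date": "2020-05-01", "new_cases": 12.0, "new_deaths": 1.0},
    {"location": "Testland", "date": "2020-05-02", "new_cases": -4.0, "new_deaths": None},
]
corrections = find_corrections(rows)
```

Flagged dates are candidates for closer inspection rather than errors to drop: the correction itself is real information about the earlier overcount.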
## The complete _Our World in Data_ COVID-19 dataset
**Our complete COVID-19 dataset is available in [CSV](https://covid.ourworldindata.org/data/owid-covid-data.csv), [XLSX](https://covid.ourworldindata.org/data/owid-covid-data.xlsx), and [JSON](https://covid.ourworldindata.org/data/owid-covid-data.json) formats, and includes all of our historical data on the pandemic up to the date of publication.**
The CSV and XLSX files follow a format of 1 row per location and date. The JSON version is split by country ISO code, with static variables and an array of daily records.
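To work with the JSON version in the same one-row-per-location-and-date shape as the CSV, you can copy each country's static variables into every daily record. This is a minimal sketch, assuming the file is a dictionary keyed by ISO code and that the daily records sit under a `"data"` key (check this against the file you download); the sample document is a toy, not real data.

```python
import json
from typing import Any

# Assumption: the array of daily records is stored under this key.
DAILY_KEY = "data"


def flatten(owid_json: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn {iso_code: {static..., "data": [daily...]}} into one row per location and date."""
    rows = []
    for iso_code, country in owid_json.items():
        static = {k: v for k, v in country.items() if k != DAILY_KEY}
        for record in country.get(DAILY_KEY, []):
            # Daily values come last, so they win if a key appears in both places.
            rows.append({"iso_code": iso_code, **static, **record})
    return rows


# Toy document for a hypothetical location.
sample = json.loads("""
{"TST": {"location": "Testland", "population": 1000000,
         "data": [{"date": "2020-03-01", "total_cases": 1},
                  {"date": "2020-03-02", "total_cases": 3}]}}
""")
rows = flatten(sample)
```

For the real file, pass the result of `json.load` on the downloaded document to `flatten`.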
The variables represent all of our main data related to confirmed cases, deaths, and testing, as well as other variables of potential interest.
As of 10 September 2020, the columns are: `iso_code`, `continent`, `location`, `date`, `total_cases`, `new_cases`, `new_cases_smoothed`, `total_deaths`, `new_deaths`, `new_deaths_smoothed`, `total_cases_per_million`, `new_cases_per_million`, `new_cases_smoothed_per_million`, `total_deaths_per_million`, `new_deaths_per_million`, `new_deaths_smoothed_per_million`, `total_tests`, `new_tests`, `new_tests_smoothed`, `total_tests_per_thousand`, `new_tests_per_thousand`, `new_tests_smoothed_per_thousand`, `tests_per_case`, `positive_rate`, `tests_units`, `stringency_index`, `population`, `population_density`, `median_age`, `aged_65_older`, `aged_70_older`, `gdp_per_capita`, `extreme_poverty`, `cardiovasc_death_rate`, `diabetes_prevalence`, `female_smokers`, `male_smokers`, `handwashing_facilities`, `hospital_beds_per_thousand`, `life_expectancy`, `human_development_index`
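Since a blank CSV cell means no value is available, it is safer to read it as missing rather than as zero. The sketch below uses only the standard library. It reads a few of the columns listed above and skips the "World" (`OWID_WRL`) and "International" (blank `iso_code`) aggregate rows described in the changelog, so they are not double-counted when summing countries. It also recomputes a per-million figure from `total_cases` and `population`. The helper names and the toy input are hypothetical, and treating a blank cell as "no data" is our reading of the format.

```python
import csv
import io
from typing import Optional

# Aggregate rows: "World" uses OWID_WRL and "International" has a blank iso_code.
AGGREGATE_ISO_CODES = {"OWID_WRL", ""}


def to_float(cell: str) -> Optional[float]:
    """Blank cells mean 'no data', so keep them as None instead of 0."""
    return float(cell) if cell.strip() else None


def per_million(count: Optional[float], population: Optional[float]) -> Optional[float]:
    """Scale a count to a rate per 1,000,000 people."""
    if count is None or not population:
        return None
    return count * 1_000_000 / population


def load_country_rows(csv_text: str) -> list[dict]:
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        if raw["iso_code"] in AGGREGATE_ISO_CODES:
            continue  # skip World/International to avoid double-counting
        rows.append({
            "location": raw["location"],
            "date": raw["date"],
            "total_cases": to_float(raw["total_cases"]),
            "population": to_float(raw["population"]),
        })
    return rows


# Toy input with a hypothetical location; not real data.
sample = """iso_code,location,date,total_cases,population
TST,Testland,2020-03-01,,2000000
TST,Testland,2020-03-02,50,2000000
OWID_WRL,World,2020-03-02,900,7000000
"""
rows = load_country_rows(sample)
for row in rows:
    row["total_cases_per_million"] = per_million(row["total_cases"], row["population"])
```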
A [full codebook](https://github.com/owid/covid-19-data/tree/master/public/data/owid-covid-codebook.csv) is made available, with a description and source for each variable in the dataset.
## Additional files and information
If you are interested in the individual files that make up the complete dataset, or in more detailed information, you can find other files in the following subfolders:
- [`ecdc`](https://github.com/owid/covid-19-data/tree/master/public/data/ecdc): data from the European Centre for Disease Prevention and Control, related to confirmed cases and deaths;
- [`testing`](https://github.com/owid/covid-19-data/tree/master/public/data/testing): data from various official sources, related to COVID-19 tests performed in each country. This folder contains two files with more detailed information:
  - [`covid-testing-all-observations.csv`](https://github.com/owid/covid-19-data/blob/master/public/data/testing/covid-testing-all-observations.csv) includes, for each historical observation, the source of the individual data point, and sometimes notes on data collection;
  - [`covid-testing-latest-data-source-details.csv`](https://github.com/owid/covid-19-data/blob/master/public/data/testing/covid-testing-latest-data-source-details.csv) includes, for each country in our testing dataset, the latest figures and a detailed description of how the country’s data is collected.
- [`who`](https://github.com/owid/covid-19-data/tree/master/public/data/who): data from the World Health Organization, related to confirmed cases and deaths. _We have not used or updated this data since 18 March 2020._
## Changelog
- Up until 17 March 2020, we were using WHO data manually extracted from their daily [situation report PDFs](https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports).
- From 19 March 2020, we started relying on data published by the [European CDC](https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide). We wrote about [why we decided to switch sources](https://ourworldindata.org/covid-sources-comparison).
- On 3 April 2020, we added country-level time series on COVID-19 tests.
- On 16 April 2020, we made available a [complete dataset of all of our main variables](https://github.com/owid/covid-19-data/tree/master/public/data) related to confirmed cases, deaths, and tests.
- On 25 April 2020, we added rows for "World" and "International" to our complete dataset. The `iso_code` column for "International" is blank, and for "World" we use `OWID_WRL`.
- On 9 May 2020, we added new variables related to demographic, economic, and public health data to our complete dataset.
- On 19 May 2020, we added 2 variables related to testing: `new_tests_smoothed` and `new_tests_smoothed_per_thousand`. To generate them we assume that testing changed equally on a daily basis over any periods in which no data was reported (as not all countries report testing data on a daily basis). This produces a complete series of daily figures, which is then averaged over a rolling 7-day window.
- On 23 May 2020, we added a JSON version of our complete dataset.
- On 4 June 2020, we added a `continent` column to our complete dataset.
- On 1 July 2020, we changed the format of the JSON version of our complete dataset to normalize the data and reduce file size.
- On 4 August 2020, we added the `positive_rate` and `tests_per_case` columns to our complete dataset.
- On 7 August 2020, we transformed our markdown codebook to a CSV file to allow easier merging with the complete dataset.
- On 17 August 2020, we added 4 variables related to cases and deaths: `new_cases_smoothed`, `new_deaths_smoothed`, `new_cases_smoothed_per_million`, and `new_deaths_smoothed_per_million`. These metrics are averaged versions (over a rolling 7-day window) of the daily variables.
- On 10 September 2020, we added the `human_development_index` column to our complete dataset.
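The smoothing described in the 19 May and 17 August entries can be sketched as two steps: fill gaps between reported figures by assuming an equal change each day, then average the daily values over a 7-day window. This is a minimal sketch under our own reading, not the _Our World in Data_ pipeline. We interpolate the cumulative test count linearly between reporting dates (so new tests are equal on each day of a gap), take day-to-day differences, and assume a trailing window that only produces a value once 7 days are available. For `new_cases_smoothed` and `new_deaths_smoothed`, which start from daily series, only the last step applies. The function names are hypothetical.

```python
from datetime import date, timedelta


def interpolate_cumulative(reports: dict[date, float]) -> dict[date, float]:
    """Fill every day between reports, assuming an equal daily change across each gap."""
    days = sorted(reports)
    filled: dict[date, float] = {}
    for start, end in zip(days, days[1:]):
        gap = (end - start).days
        step = (reports[end] - reports[start]) / gap
        for offset in range(gap):
            filled[start + timedelta(days=offset)] = reports[start] + step * offset
    if days:
        filled[days[-1]] = reports[days[-1]]
    return filled


def daily_changes(cumulative: dict[date, float]) -> dict[date, float]:
    """Convert a gap-free cumulative series into daily new values."""
    days = sorted(cumulative)
    return {day: cumulative[day] - cumulative[prev] for prev, day in zip(days, days[1:])}


def rolling_mean(daily: dict[date, float], window: int = 7) -> dict[date, float]:
    """Trailing mean over consecutive days; days without a full window are omitted."""
    days = sorted(daily)
    return {
        days[i]: sum(daily[d] for d in days[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(days))
    }


# Toy cumulative test counts reported on irregular dates (not real data).
reports = {date(2020, 5, 1): 0.0, date(2020, 5, 3): 40.0, date(2020, 5, 10): 110.0}
new_tests_smoothed = rolling_mean(daily_changes(interpolate_cumulative(reports)))
```

Interpolating the cumulative series, rather than the daily one, keeps the total number of tests within each gap equal to what was actually reported.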
## Data alterations
- **We standardize names of countries and regions.** Because different data sources use different names for the same countries and regions, we standardize all names to the [_Our World in Data_ standard entity names](https://github.com/owid/covid-19-data/blob/master/public/data/ecdc/locations.csv).
- We may correct or discard inconsistencies that we detect in the original data.
- Testing data is collected from many different sources. Detailed documentation for each country is available in [our post on COVID-19 testing](https://ourworldindata.org/coronavirus-testing#source-information-country-by-country).
- Where we collect multiple time series for a given country in our testing data (for example, for the United States we collect data from both the CDC and the COVID Tracking Project), our complete COVID-19 dataset includes only the most complete series. If two series are equally complete, we keep the one that counts the number of people tested rather than the number of tests, samples, or swabs processed. The list of 'secondary' testing series (those removed) is located in [`scripts/input/owid/secondary_testing_series.csv`](https://github.com/owid/covid-19-data/blob/master/scripts/input/owid/secondary_testing_series.csv).
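The selection rule above can be expressed as a single sort key: completeness first, then a preference for series counting people tested. This is a minimal sketch, assuming "most complete" means the series with the most dated observations; the `TestingSeries` type, its fields, and the unit labels are hypothetical, and the criteria actually applied may differ.

```python
from dataclasses import dataclass


@dataclass
class TestingSeries:
    source: str        # hypothetical label for where the series comes from
    units: str         # e.g. "people tested" or "tests performed" (assumed labels)
    observations: int  # number of dates with data; our stand-in for completeness


def choose_primary(candidates: list[TestingSeries]) -> TestingSeries:
    """Pick the most complete series; on a tie, prefer one counting people tested."""
    return max(candidates, key=lambda s: (s.observations, s.units == "people tested"))


a = TestingSeries("Source A", "tests performed", 120)
b = TestingSeries("Source B", "people tested", 120)
c = TestingSeries("Source C", "tests performed", 150)
```

With `a` and `b` equally complete, `choose_primary([a, b])` returns `b`; adding the longer series `c` makes it the choice regardless of units.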
## Stable URLs
The `/public` path of this repository is hosted at `https://covid.ourworldindata.org/`. For example, you can access the CSV for the complete dataset at `https://covid.ourworldindata.org/data/owid-covid-data.csv`.
Our goal is to keep all stable URLs working, even if we have to restructure this repository. If you need regular updates, please consider using the `covid.ourworldindata.org` URLs rather than pointing to GitHub.
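Because everything under `/public` is served from `https://covid.ourworldindata.org/`, a repository path can be turned into its stable URL by swapping the prefix. This is a minimal sketch using only the standard library; `stable_url` and `download` are hypothetical helpers, and we assume files in the subfolders follow the same mapping as the complete dataset.

```python
from urllib.request import urlopen

STABLE_BASE = "https://covid.ourworldindata.org/"
REPO_PREFIX = "public/"


def stable_url(repo_path: str) -> str:
    """Map a path under /public in the repository to its covid.ourworldindata.org URL."""
    path = repo_path.lstrip("/")
    if not path.startswith(REPO_PREFIX):
        raise ValueError(f"{repo_path!r} is not under /public")
    return STABLE_BASE + path[len(REPO_PREFIX):]


def download(repo_path: str) -> bytes:
    """Fetch a file through the stable URL instead of GitHub (needs network access)."""
    with urlopen(stable_url(repo_path)) as response:
        return response.read()
```

For example, `stable_url("public/data/owid-covid-data.csv")` gives the CSV URL quoted above, and paths outside `/public` raise `ValueError` because they are not served from the stable host.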
## License
All visualizations, data, and code produced by _Our World in Data_ are completely open access under the [Creative Commons BY license](https://creativecommons.org/licenses/by/4.0/). You have permission to use, distribute, and reproduce these in any medium, provided the source and authors are credited.
The data produced by third parties and made available by _Our World in Data_ is subject to the license terms from the original third-party authors. We will always indicate the original source of the data in our database, and you should always check the license of any such third-party data before use.
## Authors
This data has been collected, aggregated, and documented by Diana Beltekian, Daniel Gavrilov, Charlie Giattino, Joe Hasell, Bobbie Macdonald, Edouard Mathieu, Esteban Ortiz-Ospina, Hannah Ritchie, and Max Roser.
The mission of _Our World in Data_ is to make data and research on the world’s largest problems understandable and accessible. [Read more about our mission](https://ourworldindata.org/about).
Provided by: Kaggle