
Covid-19 Global Dataset

Source: www.kaggle.com | Updated: 2022-05-15 | Indexed: 2025-01-15
Download link:
https://www.kaggle.com/josephassaker/covid19-global-dataset
Description:
### For the latest analysis and visualizations of the COVID-19 pandemic, check out my constantly updated EDA notebook [here](https://www.kaggle.com/josephassaker/covid-19-global-data-analysis-visualization) 📈.

---

## Context

> Severe acute respiratory syndrome coronavirus 2 (**SARS-CoV-2**) is the strain of coronavirus that causes **coronavirus disease 2019 (COVID-19)**, the respiratory illness responsible for the **COVID-19 pandemic**.

Since its first identification in December 2019 in Wuhan, China, this virus has taken the world by storm. Some people prefer to look at the positive side of things and at the positive changes this pandemic has brought about. However, the collateral damage it has caused cannot be overlooked. From the [economic impact](https://en.wikipedia.org/wiki/Economic_impact_of_the_COVID-19_pandemic) to the [mental health impact](https://en.wikipedia.org/wiki/Mental_health_during_the_COVID-19_pandemic), this pandemic will arguably be one of the hardest periods we'll encounter in our lives.

That being said, we always have to arm ourselves with hope. With the new advances in vaccine studies, let's hope to wake up from this nightmare as soon as possible.

> “Hope is being able to see that there is light despite all of the darkness.” – Desmond Tutu

As for why I built this dataset: I couldn't get my hands on an easily digestible and up-to-date Covid-19 dataset, so I decided to build my own using Python and web scraping techniques. I will also update this dataset as frequently as possible!

## Content

This data was scraped from worldometers.info on 2022-05-14 by Joseph Assaker.

225 countries are represented in this data.

All countries have records dating from 2020-02-15 until 2022-05-14 (820 days per country).
The exceptions are *China*, which has records dating from 2020-01-22 until 2022-05-14 (844 days), and *Palau*, which has records dating from 2021-08-25 until 2022-05-14 (263 days).

#### Summary Data Columns Description:

* **country**: designates the country in which the row's data was observed.
* **continent**: designates the continent of the observed country.
* **total\_confirmed**: designates the total number of confirmed cases in the observed country.
* **total\_deaths**: designates the total number of confirmed deaths in the observed country.
* **total\_recovered**: designates the total number of confirmed recoveries in the observed country.
* **active\_cases**: designates the number of active cases in the observed country.
* **serious\_or\_critical**: designates the estimated number of cases in serious or critical condition in the observed country.
* **total\_cases\_per\_1m\_population**: designates the number of total cases per 1 million population in the observed country.
* **total\_deaths\_per\_1m\_population**: designates the number of total deaths per 1 million population in the observed country.
* **total\_tests**: designates the number of total tests done in the observed country.
* **total\_tests\_per\_1m\_population**: designates the number of total tests done per 1 million population in the observed country.
* **population**: designates the population count of the observed country.

#### Daily Data Columns Description:

* **date**: designates the date of observation of the row's data in YYYY-MM-DD format.
* **country**: designates the country in which the row's data was observed.
* **cumulative\_total\_cases**: designates the cumulative number of confirmed cases as of the row's date, for the row's country.
* **daily\_new\_cases**: designates the number of new confirmed cases on the row's date, for the row's country.
* **active\_cases**: designates the number of active cases (i.e., confirmed cases that have neither recovered nor died) on the row's date, for the row's country.
* **cumulative\_total\_deaths**: designates the cumulative number of confirmed deaths as of the row's date, for the row's country.
* **daily\_new\_deaths**: designates the number of new confirmed deaths on the row's date, for the row's country.

## Acknowledgements

As previously mentioned, all the data present in this dataset is scraped from [worldometers.info](https://www.worldometers.info/coronavirus/).

## Inspiration

Going through this data, Kagglers can visualize various trends in their own country, or compare several countries. One can also combine this dataset with other news and key points in time (lockdowns, the new UK mutation, holidays, etc.) in order to study the effects of these events on the progression of Covid-19 in a multitude of countries. Implementing time series analysis on this dataset would also be an amazing idea! Getting a deep learning algorithm to learn from this sea of data and try to predict the future turn of events could be quite interesting!
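Before any of the analyses suggested above, it may be worth checking the date coverage stated under Content. The sketch below is only an illustration, not the author's code. It assumes the daily file has the `date` (YYYY-MM-DD) and `country` columns described above; the helper names `inclusive_days` and `coverage_by_country` are made up for this example, and the "Atlantis" rows are invented toy data, not real figures.

```python
import csv
import io
from collections import defaultdict
from datetime import date


def inclusive_days(first: date, last: date) -> int:
    """Number of calendar days from first to last, counting both endpoints."""
    return (last - first).days + 1


def coverage_by_country(rows):
    """Map country -> (first date, last date, row count, gap_free flag).

    `rows` is any iterable of dicts with "date" (YYYY-MM-DD) and "country"
    keys, such as the rows csv.DictReader yields for the daily data file.
    """
    seen = defaultdict(set)
    counts = defaultdict(int)
    for row in rows:
        seen[row["country"]].add(date.fromisoformat(row["date"]))
        counts[row["country"]] += 1
    report = {}
    for country, days in seen.items():
        first, last = min(days), max(days)
        # Gap-free means every calendar day in the span appears at least once.
        gap_free = len(days) == inclusive_days(first, last)
        report[country] = (first, last, counts[country], gap_free)
    return report


# Day counts implied by the date ranges quoted in the description.
LAST_DAY = date(2022, 5, 14)
EXPECTED_DAYS = {
    "most countries": inclusive_days(date(2020, 2, 15), LAST_DAY),
    "China": inclusive_days(date(2020, 1, 22), LAST_DAY),
    "Palau": inclusive_days(date(2021, 8, 25), LAST_DAY),
}

# Toy rows, invented purely to exercise the function (note the missing day).
TOY_CSV = """date,country,daily_new_cases
2020-02-15,Atlantis,1
2020-02-16,Atlantis,0
2020-02-18,Atlantis,2
"""
toy_report = coverage_by_country(csv.DictReader(io.StringIO(TOY_CSV)))

print(EXPECTED_DAYS)
print(toy_report["Atlantis"])
```

The first print shows 820, 844 and 263, which matches the day counts given in the description. To run the check on the real download, one would pass `csv.DictReader(open(path, newline=""))` to `coverage_by_country`, where `path` is wherever the daily CSV was saved; the file name is not stated in the description, so it is left open here.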

Provider: Kaggle