Generated Prediction Data of COVID-19's Daily Infections in Brazil
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://data.mendeley.com/datasets/t2zk3xnt8y
Resource summary:
Dataset general description:
• This dataset reports 4200 recurrent neural network models, their settings, and the files generated from them (prediction CSV files, graphs, and metadata files, as applicable). The models predict COVID-19's daily infections in Brazil and were trained on limited raw data (30 and 40 time-steps). The code used was developed by the author and is available in the following online data repository:
http://dx.doi.org/10.17632/yp4d95pk7n.3
Dataset content:
• Models, graphs, and CSV prediction files:
1. Deterministic mode (DM): includes 1197 generated model files (30 time-steps), with their 2835 generated graphs and 2835 prediction files. This mode also includes 1976 generated model files (40 time-steps), with their 7301 generated graphs and 7301 prediction files.
2. Non-deterministic mode (NDM): includes 20 generated model files (30 time-steps), with their 53 generated graphs and 53 prediction files.
3. Technical validation mode (TVM): includes 1001 generated model files (30 time-steps), with 3619 generated graphs and 3619 prediction files for 349 of those models. A sample of 358 models was drawn from the 1001, and 9 of the sampled models did not reach the accuracy threshold. This mode also includes all data for the control group, India (1 model).
4. One graph and one prediction file for each of DM and NDM, reporting the evaluation through 2020-07-11.
5. The performance evaluation of the 10, 20, 30, 40, and 50 time-step alternatives (5 models).
• Settings and metadata for the above 3 categories:
1. Settings used during the training session, in JSON files.
2. Metadata: training / prediction setup and accuracy, in CSV files.
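The model counts listed above can be reconciled with the stated total of 4200. The short Python check below is only a sketch: it assumes the total is the sum of the DM, NDM, and TVM model files plus the India control model and the five time-step alternatives, which the description implies but does not state outright. The dictionary keys are illustrative labels, not file or folder names from the dataset.

```python
# Model counts as listed in the dataset description.
# Keys are illustrative labels (assumptions), not names used in the dataset.
model_counts = {
    "DM, 30 time-steps": 1197,
    "DM, 40 time-steps": 1976,
    "NDM, 30 time-steps": 20,
    "TVM, 30 time-steps": 1001,
    "TVM control group (India)": 1,
    "time-step alternatives (10 to 50)": 5,
}

# Assumed reading: the reported total is the plain sum of the counts above.
total_models = sum(model_counts.values())

# TVM sample: 358 models were sampled and 9 missed the accuracy threshold,
# leaving the models for which graphs and prediction files are reported.
tvm_sample = 358
tvm_below_threshold = 9
tvm_with_outputs = tvm_sample - tvm_below_threshold

print(total_models, tvm_with_outputs)
```

Under that reading the counts add up to the reported total, and the TVM sample arithmetic matches the 349 models mentioned in item 3.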
Raw data source used to train the models:
• The raw data [1] used to train the models is from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University: https://github.com/CSSEGISandData/COVID-19 (accessed 2020-07-20)
• The following raw data links were used (both accessed 2020-07-08):
1. till 2020-06-29:
https://github.com/CSSEGISandData/COVID-19/raw/78d91b2dbc2a26eb2b2101fa499c6798aa22fca8/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
2. till 2020-06-13:
https://github.com/CSSEGISandData/COVID-19/raw/02ea750a263f6d8b8945fdd3253b35d3fd9b1bee/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv
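The raw files linked above hold cumulative confirmed counts per country, with one column per date after four leading metadata columns (Province/State, Country/Region, Lat, Long). As a rough illustration of how a daily-infections series for Brazil could be derived from such a file, the Python sketch below sums the rows for one country and takes day-over-day differences. This is an assumption about a reasonable preprocessing step, not a description of the author's code. The function names `cumulative_cases`, `daily_new_cases`, and `fetch_csv` are hypothetical.

```python
import csv
import io
import urllib.request

# Pinned raw file from the dataset description (data through 2020-06-29).
RAW_URL = (
    "https://github.com/CSSEGISandData/COVID-19/raw/"
    "78d91b2dbc2a26eb2b2101fa499c6798aa22fca8/csse_covid_19_data/"
    "csse_covid_19_time_series/time_series_covid19_confirmed_global.csv"
)

# Leading non-date columns: Province/State, Country/Region, Lat, Long.
META_COLUMNS = 4


def cumulative_cases(csv_text, country):
    """Return (dates, cumulative totals) for one country, summed over its rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    country_idx = header.index("Country/Region")
    dates = header[META_COLUMNS:]
    totals = [0] * len(dates)
    for row in body:
        # Skip blank lines; keep only rows for the requested country.
        if row and row[country_idx] == country:
            for i, value in enumerate(row[META_COLUMNS:]):
                totals[i] += int(float(value))
    return dates, totals


def daily_new_cases(totals):
    """Difference consecutive cumulative totals to get per-day new cases."""
    return [later - earlier for earlier, later in zip(totals, totals[1:])]


def fetch_csv(url=RAW_URL):
    """Download the raw CSV text (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")
```

A typical call would be `dates, totals = cumulative_cases(fetch_csv(), "Brazil")` followed by `daily_new_cases(totals)`. The resulting list is one element shorter than `dates`, because the first date has no preceding day to difference against.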
References:
1- Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. Lancet Infect Dis. 2020;20(5):533-534. doi: 10.1016/S1473-3099(20)30120-1
Created:
2020-08-04



