five

Building test data from real outbreaks for evaluating detection algorithms

收藏
NIAID Data Ecosystem2026-03-10 收录
下载链接:
https://figshare.com/articles/dataset/Building_test_data_from_real_outbreaks_for_evaluating_detection_algorithms/5369200
下载链接
链接失效反馈
官方服务:
资源简介:
Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, size and duration, is known to be very difficult. The dataset of outbreak signals generated should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach based on the use of historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by resampling processes (Binomial, Inverse Transform Sampling Method—ITSM, Metropolis-Hasting Random Walk, Metropolis-Hasting Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate performance for each of the resampling algorithms. Our analysis confirms the influence of the type of algorithm used and simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) on the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased with the increase in the number of days simulated and increased with the number of cases simulated. Simulating outbreaks with fewer cases than days of duration (i.e. overall scale factor less than 1) resulted in an important loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within a range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
创建时间:
2017-09-02
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作