Spatial Clusters of Childhood Cancer: Benchmarking Data. As published in Schündeln et al. 2021 Cancer Epidemiology & Data in Brief

Mendeley Data, indexed 2026-04-18

Download link:

https://data.mendeley.com/datasets/3hrg9tpsx9

Description:

Incidence of newly diagnosed childhood cancer (140 per 1,000,000 children under 15 years) and of nephroblastoma (7 per 1,000,000) was simulated. Clusters of defined size (1 to 50) were randomly assembled at the district level in Germany. Each cluster was simulated at ten different relative risk levels (1 to 100), and 2000 iterations were run for each combination. The simulated data were then analysed with three local clustering tests: the Besag-Newell (BN) method, the spatial scan statistic (SSS), and the Bayesian Besag-York-Mollié approach with Integrated Nested Laplace Approximation (BYM). See the references for the published manuscripts.

RAW DATA: The simulated raw data are stored in two Rdata files, "AllMalignancies.Rdata" and "NephroblastomaSimulation.Rdata". Each file contains 6 lists for the different cluster sizes ("Cluster Size X"). Each list holds 2000 simulations of clusters at each of the 10 risk levels ("RR Y Cluster"), together with the simulated cases for the respective scenario ("RR Y SimCases"). Each file also contains the population of children under 15 years per district ("District Population") and the expected cases per district for the entity in question, all cancers or nephroblastoma ("Expected Cases"). The adjacency matrix for the 402 German districts is provided as a separate Rdata file.
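To make the relationship between district population, expected cases, relative risk and simulated cases concrete, here is a minimal Python sketch. It is not the authors' code (that lives in the linked repository), and it rests on assumptions: case counts are drawn from a Poisson distribution, which the description does not state, and the populations, cluster membership and function names are hypothetical.

```python
import math
import random

# Incidence rates quoted in the description (cases per child under 15).
RATE_ALL_CANCER = 140 / 1_000_000
RATE_NEPHROBLASTOMA = 7 / 1_000_000


def expected_cases(populations, rate):
    """Expected case count per district: population times incidence rate."""
    return [pop * rate for pop in populations]


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Suitable for small to moderate lam only: exp(-lam) underflows to 0.0
    for lam above roughly 745.
    """
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_cases(expected, cluster, relative_risk, rng):
    """Simulate one case count per district (Poisson is an assumption).

    Districts whose index is in `cluster` have their expectation
    multiplied by `relative_risk`; all others keep the baseline.
    """
    return [
        poisson_draw(e * relative_risk if i in cluster else e, rng)
        for i, e in enumerate(expected)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical populations of children under 15 in five toy districts.
    populations = [30_000, 45_000, 12_000, 80_000, 25_000]
    expected = expected_cases(populations, RATE_ALL_CANCER)
    cluster = {1, 2}  # hypothetical high-risk districts
    for _ in range(3):  # the study ran 2000 iterations per scenario
        print(simulate_cases(expected, cluster, relative_risk=5, rng=rng))
```

In the published files the corresponding quantities appear under "District Population", "Expected Cases", "RR Y Cluster" and "RR Y SimCases"; the sketch only mirrors that layout conceptually.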
The code and the GADM shape files needed to reproduce the original simulation and the published study are available at: https://github.com/Pediatrics/Childhood-Cancer-Study

ANALYZED DATA: The operating characteristics of each cluster detection method in each scenario of this study are reported in "Analyzed Data.xlsx" according to the following quality criteria:

- Minimum Power (MP): Proportion of simulations detecting at least one district of the true cluster
- Exact Power (EP): Proportion of simulations detecting the true cluster without false positives
- Sensitivity (sens): Proportion of correctly detected districts in the true cluster
- Specificity (spec): Proportion of normal-risk districts correctly classified as normal-risk districts
- Positive predictive value (PPV): Proportion of districts in the detected cluster that belong to the true cluster
- Negative predictive value (NPV): Proportion of districts not labeled as a risk cluster that are not part of the true cluster
- Correct classification (CC): Proportion of correctly classified districts among all districts
- Correct proportion (CP): Proportion of correctly labeled districts among all detected potential high-risk (HR) districts
- Positive diagnostic likelihood (PDL): Probability of high-risk districts being detected divided by the probability of non-high-risk districts being detected
- Negative diagnostic likelihood (NDL): Probability of high-risk districts not being detected divided by the probability of non-high-risk districts not being detected
- False positive rate (FPR): Proportion of incorrectly labeled high-risk districts among all detected high-risk districts
- False negative rate (FNR): Proportion of incorrectly labeled normal-risk districts among all detected normal-risk districts
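The criteria above can be read as functions of a 2x2 table comparing the true cluster with the detected districts. The following Python sketch is one reading of those definitions for a single simulation, not the authors' implementation. The function and key names are hypothetical, and FPR and FNR follow the wording used here (relative to detected districts), which differs from the more common textbook definitions.

```python
from typing import Optional


def safe_div(num: float, den: float) -> Optional[float]:
    """Return num / den, or None when the denominator is zero."""
    return num / den if den else None


def district_metrics(true_cluster: set, detected: set, n_districts: int) -> dict:
    """Per-simulation quality criteria for one detection result."""
    if not true_cluster or len(true_cluster) >= n_districts:
        raise ValueError("need a non-empty true cluster smaller than the map")
    tp = len(true_cluster & detected)   # cluster districts that were detected
    fp = len(detected - true_cluster)   # normal-risk districts flagged
    fn = len(true_cluster - detected)   # cluster districts that were missed
    tn = n_districts - tp - fp - fn     # normal-risk districts left unflagged
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sens": sens,
        "spec": spec,
        "PPV": safe_div(tp, tp + fp),
        "NPV": safe_div(tn, tn + fn),
        "CC": (tp + tn) / n_districts,
        # Likelihood ratios: sens / (1 - spec) and (1 - sens) / spec.
        "PDL": safe_div(sens * (fp + tn), fp),
        "NDL": safe_div(1 - sens, spec),
        # As worded in this description: shares of the detected groups.
        "FPR": safe_div(fp, tp + fp),
        "FNR": safe_div(fn, tn + fn),
        # Indicators that are averaged over simulations to give MP and EP.
        "min_power_hit": tp >= 1,
        "exact_power_hit": detected == true_cluster,
    }


def power(results: list, key: str) -> float:
    """Proportion of simulations in which the given indicator is True."""
    return sum(1 for r in results if r[key]) / len(results)


if __name__ == "__main__":
    # Hypothetical example on the 402-district map.
    m = district_metrics({1, 2, 3, 4}, {3, 4, 5}, n_districts=402)
    for name, value in m.items():
        print(name, value)
```

Averaging `min_power_hit` and `exact_power_hit` over the 2000 iterations of a scenario gives MP and EP; ratios with a zero denominator are returned as `None` rather than guessed.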

Created:

2021-01-04