Calibration of probability predictions from machine-learning and statistical models
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-06-15.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.xksn02vbq
Resource description:
This data set describes the occurrence (yes/no) of a bird, the Southern
Whiteface (Aphelocephala leucopsis) in Australia. A suite of environmental
variables is provided, which are used in the paper to illustrate a
statistical problem. The data are meant to allow reproduction of the
analysis in this paper. They are not intended for actual ecological
analysis. The data come as an .Rdata file, i.e. as an R dataset (described
technically here:
https://www.loc.gov/preservation/digital/formats/fdd/fdd000470.shtml).
Here is the paper's abstract: Aim: Predictions from statistical
models may be uncalibrated, meaning that the predicted values do not have
the nominal coverage probability. This is most easily seen with probability
predictions in machine-learning classification, including the common
species occurrence probabilities. Here, a predicted probability of, say,
0.7 should indicate that out of 100 cases with these environmental
conditions, and hence the same predicted probability, the species should
be present in 70 and absent in 30. Innovation: A simple calibration plot
shows that this is not necessarily the case, particularly not for
over-fitted models or algorithms that use non-likelihood target functions.
As a consequence, “raw” predictions from such models could easily be off by
0.2, are unsuitable for averaging across model types, and the resulting maps
may hence be substantially distorted. The solution, a flexible calibration
regression, is simple and can be applied whenever deviations are observed.
Conclusion: “Raw”, uncalibrated probability predictions should be
calibrated before interpreting or averaging them in a probabilistic way.
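
The calibration check and recalibration step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the abstract does not specify the form of its "flexible calibration regression", so the sketch substitutes a plain logistic recalibration on the logit scale as a simpler stand-in. The data are simulated (deliberately overconfident predictions), not the Southern Whiteface data, and all function names are hypothetical.

```python
import math
import random

def calibration_table(p_pred, y_obs, n_bins=10):
    """Equal-width bins: (mean predicted, observed frequency, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(p_pred, y_obs):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    rows = []
    for b in bins:
        if b:
            n = len(b)
            rows.append((sum(p for p, _ in b) / n, sum(y for _, y in b) / n, n))
    return rows

def mean_abs_gap(rows):
    """Count-weighted mean |predicted - observed| over bins (0 = perfectly calibrated)."""
    total = sum(n for _, _, n in rows)
    return sum(abs(pm - of) * n for pm, of, n in rows) / total

def logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return math.log(p / (1 - p))

def sigmoid(z):
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

def fit_recalibration(p_pred, y_obs, n_iter=25):
    """Fit y ~ sigmoid(a + b * logit(p)) by Newton-Raphson.

    Assumption: a two-parameter logistic stand-in for a more flexible
    (e.g. spline-based) calibration regression.
    """
    x = [logit(p) for p in p_pred]
    a, b = 0.0, 1.0  # start at "no recalibration"
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y_obs):
            mu = sigmoid(a + b * xi)
            w = mu * (1 - mu)
            r = yi - mu
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det <= 1e-12:
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Simulated example: true occurrence probability q, overconfident raw prediction p.
random.seed(1)
q_true = [random.uniform(0.05, 0.95) for _ in range(5000)]
y = [1 if random.random() < q else 0 for q in q_true]
p_raw = [sigmoid(2 * logit(q)) for q in q_true]  # pushed towards 0 and 1

a, b = fit_recalibration(p_raw, y)
p_cal = [sigmoid(a + b * logit(p)) for p in p_raw]

gap_before = mean_abs_gap(calibration_table(p_raw, y))
gap_after = mean_abs_gap(calibration_table(p_cal, y))
print(f"slope b = {b:.2f}, gap before = {gap_before:.3f}, after = {gap_after:.3f}")
```

In a well-calibrated model each row of the calibration table has nearly equal predicted and observed values (the "0.7 means 70 of 100" reading); a fitted slope `b` well below 1 indicates overconfident raw predictions. In practice the recalibration should be fitted on data not used to train the model.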
Provider:
Dryad
Created:
2020-01-31



