
Calibrating machine learning approaches for probability estimation: a comprehensive comparison. (English) Zbl 1540.62177

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

UCI-ml; ranger; Kernlab; isotone; R
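
Code sketch:

The paper compares calibration methods for probability estimates from machine learning models, and its Software list above names the R packages involved. The following minimal R sketch is not taken from the paper; it only illustrates how two of the listed packages, ranger and isotone, can be combined to recalibrate random-forest scores with Platt scaling (a plain logistic model) and with isotonic regression. The data set (a two-class subset of iris), the even/odd split, and all object names are assumptions made for illustration; the paper's own pipeline, data sets, and tuning differ.

## Illustrative sketch only: recalibrate random-forest probability scores.
## Data set, split, and object names are assumptions, not the paper's setup.
library(ranger)   # random forests, see ref. [59]
library(isotone)  # pool-adjacent-violators algorithm (PAVA), see ref. [26]

set.seed(1)
dat <- iris[iris$Species != "setosa", ]               # two-class toy problem
dat$y <- factor(as.integer(dat$Species == "versicolor"))
dat$Species <- NULL
idx <- seq(1, nrow(dat), by = 2)                      # simple illustrative split
train <- dat[idx, ]
test <- dat[-idx, ]
y_test <- as.integer(test$y == "1")

## Probability forest; out-of-bag (OOB) scores serve as the calibration
## sample, so no extra hold-out set is needed (cf. ref. [14]).
rf <- ranger(y ~ ., data = train, probability = TRUE)
p_raw <- predict(rf, data = test)$predictions[, "1"]  # uncalibrated estimates
cal <- data.frame(y = as.integer(train$y == "1"),
                  score = rf$predictions[, "1"])      # OOB scores

## Platt scaling (ref. [10]): logistic regression of the outcome on the score.
platt <- glm(y ~ score, family = binomial, data = cal)
p_platt <- predict(platt, data.frame(score = p_raw), type = "response")

## Isotonic regression via PAVA (refs. [11], [26]); new scores are mapped
## through the fitted monotone step function.
ord <- order(cal$score)
iso <- gpava(z = cal$score[ord], y = cal$y[ord], ties = "secondary")
p_iso <- approx(x = cal$score[ord], y = iso$x, xout = p_raw,
                method = "constant", rule = 2, ties = "ordered")$y

## Brier score (ref. [37]) before and after recalibration; lower is better.
sapply(list(raw = p_raw, platt = p_platt, isotonic = p_iso),
       function(p) mean((p - y_test)^2))

Other methods from the reference list, such as beta calibration [8] or spline-based calibration [20], slot into the same pattern: fit the recalibration map on out-of-bag or held-out scores, never on the data used to train the classifier, then apply it to new predictions.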

References:

[1] Diamond GA, Forrester JS. Analysis of probability as an aid in the clinical diagnosis of coronary-artery disease. N Engl J Med. 1979;300:1350-1358. doi:10.1056/NEJM197906143002402
[2] Xie G, Wang R, Shang L, et al. Calculating the overall survival probability in patients with cervical cancer: a nomogram and decision curve analysis-based study. BMC Cancer. 2020;20:833. doi:10.1186/s12885-020-07349-4
[3] Boyer B, Cazorla C. Methods and probability of success after early revision of prosthetic joint infections with debridement, antibiotics and implant retention. Orthop Traumatol Surg Res. 2021;107:102774. doi:10.1016/j.otsr.2020.102774
[4] Uttley AM. Temporal and spatial patterns in a conditional probability machine. In: Shannon CE, McCarthy J, eds. Automata Studies. Princeton: Princeton University Press; 1956:277-285.
[5] Altman DG, Royston P. What do we mean by validating a prognostic model? Stat Med. 2000;19:453-473. doi:10.1002/(SICI)1097-0258(20000229)19:4<453::AID-SIM350>3.0.CO;2-5
[6] Justice AC, Covinsky KE, Berlin JA. Assessing the generalizability of prognostic information. Ann Intern Med. 1999;130:515-524. doi:10.7326/0003-4819-130-6-199903160-00016
[7] Van Calster B, Nieboer D, Vergouwe Y, De Cock B, Pencina MJ, Steyerberg EW. A calibration hierarchy for risk models was defined: from utopia to empirical data. J Clin Epidemiol. 2016;74:167-176. doi:10.1016/j.jclinepi.2015.12.005
[8] Kull M, Silva Filho TM, Flach P. Beyond sigmoids: how to obtain well-calibrated probabilities from binary classifiers with beta calibration. Electron J Statist. 2017;11:5052-5080. doi:10.1214/17-EJS1338SI · Zbl 1384.62197
[9] Böken B. On the appropriateness of Platt scaling in classifier calibration. Inf Syst. 2021;95:101641. doi:10.1016/j.is.2020.101641
[10] Platt J. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In: Smola AJ, Bartlett PJ, Schölkopf B, Schuurmans D, eds. Advances in Large Margin Classifiers. Cambridge: MIT Press; 2000:61-74.
[11] Fawcett T, Niculescu-Mizil A. PAV and the ROC convex hull. Mach Learn. 2007;68:97-106. doi:10.1007/s10994-007-5011-0 · Zbl 1470.62082
[12] Zadrozny B, Elkan C. Transforming classifier scores into accurate multiclass probability estimates. In: Hand D, Keim DA, Ng R, eds. Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery; 2002:694-699. doi:10.1145/775047.775151
[13] Elkan C. The foundations of cost-sensitive learning. In: Nebel B, ed. Proceedings of the Seventeenth International Joint Conference on Artificial Intelligence. Vol 2. San Francisco: Morgan Kaufmann; 2001:973-978.
[14] Dankowski T, Ziegler A. Calibrating random forests for probability estimation. Stat Med. 2016;35:3949-3960. doi:10.1002/sim.6959
[15] Dua D, Graff C. UCI Machine Learning Repository. Irvine, CA: School of Information and Computer Sciences, University of California; 2019. https://archive-beta.ics.uci.edu. Accessed June 1, 2023.
[16] R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2022. https://www.R-project.org/
[17] Miller ME, Hui SL, Tierney WM. Validation techniques for logistic regression models. Stat Med. 1991;10:1213-1226. doi:10.1002/sim.4780100805
[18] Cox DR. Two further applications of a model for binary regression. Biometrika. 1958;45:562-565. doi:10.2307/2333203 · Zbl 0085.13715
[19] Steyerberg EW. Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. 2nd ed. Cham: Springer; 2019.
[20] Lucena B. Spline-based probability calibration. arXiv 2018: 1809.07751. https://arxiv.org/abs/1809.07751. Accessed June 1, 2023.
[21] Harrell FE. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. Cham: Springer; 2015. · Zbl 1330.62001
[22] Zhang J, Yang Y. Probabilistic score estimation with piecewise logistic regression. In: Greiner R, Schuurmans D, eds. Proceedings of the 21st International Conference on Machine Learning. New York: ACM Press; 2004:115-123.
[23] Dormann CF. Calibration of probability predictions from machine-learning and statistical models. Glob Ecol Biogeogr. 2020;29:760-765. doi:10.1111/geb.13070
[24] Landwehr N, Hall M, Frank E. Logistic model trees. Mach Learn. 2005;59:161-205. doi:10.1007/s10994-005-0466-3 · Zbl 1469.68092
[25] Leathart T, Frank E, Holmes G, Pfahringer B. Probability calibration trees. In: Zhang M-L, Noh Y-K, eds. Proceedings of the 9th Asian Conference on Machine Learning. Cambridge, MA: ML Research Press; 2017:145-160. http://proceedings.mlr.press/v77/leathart17a.html. Accessed June 1, 2023.
[26] de Leeuw J, Hornik K, Mair P. Isotone optimization in R: pool-adjacent-violators algorithm (PAVA) and active set methods. J Stat Softw. 2009;32:1-24. doi:10.18637/jss.v032.i05
[27] Bella A, Ferri C, Hernández-Orallo J, Ramírez-Quintana MJ. Calibration of machine learning models. In: Soria Olivas E, Martín Guerrero JD, Martinez Sober M, Magdalena Benedito JR, Serrano López AJ, eds. Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques. Hershey: IGI Global; 2010:128-146. doi:10.4018/978-1-60960-818-7.ch104
[28] Huang Y, Li W, Macheret F, Gabriel RA, Ohno-Machado L. A tutorial on calibration measurements and calibration models for clinical prediction models. J Am Med Inform Assoc. 2020;27:621-633. doi:10.1093/jamia/ocz228
[29] Dimitriadis T, Gneiting T, Jordan AI. Stable reliability diagrams for probabilistic classifiers. Proc Natl Acad Sci. 2021;118:e2016191118. doi:10.1073/pnas.2016191118
[30] Naeini MP, Cooper GF, Hauskrecht M. Obtaining well calibrated probabilities using Bayesian binning. Proc Conf AAAI Artif Intell. 2015;2015:2901-2907.
[31] Zadrozny B, Elkan C. Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In: Brodley CE, Danyluk AP, eds. Proceedings of the 18th International Conference on Machine Learning (ICML 2001). Burlington: Morgan Kaufmann; 2001:609-616.
[32] Chen W, Sahiner B, Samuelson F, Pezeshk A, Petrick N. Calibration of medical diagnostic classifier scores to the probability of disease. Stat Methods Med Res. 2018;27:1394-1409. doi:10.1177/0962280216661371
[33] Bella A, Ferri C, Hernández-Orallo J, Ramírez-Quintana MJ. Similarity-binning averaging: a generalisation of binning calibration. In: Corchado E, Yin H, eds. Intelligent Data Engineering and Automated Learning - IDEAL 2009. Berlin: Springer; 2009:341-349. doi:10.1007/978-3-642-04394-9_42
[34] Biau G, Cérou F, Guyader A. Rates of convergence of the functional k-nearest neighbor estimate. IEEE Trans Inf Theory. 2010;56:2034-2040. doi:10.1109/TIT.2010.2040857 · Zbl 1366.62080
[35] Bella A, Ferri C, Hernández-Orallo J, Ramírez-Quintana MJ. On the effect of calibration in classifier combination. Appl Intell. 2013;38:566-585. doi:10.1007/s10489-012-0388-2
[36] Jiang X, Osl M, Kim J, Ohno-Machado L. Calibrating predictive model estimates to support personalized medicine. J Am Med Inform Assoc. 2012;19:263-274. doi:10.1136/amiajnl-2011-000291
[37] Brier GW. Verification of forecasts expressed in terms of probability. Mon Wea Rev. 1950;78:1-3. doi:10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2
[38] Kruppa J, Liu Y, Diener HC, et al. Probability estimation with machine learning methods for dichotomous and multi-category outcome: applications. Biom J. 2014;56:564-583. doi:10.1002/bimj.201300077 · Zbl 1441.62405
[39] Vovk V. The fundamental nature of the log loss function. In: Beklemishev LD, Blass A, Dershowitz N, Finkbeiner B, Schulte W, eds. Fields of Logic and Computation II: Essays Dedicated to Yuri Gurevich on the Occasion of his 75th Birthday. Cham: Springer; 2015:307-318. doi:10.1007/978-3-319-23534-9_20 · Zbl 1465.68116
[40] Gneiting T, Raftery AE. Strictly proper scoring rules, prediction, and estimation. J Am Stat Assoc. 2007;102:359-378. doi:10.1198/016214506000001437 · Zbl 1284.62093
[41] Malley JD, Kruppa J, Dasgupta A, Malley KG, Ziegler A. Probability machines: consistent probability estimation using nonparametric learning machines. Methods Inf Med. 2012;51:74-81. doi:10.3414/ME00-01-0052
[42] Demšar J. Statistical comparisons of classifiers over multiple data sets. J Mach Learn Res. 2006;7:1-30. · Zbl 1222.68184
[43] Mease D, Wyner AJ, Buja A. Boosted classification trees and class probability/quantile estimation. J Mach Learn Res. 2007;8:409-439. · Zbl 1222.68261
[44] Weimar C, Ziegler A, König IR, Diener HC, on behalf of the German Stroke Study Collaborators. Predicting functional outcome and survival after acute ischemic stroke. J Neurol. 2002;249:888-895. doi:10.1007/s00415-002-0755-8
[45] Weimar C, König IR, Kraywinkel K, Ziegler A, Diener HC, German Stroke Study Collaboration. Age and National Institutes of Health stroke scale score within 6 hours after onset are accurate predictors of outcome after cerebral ischemia: development and external validation of prognostic models. Stroke. 2004;35:158-162. doi:10.1161/01.STR.0000106761.94985.8B
[46] Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ. 2015;350:g7594. doi:10.1136/bmj.g7594
[47] König IR, Weimar C, Diener HC, Ziegler A. Vorhersage des Funktionsstatus 100 Tage nach einem ischämischen Schlaganfall: Design einer prospektiven Studie zur externen Validierung eines prognostischen Modells [Prediction of functional status 100 days after ischemic stroke: design of a prospective study for the external validation of a prognostic model]. Z Arztl Fortbild Qualitatssich. 2003;97:717-722.
[48] Mahoney FI, Barthel DW. Functional evaluation: the Barthel index. Md Med J. 1965;14:56-61.
[49] König IR, Malley JD, Weimar C, Diener HC, Ziegler A, on behalf of the German Stroke Study Collaboration. Practical experiences on the necessity of external validation. Stat Med. 2007;26:5499-5511. doi:10.1002/sim.3069
[50] Watson DS, Wright MN. Testing conditional independence in supervised learning algorithms. Mach Learn. 2021;110:2107-2129. doi:10.1007/s10994-021-06030-6 · Zbl 07465666
[51] Seiffert M, Ojeda F, Müllerleile K, et al. Reducing radiation exposure during invasive coronary angiography and percutaneous coronary interventions implementing a simple four-step protocol. Clin Res Cardiol. 2015;104:500-506. doi:10.1007/s00392-015-0814-7
[52] Detrano R, Janosi A, Steinbrunn W, et al. International application of a new probability algorithm for the diagnosis of coronary artery disease. Am J Cardiol. 1989;64:304-310. doi:10.1016/0002-9149(89)90524-9
[53] Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Stat Soc B. 2005;67:301-320. doi:10.1111/j.1467-9868.2005.00503.x · Zbl 1069.62054
[54] Friedman J, Hastie T, Tibshirani R. Regularization paths for generalized linear models via coordinate descent. J Stat Softw. 2010;33:1-22. doi:10.18637/jss.v033.i01
[55] Bühlmann P, Hothorn T. Boosting algorithms: regularization, prediction and model fitting. Stat Sci. 2007;22:477-505. doi:10.1214/07-STS242 · Zbl 1246.62163
[56] Hofner B, Mayr A, Robinzonov N, Schmid M. Model-based boosting in R: a hands-on tutorial using the R package mboost. Comput Stat. 2014;29:3-35. doi:10.1007/s00180-012-0382-5 · Zbl 1306.65069
[57] Mayr A, Binder H, Gefeller O, Schmid M. The evolution of boosting algorithms – from machine learning to statistical modelling. Methods Inf Med. 2014;53:419-427. doi:10.3414/ME13-01-0122
[58] Ziegler A, König IR. Mining data with random forests: current options for real-world applications. WIREs Data Mining Knowl Discov. 2014;4:55-63. doi:10.1002/widm.1114
[59] Wright MN, Ziegler A. ranger: a fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw. 2017;77:1-17. doi:10.18637/jss.v077.i01
[60] Kruppa J, Liu Y, Biau G, et al. Probability estimation with machine learning methods for dichotomous and multicategory outcome: theory. Biom J. 2014;56:534-563. doi:10.1002/bimj.201300068 · Zbl 1441.62404
[61] Steinwart I, Christmann A. Support Vector Machines. New York: Springer; 2008. · Zbl 1203.68171
[62] Karatzoglou A, Smola A, Hornik K, Zeileis A. kernlab – an S4 package for kernel methods in R. J Stat Softw. 2004;11:1-20.
[63] Torgo L. An infra-structure for performance estimation and experimental comparison of predictive models in R. arXiv 2015: 1412.0436v4. https://arxiv.org/abs/1412.0436v4. Accessed June 1, 2023.
[64] Xu P, Davoine F, Zha H, Denœux T. Evidential calibration of binary SVM classifiers. Int J Approx Reason. 2016;72:55-70. doi:10.1016/j.ijar.2015.05.002 · Zbl 1352.68208
[65] Austin PC, Steyerberg EW. Events per variable (EPV) and the relative performance of different strategies for estimating the out-of-sample validity of logistic regression models. Stat Methods Med Res. 2017;26:796-808. doi:10.1177/0962280214558972
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases, these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references of the original paper as accurately as possible without claiming completeness or perfect matching.