Predicting the risks of kidney failure and death in adults with moderate to severe chronic kidney disease
BMJ 2024;385 doi: https://doi.org/10.1136/bmj.q721 (Published 15 April 2024) Cite this as: BMJ 2024;385:q721
Linked Research: Predicting the risks of kidney failure and death in adults with moderate to severe chronic kidney disease
Linked Editorial: Predicting the outcomes of chronic kidney disease in older adults
- 1 Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- 2 Departments of Medicine and Community Health Sciences, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Correspondence to: P Ravani pravani@ucalgary.ca
What is a medical risk prediction model?
A medical risk prediction model reads the data of a patient and returns their predicted risks.1 2 Generally speaking, the model makes predictions by referring to what happened to similar patients in the past, as recorded in a learning dataset. For example, a model could predict for a new patient with chronic kidney disease (CKD) that, within two years from now, their risk of kidney failure is 8% and their risk of death is 13%. These predictions are interpreted as follows: out of 100 people who today are all like this patient, eight are expected to develop kidney failure and 13 are expected to die within the next two years. Notice that patients who first develop kidney failure and then die contribute to both outcomes. Whenever a competing event, in our case death, can prevent the outcome of interest, medical decision making needs to account for the predicted risks of all events.
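To make this interpretation concrete, the following minimal simulation (in Python, with event rates invented purely for illustration and not taken from the linked study) generates latent times to kidney failure and death for a population of similar patients and counts both outcomes at a two year horizon. Kidney failure is only counted when it occurs before death, whereas death from any cause is counted regardless of whether kidney failure happened first.

```python
# Minimal simulation (rates invented for illustration only) of how predicted
# risks with a competing event translate into counts per 100 patients.
import numpy as np

rng = np.random.default_rng(seed=7)
n = 100_000        # many patients, so the proportions are stable
horizon = 2.0      # prediction time horizon in years

# Latent times to each event; the exponential scales are arbitrary choices
# picked only so the example lands near 8% and 13% at two years.
t_kidney = rng.exponential(scale=20.0, size=n)  # time to kidney failure
t_death = rng.exponential(scale=14.0, size=n)   # time to death

# Kidney failure is only observed if it happens before death (competing risk).
risk_kidney = np.mean((t_kidney <= horizon) & (t_kidney < t_death))

# Death from any cause counts even after kidney failure, so a patient who
# develops kidney failure and then dies contributes to both outcomes.
risk_death = np.mean(t_death <= horizon)

print(f"two year risk of kidney failure: {risk_kidney:.1%}")
print(f"two year risk of death: {risk_death:.1%}")
```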
Risk prediction framework
Creating a medical risk prediction model based on electronic health records is challenging. A sound framework includes the definition of a clinically meaningful time zero, called the prediction time origin, and one or more prediction time horizons.2 Subsequently, the availability of predictor information at the time origin (patient age, sex, albuminuria, etc) should be verified. Including predictor variables that are accessible in a timely fashion and without substantial additional cost enhances model usability. For example, in the linked study the super learner could use four variables (sex, age, albuminuria, and estimated glomerular filtration rate, a marker of kidney function) or the same four variables plus history of diabetes or cardiovascular disease.3 All these variables are routinely available during a clinical encounter.
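As a sketch of how such a framework might be pinned down before any modelling begins, the following Python fragment records the time origin, the horizons, and the two predictor sets. All names and the example time origin are hypothetical, not the linked study's exact definitions.

```python
# Hypothetical sketch of a prediction task specification: time origin,
# horizons, and predictors that must be available at the time origin.
from dataclasses import dataclass

@dataclass(frozen=True)
class PredictionTask:
    time_origin: str       # clinically meaningful time zero
    horizons_years: tuple  # one or more prediction time horizons
    predictors: tuple      # must be measurable at the time origin
    outcomes: tuple        # event of interest plus competing events

base_task = PredictionTask(
    time_origin="date moderate to severe CKD is first documented",
    horizons_years=(1, 2, 3, 4, 5),
    predictors=("age", "sex", "albuminuria", "egfr"),
    outcomes=("kidney failure", "death from any cause"),
)

# The richer specification adds two history variables, also routinely
# available during a clinical encounter.
extended_task = PredictionTask(
    time_origin=base_task.time_origin,
    horizons_years=base_task.horizons_years,
    predictors=base_task.predictors + ("history_diabetes", "history_cvd"),
    outcomes=base_task.outcomes,
)

print(extended_task.predictors)
```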
Why use a super learner?
The motivation for using a super learner is best explained by considering alternative strategies. A traditional statistical approach could be to prespecify a regression model and then perform a sequence of goodness-of-fit tests based on the whole learning dataset. For example, the regression model could include additive and linear effects of all predictor variables (ie, all predictors without transformations or interactions). What should the modeller do if the model fit is rejected? A more complex model could be specified post hoc (eg, including interactions and non-linear relationships), followed by a new sequence of goodness-of-fit tests. If the sample size is large, as in the linked study,3 small deviations from model assumptions may be statistically significant even when they are not clinically relevant. Such a procedure would keep rejecting the fit and eventually lead to a very flexible model. The challenge is that when the goodness-of-fit tests are performed on the full learning dataset, overfitting is guaranteed. Overfitting means that the model has learned too much about the learning dataset and hence will not work well for new patients. Instead, the super learner simultaneously considers many alternative models, prespecified with varying degrees of flexibility. By repeated cross-validation, which simulates the application of the candidate prediction models to new patients, the super learner minimises overfitting.

The same concerns apply to machine learning algorithms. A machine learning approach could be to include all predictor variables in a random forest. However, the forest requires the specification of hyperparameters (number of trees, terminal leaf size, etc) in a process called tuning. A tuning strategy selects the constellation of hyperparameters with the highest cross-validated prediction performance among the candidate values. Hence, tuning a random forest is a special case of super learning. These machine learning algorithms can be included together with the regression models in a super learner meta-algorithm (fig 1).

Fig 1 | A super learner as applied in the linked study. CSC and RSF indicate the learners used in the linked study (three of each are shown for illustration): CSC1, CSC2, and CSC3 indicate cause specific Cox regression models with different predictor specifications (eg, with or without interactions, spline functions, stratification, etc); RSF1, RSF2, and RSF3 indicate random forests for survival data with different hyperparameters (eg, terminal leaf size, number of trees, etc). The super learner meta-algorithm uses cross-validation to rank the library of learners according to their prediction performance across 500 bootstrap splits of the learning data and chooses the best performing model for each outcome (discrete super learner). This process was done for the risk of kidney failure in the presence of the competing risk of death and for the risk of death from any cause. Numbers within the diamonds represent sequential prediction time horizons in years. The super learner can be used to obtain risk predictions for kidney failure and death at one to five years from diagnosis for an individual with known values of four or six predictor variables (http://kdpredict.com). CKD=chronic kidney disease
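The point that tuning is a special case of super learning can be made concrete. The sketch below uses scikit-learn's GridSearchCV on synthetic classification data (a simplified stand-in for the censored, competing risks setting of the linked study): it cross-validates one random forest per hyperparameter constellation and keeps the best, in effect a discrete super learner whose library is the tuning grid.

```python
# Illustrative only: synthetic classification data stand in for the
# censored, competing risks data of the linked study.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=6, random_state=0)

# One candidate learner per hyperparameter constellation: the tuning grid
# plays the role of the super learner's library.
grid = {
    "n_estimators": [100, 300],       # number of trees
    "min_samples_leaf": [5, 20, 80],  # terminal leaf size
}
tuner = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid=grid,
    cv=5,                         # 5-fold cross-validation
    scoring="neg_brier_score",    # prediction performance measure
)
tuner.fit(X, y)
print("selected constellation:", tuner.best_params_)
```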
Cross-validation
A learner is an algorithm that takes in a learning dataset and returns a medical prediction model. The super learner uses cross-validation to rank alternative learners based on their prediction performance. Cross-validation randomly splits the learning data into separate sets and allows each learner to access only part of the data for training, while withholding the rest of the data for testing.2 There are different cross-validation methods, including k-fold cross-validation and bootstrap cross-validation with or without replacement (fig 1). In its simplest form, cross-validation splits the data into two parts a single time. Repeating the random splitting many times ensures that the results do not depend on one arbitrary split chosen by the modeller.
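A minimal sketch of repeated random splitting follows, again on synthetic data, with an ordinary logistic regression standing in for a survival learner: each of 500 repetitions trains on a random 63.2% of the data (mirroring the split fraction used in the linked study) and tests on the withheld 36.8%, and performance is averaged over the repetitions.

```python
# Illustrative only: logistic regression on synthetic data stands in for a
# survival learner applied to censored data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import ShuffleSplit

X, y = make_classification(n_samples=1000, n_features=4, random_state=0)

# 500 repeated random splits; 63.2% for training mirrors the linked study.
splitter = ShuffleSplit(n_splits=500, train_size=0.632, random_state=0)

scores = []
for train_idx, test_idx in splitter.split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    predicted = model.predict_proba(X[test_idx])[:, 1]
    scores.append(brier_score_loss(y[test_idx], predicted))

print(f"mean Brier score over {len(scores)} splits: {np.mean(scores):.3f}")
```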
The candidate learners
The modelling task consists of mapping the predictor variables to the predicted risks of the event of interest and the competing events, such that patients with similar values of the predictor variables have similar expected outcomes. To accomplish this task, the modeller can choose freely between traditional regression modelling strategies4 and machine learning algorithms.5 For both approaches, sensible models can only be expected when the modelling team has advanced knowledge of the clinical setting and expertise in the chosen type of analysis. A limitation, relevant to the linked study, is that only a few methods have been thoroughly tested and efficiently implemented for settings involving right censored data and competing risks.3 The linked study considered cause specific Cox regression models and random forests for competing risks.3 6 7 For the cause specific Cox regression models, the authors allowed variations in how the predictor variables entered the linear predictors and the baseline hazard functions (ie, considering interaction terms, non-linear effects, and stratified baseline hazards). For the random forests, a splitting rule suitable for competing risks was considered, along with different tuning parameters, including terminal leaf size and number of trees.
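As an illustration of what a small library of cause specific Cox learners could look like, the sketch below simulates a toy dataset (all column names and values are hypothetical) and uses the lifelines package as one possible implementation. The flexibility of each candidate is varied through derived columns, and each cause is fitted by treating the competing event as censoring, which is the standard cause specific formulation.

```python
# Hypothetical sketch of a library of cause specific Cox learners using
# lifelines; data and column names are simulated for illustration.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.normal(70, 10, n),
    "sex": rng.integers(0, 2, n),
    "egfr": rng.normal(35, 8, n),           # estimated glomerular filtration rate
    "acr": rng.lognormal(3.0, 1.0, n),      # albuminuria (albumin:creatinine ratio)
    "time": rng.exponential(5.0, n),        # years to first event or censoring
    "event_type": rng.choice([0, 1, 2], n), # 0=censored, 1=kidney failure, 2=death
})
df["egfr_sq"] = df["egfr"] ** 2             # simple non-linear effect
df["age_x_egfr"] = df["age"] * df["egfr"]   # interaction term

# Candidate learners of increasing flexibility, mirroring CSC1-CSC3 in fig 1.
library = {
    "CSC1": ["age", "sex", "egfr", "acr"],
    "CSC2": ["age", "sex", "egfr", "acr", "egfr_sq"],
    "CSC3": ["age", "sex", "egfr", "acr", "egfr_sq", "age_x_egfr"],
}

def fit_cause_specific(data, cause, columns):
    """Cox model for one cause; competing events are treated as censored."""
    work = data[columns + ["time"]].copy()
    work["event"] = (data["event_type"] == cause).astype(int)
    # Small ridge penalty for numerical stability with correlated columns.
    return CoxPHFitter(penalizer=0.1).fit(work, duration_col="time", event_col="event")

kidney_models = {name: fit_cause_specific(df, 1, cols) for name, cols in library.items()}
print(kidney_models["CSC3"].params_)
```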
How the super learner works
The ingredients of a super learner are the learning dataset, a series of candidate learners, a cross-validation algorithm, and a measure of prediction performance (box 1). In the linked study,3 the candidate learners were designed in an outcome blinded fashion, using synthetic data that resembled the learning data (Alberta cohort) with the outcomes replaced by random numbers. The cross-validation algorithm trained many models (20-80, depending on the number of predictors) in 500 subsets, each containing a random 63.2% of the learning data for model training and the remaining 36.8% for testing. In each testing subset, learner performance was estimated by comparing the predicted risks with the observed outcomes at each prediction time horizon, and the best performing learner was selected (discrete super learner). The discrete super learner is expected to be as accurate as the best candidate learner that is tested.1 An alternative super learner algorithm exploits the discrepancies in the risk predictions of the candidate learners and combines them into a weighted average (ensemble super learner).2 Methods for ensemble learning are currently limited in settings involving right censored data and competing risks.
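Schematically, the discrete super learner selection step looks as follows. In this numpy-only sketch, simple polynomial risk models and an unadjusted Brier score stand in for the study's cause specific learners and censoring-adjusted performance measure; each of 500 random 63.2%/36.8% splits trains every candidate, scores it on the withheld part, and the candidate with the best average score is selected.

```python
# Schematic discrete super learner: stand-in learners and score, numpy only.
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 4))
# Binary "event by the horizon" outcome with a built-in quadratic signal.
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(size=n) > 0).astype(float)

def make_learner(degree):
    """Polynomial least-squares risk model of a given flexibility."""
    def fit(Xtr, ytr):
        feats = np.hstack([Xtr ** d for d in range(1, degree + 1)])
        beta, *_ = np.linalg.lstsq(np.c_[np.ones(len(feats)), feats], ytr, rcond=None)
        def predict(Xte):
            f = np.hstack([Xte ** d for d in range(1, degree + 1)])
            return np.clip(np.c_[np.ones(len(f)), f] @ beta, 0, 1)
        return predict
    return fit

library = {f"poly{d}": make_learner(d) for d in (1, 2, 3)}
scores = {name: [] for name in library}

for _ in range(500):
    # 63.2% subsample without replacement, mirroring the expected share of
    # unique patients in a bootstrap sample.
    train = rng.choice(n, size=int(0.632 * n), replace=False)
    test = np.setdiff1d(np.arange(n), train)
    for name, learner in library.items():
        predict = learner(X[train], y[train])
        scores[name].append(np.mean((predict(X[test]) - y[test]) ** 2))  # Brier score

best = min(scores, key=lambda name: np.mean(scores[name]))
print("discrete super learner picks:", best)
```

With the quadratic signal built into the simulated data, the quadratic candidate typically wins the cross-validated comparison; with real data the winner is not known in advance, which is exactly why the selection is data driven.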
Box 1: Key features of the super learner strategy
- Provides a systematic method for building a prediction model
- Combines traditional regression models with machine learning algorithms
- Is expected to perform as well as the best candidate learner
Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/disclosure-of-interest/ and declare: no support from any organisation for the submitted work; no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work.
Funding: PR held Canadian Institutes for Health Research funding (FRN 173359) to support studies in chronic kidney disease and was supported by the Baay Chair in Kidney Research at the University of Calgary.
Provenance and peer review: Commissioned; not externally peer reviewed.
Footnotes
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.