Abstract
Cardiac surgery has become an important medical intervention in the treatment
of end-stage cardiac diseases. Like many clinical domains, however, cardiac
surgery is nowadays under pressure: ever more patients are expected to receive
high-quality care within limited time and at limited cost.
This has induced an increasing urge to evaluate and improve the efficiency and
quality of the delivered care. Research on predictive factors of clinical outcomes
(e.g., death) and the amount and duration of treatment is indispensable in this
respect. A common strategy to identify predictive factors is the development
of prognostic models from data. The resulting models can be used for risk assessment
and case load planning. Furthermore, the models form instruments
that can assist in the evaluation of care quality by adjusting raw outcomes for
case mix. The development of new prognostic methods using machine learning
methodology for cardiac surgery and postoperative intensive care is the topic of
this thesis.
Chapter 1 introduces the multidisciplinary care process of cardiac surgery and
presents the objectives of the thesis. The care process is roughly composed of
a preoperative stage of preassessment, a stage of surgical intervention in
the operating room, and a postoperative stage of recovery in the intensive care
unit (ICU) and the nursing ward. With the introduction of modern clinical information
systems, large amounts of patient data are routinely recorded during
patient care, including data of the (cardiac) disease history of the patients, operative
details, and monitoring data. Moreover, clinical outcomes such as length
of stay and death are recorded in these systems. The information systems form
a new data source for development of prognostic models. Instruments that are
currently in the prognostic toolbox of clinicians and managers involved in cardiac
surgery are models that generally allow only preoperative risk assessment of
a single outcome variable; standard statistical methods (e.g., logistic regression
analysis) have been used for model development. The field of machine learning
offers data-modeling methods whose graphical representation makes them
potentially suitable for the development of prognostic models. Tree models and
Bayesian networks are typical examples; their graphical representation
may aid the interpretation of the models. The general objective of this
thesis is to employ and investigate these machine learning methods for modeling
data that are recorded during routine patient care, in order to extend the
practitioner’s prognostic toolbox. The project aims to provide a ‘proof of concept’
of the prognostic methods rather than delivering prognostic instruments
as clinical end products.
Chapter 2 presents the prognostic Bayesian network (PBN) as a new type of
prognostic model that builds on the Bayesian network methodology, and implements
a dynamic, process-oriented view on prognosis. In this model, the mutual
relationships between variables that come into play during subsequent stages of
the care process, including clinical outcomes, are modeled as a Bayesian network.
A procedure for learning PBNs from data is introduced that optimizes
performance of the network’s primary task, outcome prediction, and exploits
the temporal structure of the health care process being modeled. Furthermore,
it adequately handles the fact that patients may die during the intervention
and ‘drop out’ of the process; this phenomenon is represented in the network by
subsidiary outcome variables. In the procedure, the structure of the Bayesian
network is induced from the data by selecting, for each network variable, the best
predictive feature subset of the other variables. For that purpose, local supervised
learning models are recursively learned in a top-down approach, starting
at the outcome variable of the health care process. Each set of selected features
is used as the set of parent nodes of the corresponding variable, and represented
as such with incoming arcs in a graph. Application of the procedure yields a
directed acyclic graph as graphical part of the network, and a collection of local
predictive models as the numerical part; they jointly constitute the PBN.
In contrast to traditional prognostic models, PBNs explicate the scenarios that
lead to disease outcomes, and can be used to update predictions when new information
becomes available. Moreover, they can be used for what-if scenario
analysis to identify critical events to account for during patient care, and risk
factor analysis to examine which variables are important predictors of these
events. To support their use in clinical practice, it is proposed to embed
PBNs in a prognostic system with a three-tiered architecture. In the
architecture, a PBN is supplemented with a task layer that translates the user’s
prognostic information needs to probabilistic inference queries for the network,
and a presentation layer that presents the aggregated results of the inference to
the user.
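The top-down structure-learning loop described above can be sketched in miniature. This is an illustrative reconstruction, not the thesis' implementation: the thesis uses class probability trees for feature subset selection, whereas the stand-in `select_features` below ranks candidates by a toy mutual-information score, and restricting a variable's parents to earlier stages of the care process is one simple way to guarantee an acyclic graph. All variable names and data are invented.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

def select_features(data, target, candidates, k=2):
    """Stand-in feature-subset selection: top-k candidates by association."""
    scored = sorted(candidates,
                    key=lambda v: mutual_information(data[v], data[target]),
                    reverse=True)
    return [v for v in scored[:k]
            if mutual_information(data[v], data[target]) > 0]

def learn_pbn_structure(data, outcome, stage_of):
    """Recursively select parent sets, starting at the outcome variable.

    stage_of maps each variable to its stage in the care process; parents
    are restricted to strictly earlier stages, so the graph is acyclic.
    """
    parents = {}
    frontier = [outcome]
    while frontier:
        var = frontier.pop()
        if var in parents:
            continue
        candidates = [v for v in data if stage_of[v] < stage_of[var]]
        parents[var] = select_features(data, var, candidates)
        frontier.extend(parents[var])  # recurse on the selected parents
    return parents

# Toy data set: one variable per stage of the care process.
data = {
    "preop_risk": [0, 0, 0, 0, 1, 1, 1, 1],
    "op_event":   [0, 0, 1, 0, 1, 1, 0, 1],
    "mortality":  [0, 0, 0, 0, 1, 1, 0, 1],
}
stage = {"preop_risk": 0, "op_event": 1, "mortality": 2}
structure = learn_pbn_structure(data, "mortality", stage)
```

Each entry of `structure` corresponds to one set of incoming arcs in the graphical part of the network; the local predictive models fitted per variable would supply the numerical part.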
An application of the proposed PBN, the learning procedure, and the three-tiered
prognostic system in cardiac surgery is presented in Chapter 3. The
learning procedure was applied to a data set of 6778 patients for development
of a PBN that includes 22 preoperative, operative, and postoperative variables.
Hospital mortality was used as outcome variable in the network, and operative
mortality and postoperative mortality as subsidiary outcome variables to represent
patient dropout. The method of class probability trees served as supervised
learning method for feature subset selection and induction of local predictive
models. The predictive performance of the resulting PBN was evaluated for a
number of complication and mortality variables on an independent set of 3336
patients for two prediction times: during the preoperative stage, and at ICU
admission. The results showed a good calibration for the variables that describe
ICU length of stay longer than 24h and the occurrence of cardiac complications,
but a poor calibration for the mortality variables; for the latter in particular,
the predicted probabilities of the PBN were found to be underdispersed. The
mortality variables nevertheless showed the best discrimination. In order to verify the effectiveness
of the dedicated PBN learning procedure, the performance results of
the PBN were compared to the predictive performance of a network that was induced
from the learning set using a standard network learning algorithm where
candidate networks are selected using the minimal description length (MDL)
principle. The PBN outperformed the MDL network for all variables at both
prediction times with respect to its discriminative ability. Similar calibration
results were observed for the MDL network, suggesting that the underdispersion
of predicted probabilities is directly related to the Bayesian network methodology.
The chapter concludes by presenting ProCarSur, a prototype implementation of
a prognostic system that embeds the PBN.
Prediction of postoperative ICU length of stay (LOS) plays an important
role in identifying patients at high risk of a slow and laborious recovery.
Furthermore, it provides useful information for resource allocation
and case load planning. When developing predictive models for this outcome,
the prediction problem is frequently reduced to a two-class problem to estimate
a patient’s risk of a prolonged ICU LOS. The dichotomization threshold is often
chosen in an unsystematic manner prior to model development. In Chapter 4,
methodology is presented that extends existing procedures for predictive modeling
with optimization of the outcome definition for prognostic purposes. From
the range of possible threshold values, the value is chosen for which the corresponding
predictive model has maximal precision based on the data. The
MALOR performance statistic is proposed to compare the precision of models
for different dichotomizations of the outcome. Unlike other precision measures,
this statistic is insensitive to the prevalence of positive cases in a two-class prediction
problem, and therefore a suitable performance statistic to optimize the
outcome definition in the modeling process. We applied this procedure to data
from 2327 cardiac surgery patients who stayed at the ICU for at least one day
to build a model for prediction of the outcome ICU LOS after one day of stay.
The method of class probability trees was used for model development, and
model precision was assessed in comparison to predictions from tree ensembles.
Within the data set, the best model precision was found at a dichotomization
threshold of seven days. The value of the MALOR statistic for this threshold
was not significantly different from that for the threshold of four days, which was
therefore also considered a good candidate for dichotomizing ICU LOS within
this patient group.
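The threshold-optimization idea can be illustrated with a small sketch. The MALOR statistic itself is not reproduced here; as a clearly labeled stand-in, each candidate cut-off is scored with the absolute log odds ratio between a single binary risk factor and the dichotomized outcome, which is likewise insensitive to outcome prevalence. The data are constructed for illustration and do not reproduce the thesis' analysis.

```python
from math import log

def log_odds_ratio(risk, outcome, eps=0.5):
    """Log odds ratio with a Haldane continuity correction (eps)."""
    a = sum(1 for r, o in zip(risk, outcome) if r and o) + eps
    b = sum(1 for r, o in zip(risk, outcome) if r and not o) + eps
    c = sum(1 for r, o in zip(risk, outcome) if not r and o) + eps
    d = sum(1 for r, o in zip(risk, outcome) if not r and not o) + eps
    return log((a * d) / (b * c))

def best_threshold(los_days, risk, candidates):
    """Pick the LOS cut-off whose dichotomization is best separated by risk."""
    scores = {t: abs(log_odds_ratio(risk, [los >= t for los in los_days]))
              for t in candidates}
    return max(scores, key=scores.get), scores

# Toy cohort: LOS in days plus one invented binary risk factor per patient.
los_days = [2, 3, 5, 6, 6, 8, 9, 12]
high_risk = [0, 0, 0, 0, 1, 1, 1, 1]
best, scores = best_threshold(los_days, high_risk, candidates=[4, 7, 10])
```

In the thesis' procedure, the stand-in score would be replaced by the MALOR statistic computed for a full predictive model at each candidate threshold, but the outer search over dichotomizations has this shape.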
During a patient’s postoperative ICU stay, many physiological variables are
measured at high frequency by monitoring systems, and the resulting measurements
are automatically recorded in information systems. The temporal structure
of these data requires the application of dedicated machine learning methods.
A common strategy in prediction from temporal data is the extraction of relevant
meta features prior to the use of standard supervised learning methods.
This strategy involves a fundamental dilemma: to what extent should feature
extraction be guided by domain knowledge, and to what extent by the
available data? Chapter 5 presents an empirical comparison of
two temporal abstraction procedures that differ in this respect. The first procedure
derives meta features that are predefined using existing concepts from the
clinician’s language and form symbolic descriptions of the data. The second procedure
searches among a large set of numerical meta features (summary
statistics) to discover those that have predictive value. The procedures were applied
to ICU monitoring data of 664 patients who underwent cardiac surgery to
estimate the risk of prolonged mechanical ventilation. The predictive value of
the features resulting from both procedures was systematically compared, and
based on each type of abstraction, a class probability tree model was developed.
The numerical meta features extracted by the second procedure were found to
be more informative than the symbolic meta features of the first procedure, and
a superior predictive performance was observed for the associated tree model.
The findings of this case study indicate that in prediction from monitoring data,
it is preferable to give the available data a more prominent role in feature
extraction than to rely on existing concepts from the medical language.
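The second, data-driven abstraction strategy amounts to computing summary statistics over a monitored variable's measurements. The particular statistics and the crude trend feature below are illustrative assumptions, not the thesis' exact feature set.

```python
from statistics import mean, median, pstdev

def meta_features(series):
    """Numerical meta features (summary statistics) for one time series."""
    half = len(series) // 2
    return {
        "mean": mean(series),
        "median": median(series),
        "std": pstdev(series),
        "min": min(series),
        "max": max(series),
        # crude trend: difference between second- and first-half means
        "trend": mean(series[half:]) - mean(series[:half]),
    }

# A short, invented heart-rate trace with a rising trend:
feats = meta_features([72, 74, 73, 75, 80, 82, 85, 84])
```

In the study, features of this kind were computed for many variables and windows, and the ones with predictive value were retained by the supervised learner.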
Automatically recorded monitoring data often contain inaccurate and erroneous
measurements, or ‘artifacts’. Data artifacts hamper interpretation and analysis
of the data, as they do not reflect the true state of the patient. In the literature,
several methods have been described for filtering artifacts from ICU monitoring
data. These methods require however that a reference standard be available in
the form of a data sample where artifacts are marked by an experienced clinician.
Chapter 6 presents a study on the reliability of such reference standards
obtained from clinical experts and on its effect on the generalizability of the
resulting artifact filters. Individual judgments of four physicians, a majority
vote judgment, and a consensus judgment were obtained for 30 time series of
three monitoring variables: mean arterial blood pressure (ABPm), central venous
pressure (CVP), and heart rate (HR). The individual and joint judgments
were used to tune three existing automated filtering methods and to evaluate
the performance of the resulting filters. The results showed good agreement
among the physicians for the CVP data; low interrater agreement was observed
for the ABPm and HR data. Artifact filters for these two variables that were developed
using judgments of individual experts were found to generalize only moderately to
new time series and to other experts. Filter performance improved for all
three variables when joint judgments were used to tune the filtering methods.
These results indicate that, for the development and evaluation of artifact
filters for monitoring data, reference standards obtained from individual
experts are less suitable than joint judgments.
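The majority-vote judgment can be sketched as follows. The per-rater labels are invented for illustration, and with four raters a 2-2 tie is counted here as "no artifact", which is one possible convention rather than the study's.

```python
def majority_vote(judgments):
    """Combine per-rater 0/1 artifact labels into a joint reference standard.

    judgments: one list of labels per rater, aligned per measurement;
    a point is an artifact when more than half of the raters flag it.
    """
    n_raters = len(judgments)
    return [1 if sum(point) * 2 > n_raters else 0
            for point in zip(*judgments)]

raters = [
    [0, 1, 1, 0, 1],   # physician A
    [0, 1, 0, 0, 1],   # physician B
    [0, 1, 1, 0, 0],   # physician C
    [0, 0, 1, 0, 1],   # physician D
]
joint = majority_vote(raters)  # → [0, 1, 1, 0, 1]
```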
A basic, and frequently applied, method for automated artifact detection is
moving median filtering. Furthermore, alternative methods such as ArtiDetect
described by C. Cao et al. and a tree induction method described by C.L. Tsien
et al. have been proposed in the literature for artifact detection in ICU monitoring
data. Chapter 7 presents an empirical comparison of the performance of
filters developed using these three methods and a new method that combines
these three methods. The 30 ABPm, CVP, and HR time series were used for
filter development and evaluation; the consensus judgment of the time series obtained
from the four physicians was used as reference standard in this study. No
single method outperformed the others on all variables. For the ABPm series,
the highest sensitivity value was observed for ArtiDetect, while moving median
filtering had superior positive predictive value. All methods obtained satisfactory
results for the CVP data; high performance was observed for ArtiDetect
and the combined method both in terms of sensitivity and positive predictive
value. The combined method performed better than the other methods for the
HR data. Because of the large differences between variables, it is advisable
to select, for each variable, an artifact detection method whose inductive
bias fits the variable’s characteristics and its typical types of artifact.
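Moving median filtering, the baseline method above, can be sketched as flagging measurements that deviate from a local median by more than a tolerance. The window size and tolerance below are illustrative choices, not the settings used in the study.

```python
from statistics import median

def moving_median_filter(series, window=5, tolerance=15.0):
    """Return a 0/1 artifact flag per measurement."""
    half = window // 2
    flags = []
    for i, value in enumerate(series):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        local_median = median(series[lo:hi])  # window shrinks at the edges
        flags.append(1 if abs(value - local_median) > tolerance else 0)
    return flags

# An isolated spike in an otherwise stable heart-rate trace is flagged:
flags = moving_median_filter([78, 80, 79, 250, 81, 80, 79])
```

Because the median is robust to a single outlier inside the window, the spike does not distort the reference value it is compared against, which is why this simple filter is a common baseline for ICU monitoring data.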
The principal findings of this thesis are summarized and discussed in Chapter
8. The thesis primarily contributes to adapting machine learning methods
to the induction of prognostic models from routinely recorded data in contemporary
cardiac surgery and postoperative intensive care. Notwithstanding the
graphical representation of Bayesian networks, the interpretation of the cardiac
surgical PBN was experienced to be difficult (Chapter 3). In addition, tree models
were observed to be somewhat misleading: they may not reveal all factors in
the data that are important for the prediction problem at hand. A persistent
problem turned out to be the incorporation of domain knowledge into machine
learning methods: knowledge appeared to be not readily available for prognostic
problems in cardiac surgery. Moreover, the formats in which knowledge is
represented in existing methods were found to be not always appropriate for
prognosis. These findings are clearly illustrated in the study on feature extraction
from ICU monitoring data (Chapter 5). Furthermore, generally agreed
knowledge on artifact measurements in monitoring data appeared to be
scarce, and relying on the opinions of individual experts in modeling was
found to strongly affect the generalizability of the resulting models (Chapter 6).
Future steps toward turning the ‘proof of concept’ of the presented methods
into reliable prognostic instruments involve model development
from multicenter data sets that include the relevant patient and process variables,
and the implementation of these instruments in clinical practice. Finally, evaluation studies
will be necessary to assess the actual benefit of the instruments in supporting
clinical staff and management for evaluation and improvement of the efficiency
and quality of patient care.
| Original language | English |
| --- | --- |
| Qualification | Doctor of Philosophy |
| Award date | 28 Nov 2007 |
| Place of Publication | Eindhoven |
| Print ISBNs | 978-90-6464-179-4 |
| Publication status | Published - 2007 |