Patient Selection Algorithm Development for Cardiac Resynchronization Therapy

Improving treatment outcomes through comprehensive patient selection

The Challenge

Cardiac Resynchronization Therapy (CRT) is a novel treatment modality for patients suffering from moderate to severe heart failure caused by left bundle branch block (LBBB), an abnormality of the heart's electrical system in which contraction of the left ventricle is delayed. The delay reduces systolic function and leads to the reduced ejection fraction (<35%) that is a hallmark of heart failure. Similar to a pacemaker, and often combined with a cardioverter-defibrillator, a CRT device is implanted in the patient and uses electrical pulses to synchronize the function of the two ventricles. For appropriate patients, CRT can be a life-saving therapeutic option. However, the intervention carries considerable risk and significant expense, and its benefits are limited to a small group of suitable heart failure patients. Patient selection is therefore paramount.

Starschema was commissioned by a major university hospital's interventional cardiology service to analyze its CRT program, which had been running for 12 years at the time, and to determine selection criteria that identify the patients most likely to benefit from CRT and least likely to suffer perioperative or postoperative complications.

One of the most significant challenges was the diversity of the data available on each patient: quantitative information (e.g. lab tests), information extracted from operative reports and Electronic Health Record (EHR) systems, ECG recordings (including long-term Holter monitoring), prescription information, echocardiograms and ejection fraction measurements from cardiac MRI. In addition, where a patient had died, the cause and circumstances of death were comprehensively encoded by expert clinicians using ICD-10, and each death was classified as heart failure related, device-related (e.g. postoperative infection) or unrelated.

Our Approach

To manage this diverse array of data, a data lake was constructed that could accommodate structured, unstructured and binary data alike. This enabled our data scientists to rapidly and efficiently access all data assets pertaining to a patient regardless of format.

Using advanced feature engineering techniques, data was standardized and encoded in a format suitable for a multifactorial survival model. For ECG data, dynamic time warping was used to align individual beats, and the coefficients of Daubechies wavelet transforms were then used as features, while echocardiograms were analyzed using a deep autoencoder. Given the vast amount of time series data, including several ECG recordings for each of thousands of patients, we relied on automated feature engineering using deep feature synthesis (DFS), which allowed us to generate and select synthetic features from large amounts of data for optimum information content and representativeness.
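As a rough illustration of the ECG step, the sketch below aligns one beat to a reference beat with dynamic time warping and then takes one level of a Haar transform, the simplest member of the Daubechies family (db1). It is not the project's pipeline: the function names, the absolute-difference cost and the toy beats are assumptions made for this example, and a production system would more likely rely on an established signal-processing library and a higher-order wavelet.

```python
import math


def dtw_path(a, b):
    """Return (total_cost, path) for the DTW alignment of sequences a and b."""
    n, m = len(a), len(b)
    # cost[i][j] is the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end to recover which samples were matched.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path


def warp_to_reference(n_reference, beat, path):
    """Resample `beat` onto the reference time axis using the DTW path."""
    matched = [[] for _ in range(n_reference)]
    for i, j in path:
        matched[i].append(beat[j])
    # Average all beat samples matched to the same reference sample.
    return [sum(values) / len(values) for values in matched]


def haar_dwt(signal):
    """One level of the Haar (db1) transform: (approximation, detail)."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input with its last sample
    s = math.sqrt(2.0)
    approx = [(signal[k] + signal[k + 1]) / s for k in range(0, len(signal), 2)]
    detail = [(signal[k] - signal[k + 1]) / s for k in range(0, len(signal), 2)]
    return approx, detail


# Toy beats (made-up values): the second is the first, shifted by one sample.
reference = [0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]
shifted = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]

cost, path = dtw_path(reference, shifted)
aligned = warp_to_reference(len(reference), shifted, path)
approx, detail = haar_dwt(aligned)  # coefficients usable as model features
```

Aligning first means the wavelet coefficients of different beats describe the same part of the cardiac cycle, which is what makes them comparable as features across recordings.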

Finally, a survival model was built that stratified the risk of perioperative, postoperative cardiac and postoperative non-cardiac death. This highlighted the most determinative factors of survival, including obvious ones (such as age, physical status and comorbidities) and less obvious ones (such as QRS complex width and the presence of atrial fibrillation). It also allowed the creation of a patient selection scoring algorithm that used LIME (Local Interpretable Model-Agnostic Explanations) to show the physician not merely the score assigned to a particular patient, but also the values on which that score is based. In the end, patient selection decisions are made by clinicians, and for decisions with such a vast effect on individual lives, the appropriate role of machine learning is to distill a vast amount of clinical information and advise the clinician of the factors that militate for and against the intervention.
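The case study does not say which survival model was used. As a minimal sketch of the underlying idea, the example below computes a Kaplan-Meier survival curve for one cause of death at a time, treating deaths from other causes as censored observations. That is a common simplification rather than a full competing-risks analysis. The function name, the cause labels and the five toy records are all assumptions made for illustration, not project data.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate.

    durations: follow-up time for each patient.
    observed:  True if the event of interest occurred at that time,
               False if the patient was censored.
    Returns a list of (time, survival_probability) at each event time.
    """
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        deaths = sum(1 for d, e in zip(durations, observed) if d == t and e)
        leaving = sum(1 for d in durations if d == t)
        if deaths:
            # Multiply by the conditional probability of surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Everyone whose follow-up ended at t leaves the risk set.
        at_risk -= leaving
    return curve


# Toy records (made up): follow-up in months and cause of death, if any.
follow_up_months = [1, 2, 2, 3, 4]
cause = ["cardiac", "cardiac", None, "non-cardiac", None]

# Cause-specific curves: other causes count as censoring in each one.
cardiac_curve = kaplan_meier(follow_up_months, [c == "cardiac" for c in cause])
non_cardiac_curve = kaplan_meier(follow_up_months, [c == "non-cardiac" for c in cause])
```

A multifactorial model goes further by relating such curves to patient covariates, but the handling of censored follow-up shown here is the part every survival method has in common.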


The objective of this project was to create a clinically useful tool that could distill a vast amount of patient information of varying sparsity and provide clinicians with pre-implantation estimates of the likelihood of perioperative mortality and long-term survival. The clinical reception of a tool that allows reasoned, data-based decision-making and rapidly summarizes a patient's entire record in a single predictive model was overwhelmingly positive.

When compared with clinical trials of CRT, such as COMPANION, CARE-HF and MIRACLE/MIRACLE-ICD, the model's predictions were in near-complete agreement with the trial outcomes. However, unlike the clinical algorithms and patient selection guidelines derived from these studies, the model did not merely calculate suitability, but also explained why a patient would be suitable or unsuitable for CRT and quantified the relative weight of each contributing factor. Guidelines and clinical algorithms usually treat factors as being of equal weight, whereas a quantitative model can help the clinician make better decisions by also highlighting the relative impact of each factor on outcomes within the population.

Accurate patient selection improves outcomes, prevents inappropriate treatment and saves lives. With a transparent and easily interpretable predictor, clinicians can now make treatment decisions with greater confidence and weigh the factors for and against a particular intervention in view of an intuitive explanation of relative weights and contributions.
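The per-factor weights mentioned above come from a local surrogate model, which is the core idea behind LIME: perturb one patient's feature values, query the black-box model, and fit a proximity-weighted linear model to the responses. The sketch below implements that idea from scratch using only the standard library. It does not use the `lime` package, and `toy_score`, its coefficients, the feature choice and the perturbation scales are arbitrary placeholders with no clinical meaning.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def local_weights(predict, instance, scales, n_samples=500, kernel_width=1.0, seed=0):
    """Fit a weighted linear surrogate around `instance`.

    Returns one weight per feature: the local change in the prediction
    per one `scale` unit of that feature.
    """
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the weighted normal equations (X^T W X) beta = X^T W y.
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # standardized perturbation
        sample = [instance[k] + z[k] * scales[k] for k in range(d)]
        y = predict(sample)
        # Exponential kernel: samples near the instance count more.
        w = math.exp(-sum(v * v for v in z) / kernel_width ** 2)
        row = [1.0] + z  # leading 1.0 is the intercept column
        for a in range(d + 1):
            xtwy[a] += w * row[a] * y
            for b in range(d + 1):
                xtwx[a][b] += w * row[a] * row[b]
    beta = solve_linear(xtwx, xtwy)
    return beta[1:]  # drop the intercept


def toy_score(features):
    """Hypothetical black-box score; coefficients are arbitrary placeholders."""
    age, qrs_ms, ejection_fraction = features
    linear = 0.03 * (age - 65) - 0.02 * (qrs_ms - 150) - 0.05 * (ejection_fraction - 30)
    return 1.0 / (1.0 + math.exp(-linear))


patient = [72.0, 160.0, 28.0]  # made-up values
weights = local_weights(toy_score, patient, scales=[10.0, 20.0, 5.0])
```

Because the surrogate is fitted around one patient, the weights describe what drives that patient's score, and the sign of each weight shows whether the factor pushes the score up or down.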

Technologies used

● Python
● scikit-learn
● TensorFlow
● Deep Feature Synthesis
● LIME (Local Interpretable Model-Agnostic Explanations)

Skills used

● Clinical analytics and population health
● Deep Feature Synthesis and feature engineering
● Machine learning
● Model explanation and analysis
