Data science in medicine

Background

My research in data science focuses on developing predictive algorithms for patient outcomes in a variety of cardiac surgical settings. These algorithms are intended to feed into my macro-scale engineering research on a "smart artificial heart" with real-time response and predictive capabilities.

Cardiac surgery outcome predictions

Patient history is one of the most important pieces of data that cardiac surgeons use to determine the best course of action. These records are stored in the Society of Thoracic Surgeons (STS) National Database, which contains three decades of longitudinal data on patient outcomes at various stages of cardiac surgical procedures. The high dimensionality of the feature space (~1800 raw features) necessitates informative pooling strategies before the data can be analyzed by medical professionals, and physicians rely on a standard statistical metric called the "STS score" to make critical decisions. The STS score uses cohort-based Bayesian inference methods to predict patient outcomes for 8 cohorts corresponding to 8 different surgery types.

Statistical measures like the STS score play a critical role in life-or-death decisions, such as whether a surgeon risks operating on a patient, and they also carry economic implications. Maximizing their predictive power is therefore of the utmost importance. However, predicting outcomes in cardiac surgery is challenging due to the abundance of incomplete information and imbalanced classes. In a forthcoming publication, I explore the potential for improving predictions of cardiac surgical outcomes using cross-cohort correlations as well as participant-based graphical models.

Below are some preliminary plots of AUC scores and precision/recall curves using cross-cohort correlations, with missing data imputed using standard techniques (e.g. averaging, or matrix decomposition techniques such as SVD). Even with the simplest boosted tree algorithms, we can already achieve predictive performance at or above the level of the STS score.

[Figure: AUC scores and precision/recall curves; panel title "TRAINING"]
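As a rough illustration of the imputation and scoring steps mentioned above, the following is a minimal sketch, not the pipeline used for the plots. It assumes NumPy is available; the function names (`mean_impute`, `svd_impute`, `roc_auc`) and the defaults for `rank` and `n_iter` are hypothetical choices for this example. The boosted-tree model itself is left out: any classifier that outputs a risk score could feed `roc_auc`.

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)  # assumes every column has at least one observed value
    return np.where(np.isnan(X), col_means, X)

def svd_impute(X, rank=2, n_iter=50):
    """Iteratively refill missing entries from a rank-`rank` SVD reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = mean_impute(X)  # start from the column means
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled[missing] = low_rank[missing]  # observed entries are never overwritten
    return filled

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]  # all positive/negative score pairs
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())  # ties count as half
```

Because AUC depends only on how positives rank against negatives, it stays interpretable when adverse outcomes are rare, which is one reason it is paired with precision/recall curves for imbalanced classes.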

The richness of this data set can be further exploited by studying the causal relationships among the variables. Causal structure learning algorithms are generically incomplete due to Markov equivalence and irreducible latent variable effects. Since the data is partially time-ordered via its pre-operative/operative/post-operative classification, we can apply these ordering constraints to disambiguate common causes and greatly reduce the graph complexity. Below we apply the PC algorithm (named for its authors, Peter Spirtes and Clark Glymour) to the most significant variables from each time step; the thickness of each edge is proportional to the conditional mutual information between the two variables it connects. We are currently experimenting with various machine-learning methods for modeling latent variables.
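The way time ordering simplifies structure learning can be sketched in a few lines. This is a simplified PC-style skeleton search under stated assumptions, not the implementation behind the figure: variables are discrete, the independence test is a plain threshold on empirical conditional mutual information, conditioning sets are drawn from the union of both endpoints' neighbors, and the edge weight is the smallest conditional mutual information seen for that pair. The names `cond_mutual_info`, `pc_with_tiers`, `tiers`, `threshold`, and `max_cond` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cond_mutual_info(data, x, y, z=()):
    """Empirical I(X;Y|Z) in nats; `data` is a list of dicts of discrete values."""
    def zkey(row):
        return tuple(row[c] for c in z)
    n = len(data)
    n_xyz = Counter((r[x], r[y], zkey(r)) for r in data)
    n_xz = Counter((r[x], zkey(r)) for r in data)
    n_yz = Counter((r[y], zkey(r)) for r in data)
    n_z = Counter(zkey(r) for r in data)
    return sum(
        (c / n) * math.log(c * n_z[zv] / (n_xz[xv, zv] * n_yz[yv, zv]))
        for (xv, yv, zv), c in n_xyz.items()
    )

def pc_with_tiers(data, tiers, threshold=0.02, max_cond=2):
    """PC-style search; `tiers` maps variable -> time step (0 = pre-operative, ...)."""
    nodes = sorted(tiers)
    adj = {v: set(nodes) - {v} for v in nodes}  # start from the complete graph
    strength = {}
    for size in range(max_cond + 1):
        for x, y in combinations(nodes, 2):
            if y not in adj[x]:
                continue
            # Time-order constraint: never condition on a variable measured
            # after both endpoints, since it cannot be a common cause.
            latest = max(tiers[x], tiers[y])
            pool = sorted(v for v in (adj[x] | adj[y]) - {x, y} if tiers[v] <= latest)
            for z in combinations(pool, size):
                cmi = cond_mutual_info(data, x, y, z)
                strength[x, y] = min(strength.get((x, y), cmi), cmi)
                if cmi < threshold:  # treated as conditionally independent
                    adj[x].discard(y)
                    adj[y].discard(x)
                    break
    edges = []
    for x, y in combinations(nodes, 2):
        if y in adj[x]:
            a, b = (x, y) if tiers[x] <= tiers[y] else (y, x)
            # (source, target, is_directed, weight); same-tier edges stay undirected
            edges.append((a, b, tiers[a] < tiers[b], strength[x, y]))
    return edges
```

Edges between different time steps are oriented from the earlier to the later step for free, so only same-step edges remain ambiguous. A production analysis would replace the fixed threshold with a proper statistical test.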

ML-infused all-in-one control system for artificial heart

Functional medical devices such as pacemakers, artificial pumps (LVAD, RVAD, BiVAD, ECMO), and monitoring sensors (flow, pressure, blood gas) continuously generate both bio-signals (e.g. blood pressure, oxygen saturation) and electrical signals (e.g. pump speed, power consumption) associated with the patient's physiological condition and system performance, but no preventive indicators are derived from them. Numerous high-quality health care data resources exist, including clinical data (e.g. adult cardiac surgery databases) and administrative data (e.g. track records of healthcare workers). However, most of these resources are currently used only for observation and as reference material.

In a work in progress, I am exploring the potential for machine-learning algorithms to automate the real-time prediction and control of cardiac devices during anomalous events, with longer lead times. End-stage heart failure has few treatment options besides heart transplant, which in turn is limited by donor shortage. The left ventricular assist device (LVAD) has shown great promise for both bridge-to-transplant and destination therapy as an alternative to heart transplant. However, LVAD therapy still has an Achilles heel: repeated infection, frequent readmission, and suboptimal quality of life caused by the percutaneous driveline that powers the device, together with unreliable hemodynamic monitoring and management that leads to disease progression such as congestive heart failure with fluid overload or right ventricular (RV) dysfunction. This is because once the pump is implanted, patients are sent home with a pre-set pump speed and no physiological control or feedback. Physiological parameters such as ventricular chamber size, right ventricular function, blood pressure, and the water content of the lungs change as heart failure progresses, yet current LVADs have no mechanism to adjust their output based on those parameters. Autonomous powering and control of an LVAD that operates in response to physiological needs is a promising solution, in which the predictive/preventive algorithm will be key, especially for preventing suction events before they happen.

In collaboration with the medical device company Lynntech, Inc., we proposed developing an all-in-one system that combines our wireless power transfer technology with a novel lightweight, durable Li-S battery (24-hour run time at a reduced footprint). The system will significantly reduce the technical issues patients must deal with and offer an improved, long-lasting, tether-free life, as shown below.

Battery failure requiring repeated replacement is reported in 15-25% of the patient population across currently approved LVAD devices. A predictive algorithm that addresses premature battery failure by estimating battery lifetime in advance, incorporated into the control/monitoring system, will be key to reducing unexpected hospital visits and improving patients’ quality of life.
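To make the idea of estimating battery lifetime in advance concrete, here is a deliberately simple sketch, not the proposed algorithm. It assumes that capacity fades roughly linearly with charge cycles and that end of life is defined as 80% of the first recorded capacity; both are illustrative assumptions, as are the names `fit_line`, `remaining_cycles`, and `eol_fraction`.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def remaining_cycles(cycles, capacities, eol_fraction=0.8):
    """Extrapolate the capacity trend to the (assumed) end-of-life threshold.

    Returns the estimated number of cycles left after the last measurement,
    or None if no capacity fade is measurable yet.
    """
    slope, intercept = fit_line(cycles, capacities)
    if slope >= 0:
        return None  # no downward trend to extrapolate
    eol_capacity = eol_fraction * capacities[0]
    eol_cycle = (eol_capacity - intercept) / slope
    return max(0.0, eol_cycle - cycles[-1])
```

A monitoring system could run such an estimate after each charge and raise an alert once the remaining cycles fall below a chosen safety margin. Real Li-S cells are unlikely to fade linearly, so a learned degradation model would take the place of `fit_line`.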