Data science to improve clinical practice and clinical trials
Advances in data science and data utilisation are key to improving the clinical trials and real-world evidence through which medicines are regulated and optimised. The quest for inclusive trials seeks fair representation of patient groups across all lived experiences. This includes people from low- and middle-income countries or communities, who may be more likely to have an earlier onset of a wide range of medical conditions and to be at greater risk of having multiple long-term conditions.
Electronic health records are a key data source for better understanding patients, both individually and within populations. As electronic health record data are better captured, linked and curated, opportunities to understand disease risks and trajectories improve.1-3 Such data also allow for optimised data processing that could be reused to improve clinical trials’ feasibility analyses, recruitment, safety surveillance, economic evaluations, generalisability studies and long-term outcomes surveillance.
Recent advances in causal machine learning have the potential to enhance patient care, public health measures, service quality management, planning and research – including for clinical trials. For example, machine learning approaches such as Convolutional Neural Networks (CNNs), Long Short-Term Memory networks (LSTMs) and Generative Adversarial Networks (GANs) are starting to enable innovations such as the estimation of treatment effects and the generation of synthetically balanced case-control populations and ‘virtual control groups’. By applying machine learning methods to data collected from past clinical trials, natural history studies, electronic health records, claims data or disease registries, we can create virtual control groups and move more study designs away from placebo control arms. With less dependence on human controls, more participants can receive the innovative treatment rather than a placebo or standard care.
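To make the idea of a virtual control group concrete, the following is a minimal sketch rather than a recommended method. It assumes that each treated participant is paired with the most similar historical record, using simple nearest-neighbour matching on standardised baseline covariates. This is a deliberately basic stand-in for the machine learning approaches named above. The function names, covariates and data are hypothetical and purely illustrative.

```python
import math
from statistics import mean, pstdev


def standardise(rows, reference):
    """Scale each covariate column using the mean and SD of `reference`."""
    columns = list(zip(*reference))
    mus = [mean(col) for col in columns]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    sds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in rows]


def build_virtual_controls(treated_x, historical_x):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns, for each treated participant, the index of the matched
    historical record (hypothetical helper, for illustration only).
    """
    if len(historical_x) < len(treated_x):
        raise ValueError("need at least as many historical records as treated participants")
    pooled = list(treated_x) + list(historical_x)
    treated_std = standardise(treated_x, pooled)
    historical_std = standardise(historical_x, pooled)

    available = set(range(len(historical_std)))
    matches = []
    for row in treated_std:
        # Pick the closest historical record that has not been used yet.
        best = min(available, key=lambda k: math.dist(row, historical_std[k]))
        available.remove(best)
        matches.append(best)
    return matches


def estimate_effect(treated_y, historical_y, matches):
    """Difference in mean outcome: treated minus matched virtual controls."""
    return mean(treated_y) - mean([historical_y[j] for j in matches])


# Synthetic data: covariates are (age in years, baseline severity score).
treated_x = [[60, 1.0], [45, 0.5]]
treated_y = [5.0, 4.0]
historical_x = [[30, 0.1], [61, 1.1], [44, 0.4], [80, 2.0]]
historical_y = [1.0, 3.0, 2.0, 6.0]

matches = build_virtual_controls(treated_x, historical_x)
effect = estimate_effect(treated_y, historical_y, matches)
print(matches, effect)
```

In this toy example each treated participant is matched to the historical record closest in age and baseline severity. A real analysis would also need to address unmeasured confounding, differences in how outcomes were recorded across data sources, and the uncertainty of the estimate.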