Bumblekite MLSS 2022
Schedule
Learn more about our 2022 sessions, including the reading materials we recommend going through beforehand.
Full schedule
Aug 7th, Sun - clinical, static data
9:30 lecture, Mirabela Rusu
Title: Precision Integrative Medicine: Can AI models help radiologists in their image interpretation?
Abstract: The subtle difference in MRI appearance of prostate cancer and benign prostate tissue renders the interpretation of prostate MRI challenging, causing many false positives, false negatives, and wide variations in interpretation. My laboratory focuses on improving the interpretation of prostate MRI by developing deep learning models that automatically localize indolent and aggressive prostate cancers on MRI scans. The novelty of our methods comes from using whole-mount pathology images to label MRI images and to create pathomic MRI biomarkers of aggressive and indolent cancers. Our approach achieved an area under the receiver operating characteristic curve of 0.93 evaluated on a per-lesion basis and outperformed existing deep learning models. In patients outside our training cohorts, such predictive models will outline the extent of cancer on radiology images in the absence of pathology images, thus helping guide the prostate biopsy and local treatment.
The talk will focus on recent contributions from my lab: registering whole-mount pathology images with MRI, training deep learning models to extract pathomic MRI biomarkers, using those biomarkers to train models that detect and distinguish indolent and aggressive prostate cancers on MRI, and showing the benefits of pathology-derived labels for that task.
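For readers unfamiliar with the metric quoted in the abstract, the area under the receiver operating characteristic curve (AUC) can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The scores and labels are made-up toy numbers, not data from the talk, and the function name `auc` is only illustrative.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): three positive and three negative cases.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(scores, labels)  # 8 of the 9 pairs are ranked correctly
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; rank-based implementations are the usual choice for larger datasets.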
11:15 tutorial, Valeria De Luca
A brief introduction to how data science and machine learning can impact drug development will be given, with a focus on translational and precision medicine and the challenges of working with health data. We will then dive deeper into a real use case.
13:45 workshop, Francesca Sanna
Title: Draw me a Story – a love letter to visual storytelling
This introductory workshop will first explore how visual storytelling can be a tool for communication.
It will examine the key structures of a good story and analyse how the visual and written elements can have different roles in the narrative.
Finally, there will be a collective experiment in visual storytelling.
14:45 leadership conversation, Miriam Donaldson, Danil Mikhailov, Jonas Dorn, Mirabela Rusu, Valeria De Luca
Topic: how to build, manage & nurture data science & engineering teams within different healthcare organisations with academic, industry & non-profit viewpoints
reading list
tutorial
Hartl, D., De Luca, V., Kostikova, A. et al. Translational precision medicine: an industry perspective. J Transl Med 19, 245 (2021).
Luo, H., Lee, P.-A., Clay, I., Jaggi, M., De Luca, V. Assessment of Fatigue Using Wearable Sensors: A Pilot Study. Digit Biomark (2020).
De Luca, V., Luo, H., Clay, I. Continuous multi-sensor wearable data and daily subject-reported fatigue of heathy adults. Zenodo (2020).
Dlima, S., Shevade, S., Menezes, S., Ganju, A. Digital Phenotyping in Health Using Machine Learning Approaches: Scoping Review. JMIR Bioinform Biotech (2022).
Aug 8th, Mon - time series, sensors data
9:30, 11:30 lecture & tutorial, Jonas Dorn
This session covers why the pharma industry is so interested in remote digital monitoring, the value the data are expected to generate, and the path to getting there, including the data analysis challenges that have to be overcome to turn data into insights robust enough to inform drug development decisions.
17:30 leadership conversation, Gorana Dasic, Shalini Trefzer
Topic: What considerations do engineers need to keep in mind to enable data-driven insights within the area of medical affairs (e.g. real world drug performance, registries, adverse event predictions)?
reading list
Taylor, K.I., Staunton, H., Lipsmeier, F. et al. Outcome measures based on digital health technology sensor data: data- and patient-centric approaches. npj Digit. Med. 3, 97 (2020).
Karas, M., Bai, J., Strączkiewicz, M., Harezlak, J., Glynn, N. W., Harris, T., Zipunnikov, V., Crainiceanu, C., & Urbanek, J. K. Accelerometry data in health research: challenges and opportunities. Statistics in biosciences, 11(2), 210–237 (2019).
Aug 9th, Tue - engineering toolbox
9:30 lecture, Christian Holz
Title: Zero-effort Mobile Health for Precision Medicine
In this talk, we will walk through the challenges and opportunities for mobile health in the domain of diagnostics to collect and process representative data, captured from patients and everyday consumers "in the wild" and outside controlled scenarios.
11:30 tutorial, Alexander Marx
Title: Discovering Causal Graphs in Biomedical Data
It is well known that correlation does not equal causation, but how can we infer causal relations from data? Ideally, we would conduct a randomized controlled trial to determine whether, for example, smoking causes lung cancer. Often, however, such experiments are too expensive or unethical. Thus, we need to rely on passively collected, so-called observational data. Learning causal graphs from observational data is called causal discovery.
In this tutorial, we will first discuss the most fundamental algorithms and assumptions of causal discovery. Subsequently, we will do a hands-on coding tutorial, in which we analyze a gene expression data set and experiment with different causal discovery algorithms to get an impression of their strengths and weaknesses.
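As a rough illustration of the idea behind constraint-based causal discovery, the sketch below simulates a chain X -> Y -> Z and shows that X and Z are strongly correlated on their own but nearly uncorrelated once Y is accounted for. This is not the tutorial's material: the data are simulated, the coefficients are arbitrary, the helper `partial_corr` is hypothetical, and partial correlation is only a valid independence test under linear-Gaussian assumptions.

```python
import numpy as np


def partial_corr(a, b, given):
    """Correlation of a and b after linearly regressing `given` out of both."""
    design = np.column_stack([np.ones_like(given), given])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])


# Simulated chain X -> Y -> Z with arbitrary (assumed) coefficients.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
z = -1.5 * y + rng.normal(size=n)

marginal = float(np.corrcoef(x, z)[0, 1])  # strong: X influences Z through Y
conditional = partial_corr(x, z, y)        # near zero: Y blocks the path
```

Algorithms in this family run many such conditional independence tests and remove an edge between two variables whenever some conditioning set makes them independent.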
15:15 tutorial, Ece Özkan Elsen
Title: Prediction of cardiac function on echocardiograms
Cardiovascular diseases are the leading cause of death worldwide, responsible for nearly one third of deaths globally. Early detection of cardiac dysfunction through routine screening is vital, as appropriate treatment and behavioural changes can prevent premature death. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound imaging being a low-cost, real-time and non-ionizing technology. Previous work has introduced a complex pipeline of 3-dimensional convolutional neural networks on B(rightness)-mode videos to predict cardiac function.
We will work on an alternative method for a reduced setup.
18:00 leadership conversation, Stefan E. Germann & Gorana Dasic
Topic: leadership in healthcare, consensus building, stakeholder management
reading list
tutorial
Ouyang, D., He, B., Ghorbani, A., Lungren, M.P., Ashley, E.A., Liang, D.H., Zou, J.Y. EchoNet-Dynamic: a Large New Cardiac Motion Video Data Resource for Medical Machine Learning. (2019).
Ouyang, D., He, B., Ghorbani, A. et al. Video-based AI for beat-to-beat assessment of cardiac function. Nature 580, 252–256 (2020).
Aug 10th, Wed - imaging, multimodal data
9:30, 11:30 lecture & tutorial, Farah Shamout
Title: Multi-modal Learning
Medical data is diverse and heterogeneous. Two of the most popular modalities for clinical prediction tasks are medical images and data extracted from electronic health records.
This talk will present recent advances in imaging and non-imaging applications, and the intersection of the two data modalities under the umbrella of multi-modal learning.
Aug 11th, Thu - genomics data
9:30 lecture, Sina Rüeger
Title: Applying Machine Learning to Genomic Data - Limits & Challenges
In this talk, we'll discuss
- how genomic data is organized and its implications for ML methods. In other words, why DNA lends itself to studying the causes of disease (central dogma), how DNA is encoded in a dataset (as letters or numbers), the tradeoffs between genotyped and sequenced DNA (costs and precision), and properties of genomic data and their implications (high correlation between genetic variants, dependence on population)
- which methods are used to analyse data. More precisely, we'll discuss why collaborations (and open data) in genomic studies are important (they increase sample size, but limit the statistical model), the implications of correlated genetic variants (they make interpretation hard), and Eurocentric studies (not everyone is represented in genetic studies, which is especially relevant for prediction)
- importance of data engineering / data organisation (accessibility even within a company is key).
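To make the "letters or numbers" point above concrete, here is a minimal sketch of two common numeric encodings: one-hot encoding of a DNA sequence and dosage coding of a diploid genotype (the count of alternate alleles). It is not taken from the lecture. The function names, the A/C/G/T column order, and the example sequences are assumptions chosen for illustration.

```python
import numpy as np

BASES = "ACGT"  # assumed column order for the one-hot encoding


def one_hot(seq):
    """Encode a DNA string as a (length, 4) array.

    Bases outside A/C/G/T (e.g. N) are left as all-zero rows.
    """
    out = np.zeros((len(seq), 4), dtype=np.int8)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1
    return out


def dosage(genotype, alt):
    """Count copies of the alternate allele in a diploid genotype such as 'AG'."""
    return sum(1 for allele in genotype.upper() if allele == alt.upper())


encoded = one_hot("ACGTN")                               # 5 positions x 4 bases
dosages = [dosage(g, "G") for g in ["AA", "AG", "GG"]]   # 0, 1 or 2 copies of G
```

One-hot arrays are the typical input for sequence models, while dosage values per variant give the samples-by-variants matrix used in association and prediction analyses.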
11:30 tutorial, Kathleen Chen
Title: Whole-genome deep learning analysis reveals causal role of noncoding mutations in diseases and traits
In this tutorial, I'd like to demonstrate an end-to-end workflow for developing a deep learning model to learn the regulatory activities of DNA and applying it to understand the regulatory impact of mutations in specific disease contexts.
- The first part of the tutorial will focus on how sequence models can be developed, trained, and evaluated on whole-genome sequencing data for chromatin marks, with an opportunity for participants to experiment with architectures of their own design on small datasets.
- Then, participants may apply these models on datasets to discover potentially causal variants, develop their own methodology for prioritizing mutations for further validation, and the like.
17:30 leadership conversation, Natalie Banner
Topic: responsible & ethical use of health data
reading list
lecture
Uffelmann, E., Huang, Q.Q., Munung, N.S. et al. Genome-wide association studies. Nat Rev Methods Primers 1, 59 (2021).
Martin, A.R., Kanai, M., Kamatani, Y. et al. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet 51, 584–591 (2019).
Wang, G., Sarkar, A., Carbonetto, P., Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. B, 82: 1273-1300 (2020).
leadership conversation
Resources on UK public views about health data use, information resources and advice on 'trustworthiness', Understanding Patient Data.
A new community for those interested in the intersection between data science methods and issues of equity, reducing health inequalities, etc., Data Science for Health Equity.
An overview of the plans to explore the screening research potential of whole genome sequencing for newborn babies, Genomics England newborns programme.
Tsamados, A., Aggarwal, N., Cowls, J., Morley, J., Roberts, H., Taddeo, M., Floridi, L. The Ethics of Algorithms: Key Problems and Solutions. (2020).
An Incomplete History of Research Ethics.
Aug 12th, Fri - communication skills
9:30, 13:45 workshop, Mirna Šmidt
Title: Communications skills workshop
At this workshop, we will explore the foundations of what makes great communication, in a practical, engaging way. You will leave enriched with practical tools to communicate more effectively, as well as with valuable insights into your own communication style.
16:15 leadership conversation, Stephen MacFeely & Laura Magdalena Locher
Topic: technology deployment in practice across healthcare systems, with a global & a local viewpoint
reading list
leadership conversation
Damouras, S., Gibbs, A., MacFeely, S. Training official statisticians for adaptive statistical practice. Statistical Journal of the IAOS. 37. (2021).