Data Collection, Image Modelling, and Spatial Analysis: Quality Control for Quality Data

The potential applications of deep learning tools in preliminary studies have grown alongside their complexity in the past few years. They can now be used to refine a study or programme before it takes place, optimising its efficiency.

Data collection mechanisms in spatial bioinformatics offer a balanced, nuanced perspective on the functionality of biological data and assays. Our panel discussion on Spatial Bioinformatics at our Spatial Biology: UK event in April 2022 explored the subtleties of deep learning models and their applications in pioneering new research.

Marcus Tindall, Professor of Mathematical Biology and Head of the Department of Mathematics and Statistics at the University of Reading, explained that there has been a flow from experimental techniques to AI to mechanistic modelling in recent years. “Lots of new imaging techniques are evolving at the single-cell level,” he said, “so we can go between scales and use modelling as an informational tool.”

“You can’t really afford to do a lot of manual analysis and correction per sample,” said Martin Fergie, Lecturer in Healthcare Sciences at the University of Manchester. Fergie’s focus has been on the use of the spatial information of tissue for predictive biomarkers, and more recently how this can support clinical diagnostics. “How can you design these experiments to have statistically sound results for these applications?” he asked.

For Oliver Braubach, Director of Applications at Akoya Biosciences, the focus is on developing an understanding of the questions a particular study is designed to answer. “You can break image analysis down into three stages,” he said. “First, you obtain an image – that’s quality control. Once you have quality control of your data, you can exploit and make use of algorithms – segmentation, clustered data and so on.” Finally, a patient population can be stratified from the processed data. This then drives a subsequent question: what is needed to improve analytical efficiency and overall throughput?
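
As a rough illustration of those three stages, the sketch below chains a toy quality-control check, a naive threshold segmentation, and a median split of the cohort. Everything here is a simplified stand-in: the QC metric, the segmentation rule, and the stratification criterion are hypothetical placeholders for the algorithms a real study would use.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
# Hypothetical stand-in images, one per patient.
images = {f"patient_{i}": rng.random((64, 64)) for i in range(6)}

counts = {}
for patient, img in images.items():
    # Stage 1: quality control - a toy stand-in for real focus,
    # saturation, and tissue-coverage checks.
    if img.std() < 0.05:
        continue
    # Stage 2: algorithms - naive thresholding in place of a trained
    # cell-segmentation model, followed by connected-component counting.
    mask = img > img.mean() + img.std()
    _, n_objects = ndimage.label(mask)
    counts[patient] = n_objects

# Stage 3: stratification - split the cohort on the median object count.
median = np.median(list(counts.values()))
groups = {p: ("high" if n > median else "low") for p, n in counts.items()}
print(groups)
```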

Deep Learning Models and Algorithm Development

For Fergie, the main issue to address with deep learning models is the development and deployment of algorithms for image analysis in a way that provides confidence in the results derived. “We’ve been looking at large samples of tissue, using a TMA – a tissue microarray,” he explained. “We’ve got six or seven samples per patient on a slide, and then we might have around 100 patients in a cohort, which is the minimum we can get away with in a survival analysis study.” Any analysis or manipulation needs to be applied consistently across the sample set. “We find that the order in which we apply the antibodies to the tissue can really matter,” Fergie added.
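
A minimal sketch of how a cohort on that scale might feed a survival analysis, assuming the lifelines library and entirely synthetic numbers: each patient’s TMA cores are first averaged into a single marker score, since cores from the same patient are not independent observations, and a Cox proportional hazards model is then fitted per patient. The column names and distributions below are invented for illustration.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n_patients, cores_per_patient = 100, 6

# Synthetic per-core marker scores: six TMA cores per patient.
cores = pd.DataFrame({
    "patient": np.repeat(np.arange(n_patients), cores_per_patient),
    "marker_score": rng.normal(1.0, 0.3, n_patients * cores_per_patient),
})

# Aggregate cores to one value per patient - cores from the same patient
# are not independent, so they should not enter the model separately.
df = cores.groupby("patient")["marker_score"].mean().to_frame()
df["time"] = rng.exponential(60, n_patients)   # follow-up time (months)
df["event"] = rng.integers(0, 2, n_patients)   # 1 = event observed

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()
```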

Another example Fergie gave was immunofluorescence. “A lot of the cohorts we work with are retrospective human tissue, processed in labs at hospitals all over the UK. Subtle changes in the fixation periods across labs result in significantly varied antigen binding across samples,” he explained. This makes many conventional bioinformatics techniques very difficult to apply, and care should be taken with data treatment to ensure its consistency.
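
One common mitigation for batch effects of this kind, offered here as a generic illustration rather than the panel’s own method, is to normalise marker intensities within each processing lab before pooling samples:

```python
import pandas as pd

# Hypothetical marker intensities, with a 'lab' column recording which
# hospital lab fixed and processed each sample.
df = pd.DataFrame({
    "lab":       ["A", "A", "A", "B", "B", "B"],
    "intensity": [1.2, 0.8, 1.0, 3.1, 2.9, 3.3],
})

# Z-score within each lab so fixation-driven shifts in overall staining
# intensity do not dominate cross-cohort comparisons.
df["intensity_z"] = df.groupby("lab")["intensity"].transform(
    lambda x: (x - x.mean()) / x.std()
)
print(df)
```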

However, Fergie suggested there were reasons to be optimistic regarding data collection models. “On the plus side, I think what we have seen is that if you’re willing to put in the effort to train deep learning models for image analysis, they can overcome a lot of these problems,” he said. Among the advantages of deep learning is its versatility with real-world datasets, which can be computed and analysed far more efficiently than physical samples.

Holistic Functionality in Predictive Modelling

Tindall echoed Fergie’s thoughts on the matter, asserting that the holistic nature of the modelling process played an important part in its functionality and operation. “From a modelling point of view, I think it’s very important for everyone to realise that although we come from very different backgrounds we all rely on one another to answer the questions that are posed scientifically,” he said. “While people may see modelling as a mathematician’s role with lots of equations and maths, we use modelling to inform experimentation and data collection.”

As Tindall explained, data collection models can be conceptualised in several ways, and the potential applications are manifold. “You may have gone into your lab and collected data, and wished to understand how to integrate it into an abstract model that brings together in vitro information,” he continued. With advances in model validity, a new focus is on running preliminary models to gauge how a study is likely to unfold before it begins. “With colleagues from the drug industry, we’re beginning to turn towards using models that are developed almost before people are doing experiments to inform the direction of experimentation that people may wish to take.”

“We’re beginning to turn towards models that are developed before people are doing experiments to inform the direction that people may wish to take.”

An example of this given by Tindall was modelling a molecular system of cells. “We may wish to model the coupling between the sub-cellular processes, and how that affects the environment around the cells,” he said. “In doing so we might go to the literature, or databases of public knowledge, and build a model to analyse in order to help us direct the data that we should be collecting.” Tindall described the overall process as an interdisciplinary interaction that required knowledge and insight from multiple facets. “I cannot work on my own – it’s a continual collective team effort.”
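
As a deliberately toy example of the sub-cellular-to-environment coupling Tindall describes, the sketch below models an intracellular species that is produced and degraded within the cell and secreted into an extracellular pool. The equations and rate constants are invented for illustration, not drawn from the panel.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Rate constants (arbitrary units): production, degradation,
# secretion, and extracellular clearance.
k_prod, k_deg, k_sec, k_env = 1.0, 0.1, 0.05, 0.02

def rhs(t, y):
    s, e = y                           # intracellular s, extracellular e
    ds = k_prod - (k_deg + k_sec) * s  # produced, degraded, secreted
    de = k_sec * s - k_env * e         # secreted in, cleared out
    return [ds, de]

sol = solve_ivp(rhs, (0, 200), [0.0, 0.0], t_eval=np.linspace(0, 200, 9))
print(sol.y[1].round(2))  # extracellular concentration over time
```

Analysing even a sketch like this before an experiment indicates which quantities (here, the extracellular concentration) would need to be measured, and over what timescale.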

Data Collection and Image Modelling

One query Braubach posed to the panel was whether a model would be able to tell researchers in granular detail what they should be doing differently in their experiments to get different data. Tindall replied that variability for some systems can be built in at the cellular scale, and that the critical overriding question to address was making sure data collection is reproducible and fits with the model assumptions. He gave the example of a study into the spatial distribution of cells within a sample: instead of being evenly distributed within the medium, the cells were clumping together, hindering the research project. “That informs future experimentation, where you have to be careful around aggregation,” Tindall said.
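
Aggregation of this kind can be checked quantitatively before data enter a model. One illustrative approach (an assumption here, not necessarily what that study used) is the Clark–Evans index, which compares observed nearest-neighbour distances with those expected under complete spatial randomness; values well below 1 indicate clumping.

```python
import numpy as np
from scipy.spatial import cKDTree

def clark_evans(points, area):
    """Clark-Evans index: R < 1 suggests clumping, R ~ 1 complete
    spatial randomness, R > 1 regular spacing."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=2)  # k=2: nearest point other than self
    observed = dists[:, 1].mean()
    expected = 0.5 / np.sqrt(len(points) / area)  # mean NN distance under CSR
    return observed / expected

rng = np.random.default_rng(1)
# Simulate clumped cells: tight clusters around five centres in a unit square.
centres = rng.uniform(0, 1, (5, 2))
cells = centres[rng.integers(0, 5, 500)] + rng.normal(0, 0.01, (500, 2))
print(round(clark_evans(cells, area=1.0), 3))  # well below 1 for clumped cells
```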

“The machine learning approach to data interpretation is thinking about the framework and the parameters for optimisation.”

Mechanistic models factor less into the machine learning approach to data interpretation, which leans more on observation and interpolation of data points. “The machine learning person’s approach to that problem is thinking about that framework, that parameter optimisation,” offered Fergie. A limitation of machine learning models is that they treat the problem as an abstract mathematical function approximated from data. “You can only really interpolate within the data you’re given, but as you do so there are lots of hyper-parameters you can choose to optimise within that experiment.”
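
A minimal sketch of what that framing looks like in practice, using scikit-learn and synthetic features as stand-ins for a real cell-feature table: the modelling choice is expressed as a small hyper-parameter grid, and a configuration is selected by cross-validated performance within the available data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for per-patient features derived from imaging data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# The 'framework and parameters' view: fix a model family, expose its
# hyper-parameters, and optimise them by cross-validation - interpolating
# within the data at hand rather than extrapolating beyond it.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, 5, None]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```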

Interpreting Data to Answer Clinical Questions

While obtaining data is a critical focus of all primary studies, interpreting it is an undertaking in and of itself. “Unfortunately, the reality of the lab is that there’s a lot of trial and error,” said Braubach, before posing the following question to the panel. “Both of you are very well-versed in the problems that come after acquiring the data – where should we start?” Fergie explained that his approach was to start with the clinical question and then work backwards from there. “We get the data out of a panel – we identify each cell and each compartment and start to think about what the clinical questions are.”

Tindall said he was in complete agreement, adding that the final question should be the primary focus. “I would say almost 100% of the time, what we find during the modelling of systems is we confirm what people knew already.” He added that some of the most interesting applications of models came when they failed to reproduce the anticipated end result. “When you know that A produces B and that leads to C, you don’t need a model to explore that. But when you find things you didn’t expect, that’s when systems modelling is helpful for defining where you’re going to go next.”

As deep learning tools and mechanistic mathematical models advance in their complexity, there is the potential for their application in preliminary studies to inform the execution of the experiment proper when it takes place. Combining data collection, deep learning tools and mechanistic modelling, the future is full of promise: the successes witnessed in the field to date stand testament to the powers of collaboration across disciplines.

Want to learn more about next-generation modelling techniques and data synthesis? Head on over to our Omics portal to read about the latest in oncological research. If you'd like to register your interest in our upcoming Spatial Biology US: In-Person conference, visit our event website.