FAIR Data Management: What are the Best Approaches to Ensuring Information Findability?
Alfred Stefan, Manager of LU Informatics at AbbVie, is focused on what he terms the ‘data lifecycle’. Describing common approaches to information capture strategy, he said many companies become ‘data messy’. “Our approach is to store all data regardless of how relevant it becomes over the years,” explained Stefan. This ‘store-all’ mentality can result in vast, disorganised data banks which are difficult to navigate and categorise appropriately. “If we look back over five to ten years of data, we can find information which is no longer useful or relevant.” FAIR data management principles exist as a means to address issues with information storage and accessibility.
The December 2022 PharmaTec Discussion Group addressed best-practice approaches to optimising data infrastructure. Implementing a new approach while dealing with a pre-existing backlog of files and information is, at best, complicated. “It’s extremely difficult to delete data,” added Stefan, “and what happens is that over time the system gets more and more crowded.” Once this is clearly identified as a problem, the challenge becomes deciding what to do with the old data. As the volume of information captured and stored as a by-product of research processes continues to grow, implementing a robust information handling framework is crucial.
Making Data Findable Through FAIR Data Management
To kick off the discussion, Stefan asked the audience what they believed was the main benefit of FAIR data. Susanna-Assunta Sansone, Professor of Data Readiness at the University of Oxford, has worked on data interoperability and reproducibility since 2001. “Good data management involves making data usable at scale,” remarked Sansone. “That’s what it really is, it’s nothing new but it involves pulling together people from different sectors on the importance of good research and data management.” Stefan agreed, adding that in his interpretation the principle involved presenting data so that the metadata was understandable.
In terms of making data FAIR across the board, a major step involves making sure the data environment is safe for everyone to work with. For companies looking to adopt new approaches, incorporating established FAIR data principles to assert some control over the data would be a big help. In this context, the concept of FAIRification is analogous to a library: it enables users and interested parties to find relevant data.
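The library analogy can be made concrete with a small catalogue sketch. The example below is a minimal illustration, not a description of any system discussed in the session: the `DatasetRecord` class, its field names, the identifiers and the storage locations are all hypothetical. It shows the basic idea of findability, in which each dataset carries a unique identifier and descriptive keywords that a user can search.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetRecord:
    """One catalogue entry. All field names here are hypothetical."""
    identifier: str                                   # unique ID for the dataset
    title: str                                        # human-readable description
    keywords: frozenset = field(default_factory=frozenset)
    location: str = ""                                # where the data can be retrieved


def search(catalogue, keyword):
    """Return records whose keywords include the term (case-insensitive, exact match)."""
    term = keyword.lower()
    return [r for r in catalogue if term in {k.lower() for k in r.keywords}]


# Illustrative entries only; identifiers and locations are made up.
catalogue = [
    DatasetRecord("ds-001", "Assay plate readings",
                  frozenset({"assay", "pre-clinical"}), "lims/plates/ds-001"),
    DatasetRecord("ds-002", "Study visit records",
                  frozenset({"clinical"}), "archive/studies/ds-002"),
]

matches = search(catalogue, "Clinical")
for record in matches:
    print(record.identifier, record.title, record.location)
```

Because keywords are matched exactly rather than as substrings, a search for "clinical" does not also return the entry tagged "pre-clinical". A real catalogue would more likely rely on a controlled vocabulary than on free-text keywords.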
FAIRing Clinical and Pre-Clinical Data: Optimising for the Data Lifecycle
Cataloguing is essential to the filing and stratification of data, as it gives users a reference point. “When we talk about data, do we speak about having all data or just certain areas?” asked Stefan. “Have you catalogued clinical as well as non-clinical data?” One audience member replied that his company had catalogued a wide variety of data, but not necessarily all of it. “Each different area has its own definition of efficiency — it depends on keeping the solutions simple enough to adhere to the principles.”
Sansone added that implementing FAIR data management approaches proactively proved to be more feasible than implementing them retrospectively. “It is this process of good practice that many are focusing on,” she said. Stefan agreed, adding that another aspect of FAIR data involved acknowledging that not all data needed to be retained. “The question becomes why we should store a large amount of data if that data is never going to be accessed again.”
Another participant said there was an argument for retaining ‘data for data’s sake’, but added that a broader question for a company initiating FAIR data management principles was ‘where do we start?’. To implement FAIR data approaches, a good level of transparency and concision in communicating the initiative’s aims is key. By breaking up the journey towards FAIRer datasets into piecemeal steps, adopters have a clearer direction to pursue.
Optimising for Data Format and Presentation
A major consideration for any data sharing strategy is the format in which the information is sent. “Sharing might be with the next department, which might be physically close but using a different system,” said Stefan. The example he gave was data recorded in LIMS (laboratory information management systems) which he had attempted to share with a different department. Both parties were interested in sharing the data but had not managed to access the LIMS in a structured way. “If you’re sharing data with a third party,” continued Stefan, “how do you make sure that data is accessible and interoperable?”
“You mentioned the cost,” said Sansone. “Based on our experience with people in pharma, rather than verifying data, they’re focusing on putting FAIRer practices in place.” This helps to demonstrate to the business that there is value in doing so. Having the correct framework in place is also valuable when it comes to handling legacy data: existing, potentially unstructured data in electronic lab notebooks (ELNs) and proprietary databases. “New data generated today could become legacy data in one or two years,” said Stefan. “Exactly,” Sansone agreed. “You have to put in place the FAIR data frameworks and strategies.”
Using FAIR Data Management Approaches to Navigate Large Datasets
Finding an approach to navigate the huge volumes of archived data is crucial to good data infrastructure, with a big emphasis on choosing information and criteria that are relevant to the target datasets. The approach favoured by Stefan involves making newer, fresh datasets ‘linkable’, or available in conjunction. “Because we could not link legacy LIMS and ELN data together, we turned our focus to link LIMS and ELN data for new projects together.”
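As an illustration of what ‘linkable’ might mean in practice, the sketch below joins LIMS rows and ELN entries on a shared key. This is an assumption made for the sake of the example: the discussion did not describe how the linking was implemented, and the `sample_id` field, the `link_records` function and the sample values are all hypothetical. The point is only that linking becomes simple once both systems record the same identifier for new projects.

```python
from collections import defaultdict


def link_records(lims_rows, eln_rows, key="sample_id"):
    """Attach to each LIMS row the ELN entries that share the same key value."""
    # Index ELN entries by the shared key so each lookup is a dictionary access.
    eln_by_key = defaultdict(list)
    for entry in eln_rows:
        eln_by_key[entry[key]].append(entry)

    linked = []
    for row in lims_rows:
        # Rows with no matching ELN entry get an empty list rather than being dropped.
        linked.append({**row, "eln_entries": eln_by_key.get(row[key], [])})
    return linked


# Made-up records standing in for exports from the two systems.
lims_rows = [
    {"sample_id": "S-1", "result": 0.42},
    {"sample_id": "S-2", "result": 0.57},
]
eln_rows = [
    {"sample_id": "S-1", "note": "Repeat run"},
    {"sample_id": "S-1", "note": "Buffer changed"},
]

linked = link_records(lims_rows, eln_rows)
for row in linked:
    print(row["sample_id"], row["result"], len(row["eln_entries"]))
```

Keeping unmatched LIMS rows in the output makes gaps visible, which is useful when checking how much of a new project's data is actually linked.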
From a more holistic perspective, understanding how datasets function in conjunction is a key part of developing the bigger picture. Across a pharmaceutical organisation, any department that uses data is of interest: this is the crux of the justification for a company-wide data FAIRing framework that makes desired information straightforward to identify.