Part 1: Key Takeaways from the RDA Plenary on AI/ML and Interoperability

Nov 20, 2024

—

in AI, AI Governance, Data Commons, Data Visitation, FAIR Data Principles, Interoperability

The 23rd RDA Plenary Meeting in San José, Costa Rica, brought together a global community of researchers to address the pressing issue of sustainable science. The plenary comprised many parallel sessions organized by RDA working groups (WGs) and interest groups (IGs). The event offered valuable insights and sparked innovative ideas around the sharing and reuse of scientific data. This is the first part of a two-part series on key insights from the RDA Plenary. You may access Part 2 here. In Part 1, we’ll focus on AI/ML and interoperability, highlighting their relevance to FAIRLYZ, particularly the Data Visitation feature.

Health Data Commons GORC Profile

The “Health Data Commons GORC profile” session delved into the creation of national health data ecosystems that can harness the power of AI/ML and existing data flows. The concept of “Data Meshes,” exemplified by NIH’s NCPI, was introduced as a network of interconnected data commons, repositories, and cloud resources. While the Ten Pillars for Data Meshes provide a solid foundation for data access, reuse, and analysis, they do not explicitly address the long-term storage and versioning of reused data. FAIRLYZ’s support for FAIR-ifying and versioning “Reused Data” may fill this gap by providing a mechanism to track and manage the lifecycle of reused data.

The Ten Pillars are divided into two groups:

Data Platform Interoperability:

PIDs
FAIR API for metadata
FAIR API for data
API for AuthN/Z
SAFE Analysis

Data Mesh Core Principles:

Mesh governance
Mesh minimum metadata
PIDs for mesh objects
Mesh usage statistics
API for data mesh metadata
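
To make the two groups easier to work with, the pillars listed above can be written down as a simple data structure and used as a checklist. The snippet below is only an illustrative sketch: the pillar names come from the session, but the function `missing_pillars` and the `example_platform` set are hypothetical and do not describe any real platform or FAIRLYZ feature.

```python
# The Ten Pillars for Data Meshes, grouped as in the session summary.
TEN_PILLARS = {
    "Data Platform Interoperability": [
        "PIDs",
        "FAIR API for metadata",
        "FAIR API for data",
        "API for AuthN/Z",
        "SAFE Analysis",
    ],
    "Data Mesh Core Principles": [
        "Mesh governance",
        "Mesh minimum metadata",
        "PIDs for mesh objects",
        "Mesh usage statistics",
        "API for data mesh metadata",
    ],
}


def missing_pillars(supported: set[str]) -> dict[str, list[str]]:
    """Return, per group, the pillars that a platform does not yet cover."""
    return {
        group: [pillar for pillar in pillars if pillar not in supported]
        for group, pillars in TEN_PILLARS.items()
    }


# Hypothetical platform used purely for illustration.
example_platform = {
    "PIDs",
    "FAIR API for metadata",
    "API for AuthN/Z",
    "Mesh governance",
}

gaps = missing_pillars(example_platform)
for group, pillars in gaps.items():
    print(f"{group}: missing {len(pillars)} -> {pillars}")
```

A checklist like this says nothing about how well a pillar is implemented; it only shows which ones are absent.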

AI Governance and Data Visitation

The session titled “The global evolution of AI governance and the relevance of the AIDV-WG’s deliverables” discussed AI implementations using data visitation (AIDV) and was organized by the RDA AIDV WG. The working group describes data visitation as follows:

Data visitation: Data sets are subject to analysis within a host location without the data ever leaving the host location; the analytical framework can be submitted by a third party external to the host location; and the results can be returned to that third party.
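
The three parts of this definition (data stays at the host, the analysis comes from a third party, only results go back) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the AIDV WG’s specification or the FAIRLYZ implementation: `HostSite`, `visit`, `min_cohort`, and `mean_age` are hypothetical names, and a real deployment would also sandbox the submitted code and vet its outputs far more carefully.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class HostSite:
    """Hypothetical host location: records never leave this object."""

    _records: Sequence[dict]
    min_cohort: int = 5  # assumed disclosure threshold, for illustration only

    def visit(self, analysis: Callable[[Sequence[dict]], float]) -> dict:
        """Run a third-party analysis locally and return only its result."""
        if len(self._records) < self.min_cohort:
            raise PermissionError("cohort too small to release results")
        result = analysis(self._records)
        # Very coarse output check: only a single number may leave the host.
        if isinstance(result, bool) or not isinstance(result, (int, float)):
            raise PermissionError("only numeric aggregates may be returned")
        return {"n": len(self._records), "result": result}


def mean_age(records: Sequence[dict]) -> float:
    """Analysis written by the external party and submitted to the host."""
    return mean(r["age"] for r in records)


# The host holds the data; the third party only ever sees `summary`.
host = HostSite([{"age": a} for a in (34, 41, 29, 52, 47, 38)])
summary = host.visit(mean_age)
print(summary)
```

The external party supplies `mean_age` and receives a small summary dictionary; the individual records stay inside `HostSite` throughout.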

AI applications rely on large volumes of data, which calls for secure environments where the data can be processed without compromising privacy or security. FAIRLYZ has incorporated data visitation processes to ensure that sensitive data remains within a secure environment. The RDA AIDV WG recommendations open new avenues for further regulatory advances. The session also looked at the EU Artificial Intelligence Act (AIA) and its governance requirements, as well as the European Health Data Space (EHDS) Regulation, which aims to enhance primary use of health data (continuity of care) and facilitate secondary use of health data for foreseen purposes (see the EHDS use case published in 2024). It also explored national AI policies in key regions such as North America, Oceania, and South America, highlighting the global impact of AI regulations. Several questions raised in the session remain unanswered and are crucial for data access for AI:

Are the AIDV RDA recommendations compatible with the latest regulatory developments?
What is the meaning of “adoption of recommendations” in the context of a constantly evolving regulatory landscape?
How to balance flexibility in recommendations with practical usefulness?
How to design Secured Processing Environments?

To Whom Does the “I” in FAIR Belong?

The session titled “To whom does the ‘I’ in FAIR belong?” investigated how to address the “I” in FAIR, specifically in the context of the interoperability of Research Commons. The session was led by members of the FAIR Data Maturity Model WG and the GORC groups. Following the talk, discussions with the organizers explored the potential of integrating FAIRLYZ scoring and AI-driven annotations as recommendations for interoperability.

Who cares about the Uptake of Digital Research Infrastructures?

The session titled “Who cares about the Uptake of Digital Research Infrastructures?” discussed the relevance of data commons, who is investing in them, and why. It also considered whether digital research infrastructures (DRIs) are fundamental to research efforts. One question to add would be “Is interoperability of DRIs a deciding factor?” Understanding the uptake of DRIs such as FAIRLYZ (by whom, how, and for what purposes such infrastructure is used, and why it may not be used) is key to understanding user needs, barriers, and opportunities.

The Repercussions for an Institution of a Research Data Commons

The “Repercussions for an Institution of a Research Data Commons” session presented an example from Research Data Infrastructure Services at Princeton University of bringing a Data Commons mindset to local data stores using TigerData. TigerData is a data management service at Princeton University that seeks to satisfy the needs of diverse stakeholders: Faculty, Dean for Research, Library, IT, Research Administration, and Data Office. In the TigerData system, projects are the fundamental unit. This is similar to “studies” in FAIRLYZ, which supports aggregating studies into projects. The larger vision is to use the RDA GORC International Model to develop a Princeton data commons that interoperates with other data commons. Some of the challenges faced by TigerData are being addressed in FAIRLYZ, such as normalizing data management best practices, incentivizing metadata enrichment, and interoperability.

In the next part of this series, we’ll delve deeper into specific use cases and explore how FAIRLYZ can leverage these insights to provide even more value to its users. Stay tuned!
