Part 1: Key Takeaways from RDA 23 on AI/ML and Interoperability

The 23rd RDA Plenary Meeting in San José, Costa Rica, brought together a global community of researchers to address the pressing issue of sustainable science. The plenary was divided into many parallel sessions organized by RDA working groups (WGs) and interest groups (IGs). The event offered valuable insights and sparked innovative ideas around the sharing and reuse of scientific data. This is the first part of a two-part series on key insights from the RDA Plenary. We’ll focus on AI/ML and interoperability, highlighting their relevance to FAIRLYZ, particularly the Data Visitation feature.

Health Data Commons GORC Profile

The “Health Data Commons GORC profile” session delved into the creation of national health data ecosystems that can harness the power of AI/ML and existing data flows. The concept of “Data Meshes,” exemplified by NIH’s NCPI, was introduced as a network of interconnected data commons, repositories, and cloud resources. While the Ten Pillars for Data Meshes provide a solid foundation for data access, reuse, and analysis, they do not explicitly address the long-term storage and versioning of reused data. FAIRLYZ’s support for FAIR-ifying and versioning “Reused Data” may fill this gap by providing a mechanism to track and manage the lifecycle of reused data.

The Ten Pillars are divided into two groups:

Data Platform Interoperability:

  1. PIDs
  2. FAIR API for metadata
  3. FAIR API for data
  4. API for AuthN/Z
  5. SAFE Analysis

Data Mesh Core Principles:

  • Mesh governance
  • Mesh minimum metadata
  • PIDs for mesh objects
  • Mesh usage statistics
  • API for data mesh metadata

AI Governance and Data Visitation

The session titled “The global evolution of AI governance and the relevance of the AIDV-WG’s deliverables” discussed AI implementations using data visitation (AIDV) and was organized by the RDA AIDV WG. Data visitation refers to a process described as:

AI applications rely on large volumes of data, which calls for secure environments where the data can be processed without compromising privacy or security. FAIRLYZ has incorporated data visitation processes to ensure that sensitive data remains within a secure environment. The RDA AIDV WG recommendations open new avenues for further regulatory advances. This session also looked at the EU Artificial Intelligence Act (AIA) and governance requirements, as well as the European Health Data Space (EHDS) Regulation, which aims to enhance primary use of health data (continuity of care) and facilitate secondary use of health data for foreseen purposes (see the EHDS use case published in 2024). It also explored national AI policies in key regions like North America, Oceania, and South America, highlighting the global impact of AI regulations. Several questions raised in the session remain unanswered and are crucial for data access for AI:

  • Are the AIDV RDA recommendations compatible with the latest regulatory developments?
  • What is the meaning of “adoption of recommendations” in the context of a constantly evolving regulatory landscape?
  • How to balance flexibility in recommendations with practical usefulness?
  • How to design Secured Processing Environments?
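
To make the data visitation idea more concrete, here is a minimal Python sketch of the general pattern: the analysis travels to the data, and only an aggregate result leaves the secure environment. This is an illustration only, not the FAIRLYZ implementation or an AIDV WG recommendation. The class name SecureEnvironment, the visit method, and the min_group_size threshold are all hypothetical names chosen for this example.

```python
from statistics import mean
from typing import Callable, Sequence


class SecureEnvironment:
    """Hypothetical secure processing environment holding sensitive records."""

    def __init__(self, records: Sequence[dict], min_group_size: int = 5):
        self._records = list(records)          # kept inside; never returned to callers
        self._min_group_size = min_group_size  # illustrative disclosure threshold

    def visit(self, analysis: Callable[[Sequence[dict]], float]) -> float:
        """Run a visiting analysis inside the environment and release only its aggregate."""
        if len(self._records) < self._min_group_size:
            raise PermissionError("too few records to release an aggregate")
        result = analysis(self._records)
        if not isinstance(result, (int, float)):
            raise TypeError("only numeric aggregates may leave the environment")
        return float(result)


def mean_age(records: Sequence[dict]) -> float:
    """Example visiting analysis: the average of a single numeric field."""
    return mean(r["age"] for r in records)


# Made-up records for illustration only.
env = SecureEnvironment([{"age": a} for a in (34, 41, 29, 57, 49)])
print(env.visit(mean_age))
```

A real secured processing environment would also need to vet the visiting code itself, since an untrusted analysis function could otherwise encode individual records in its output. The sketch only shows the direction of travel: code goes in, aggregates come out.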

To Whom Does the “I” in FAIR Belong?

The session titled “To whom does the “I” in FAIR belong?” investigated how to address the “I” in FAIR, specifically in the context of interoperability of Research Commons. The session was led by members of the FAIR Data Maturity Model WG and the GORC groups. Following the talk, discussions with the organizers explored the potential of integrating FAIRLYZ scoring and AI-driven annotations as recommendations for interoperability.

Who Cares About the Uptake of Digital Research Infrastructures?

The session titled “Who cares about the Uptake of Digital Research Infrastructures?” discussed the relevance of data commons, who is investing in them, and why. It also considered whether digital research infrastructures (DRIs) are fundamental to research efforts. One question to add would be: “Is interoperability of DRIs a deciding factor?” Understanding the uptake of DRIs such as FAIRLYZ (by whom, how, and for what purposes such infrastructure is used, and why it may not be used) is key to understanding user needs, barriers, and opportunities.

The Repercussions for an Institution of a Research Data Commons

The “Repercussions for an Institution of a Research Data Commons” session presented an example from the Research Data Infrastructure Services at Princeton University of bringing a Data Commons mindset to local data stores using TigerData. TigerData is a data management service at Princeton University that seeks to satisfy the needs of diverse stakeholders: Faculty, Dean for Research, Library, IT, Research Administration, and Data Office. In the TigerData system, projects are the fundamental unit. This is similar to “studies” in FAIRLYZ, which supports aggregating studies into projects. The larger vision is to use the RDA GORC International Model to develop a Princeton data commons that interoperates with other data commons. Some of the challenges faced by TigerData are being addressed in FAIRLYZ, such as normalizing data management best practices, incentivizing metadata enrichment, and interoperability.

In the next part of this series, we’ll delve deeper into specific use cases and explore how FAIRLYZ can leverage these insights to provide even more value to its users. Stay tuned!