The digital age has ushered in an unprecedented era for scientific discovery, promising accelerated progress through open data sharing. Yet beneath the surface of progressive policies and sophisticated technological solutions lies a potent psychological barrier: the fear of being “scooped.” This anxiety, deeply ingrained in the competitive landscape of research, can be traced back to historical episodes such as the discovery of the double-helix structure of DNA. It often clashes directly with the growing mandates for data transparency, exemplified by journal requirements for Digital Object Identifiers (DOIs), which are a type of Persistent Identifier (PID). The result is a paradox: researchers diligently archive their data with a persistent identifier, yet often provide minimal contextual information and rarely update the record after publication, hindering the very collaborative spirit these policies aim to foster.
We found several examples of metadata in the Gene Expression Omnibus (GEO) and Sequence Read Archive (SRA) databases with insufficient information. In these examples, the SRA metadata, which GEO also uses, either carry the same information for all samples or say nothing about what distinguishes one sample from another.
In the example shown here, the SRA metadata has the same information for every sample. While not all columns are visible, the ones shown are representative of the data contained within the full table.
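A check of this kind can be automated. The following is a minimal sketch, not part of FAIRlyz or any SRA tool, that flags metadata columns whose value is identical (or empty) across every sample. The function name, the column names, and the sample rows are hypothetical and chosen only for illustration; a real SRA metadata export would have its own columns.

```python
import csv
import io


def uninformative_columns(rows, ignore=("Run",)):
    """Return the columns whose value is the same (or empty) in every row.

    `rows` is a list of dicts, one per sample. Columns listed in `ignore`
    (here the hypothetical per-run accession column) are skipped because
    they are expected to be unique identifiers, not descriptive metadata.
    """
    if not rows:
        return []
    flagged = []
    for column in rows[0].keys():
        if column in ignore:
            continue
        # Collect the distinct, whitespace-trimmed values for this column.
        values = {(row.get(column) or "").strip() for row in rows}
        if len(values) <= 1:
            flagged.append(column)
    return flagged


# Hypothetical metadata table: every descriptive field is identical or blank.
SAMPLE_CSV = """Run,Organism,tissue,age,sex
RUN001,Homo sapiens,saliva,,
RUN002,Homo sapiens,saliva,,
RUN003,Homo sapiens,saliva,,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
flagged = uninformative_columns(rows)
print(flagged)
```

If every descriptive column is flagged, as in this made-up table, nothing in the metadata tells a reader how the samples differ, which is the situation described above.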

In comparison, the SRA metadata for a COVID-19 Saliva Microbiome study, generated and submitted to SRA by Lifetime Omics, the company developing FAIRlyz, contains details about the subjects from whom the samples were taken, making it more information-rich without compromising the subjects’ identities. Researchers reusing such data are encouraged to register their study in FAIRlyz, crediting the original study and its authors through proper citation. This act of attribution is a crucial step in mitigating the fear of scooping and fostering a culture of data reuse.

Understanding this paradox requires a deeper look into the human psyche. The academic world thrives on originality and priority. The thrill of discovery, the prestige of publication, and the career advancement tied to groundbreaking findings fuel a natural protectiveness over one’s intellectual property. Sharing data prematurely, or even comprehensively upon publication, can feel like exposing nascent ideas to competitors eager to capitalize on them. This fear, while understandable, creates significant friction with the increasing calls for open science. That is why FAIRlyz uses data visitation for AI-driven data curation and QC while protecting sensitive information.
Adding to the data-sharing tension are the well-intentioned policies of scientific journals. The requirement to assign a DOI to underlying research data is a crucial step towards making data findable, accessible, interoperable, and reusable (FAIR). However, the implementation often falls short. Driven by the fear of being scooped, or simply overwhelmed by time constraints and the immediate pressures of publication, researchers may deposit their data with the bare minimum of metadata: enough to satisfy the journal’s requirement, but insufficient for others to truly understand, replicate, or build upon the work. Furthermore, research is dynamic: datasets evolve, analyses are refined, and new insights emerge. Yet the initial DOI record, once minted, frequently remains static, a snapshot in time that doesn’t reflect the ongoing trajectory of the research.
In the next blog post, we will discuss new technological developments that challenge the DOI/PID requirement as a sufficient or even relevant step for data sharing. Lifetime Omics is aware of these challenges and is building into FAIRlyz intelligent technological solutions and revised policies that go beyond the minting of DOIs/PIDs, with the aim of accelerating the pace of discovery by ensuring comprehensive metadata and dynamic data records.