Part 2: Enhancing EHR Data Quality: Key Findings from Recent Studies

This post is Part 2 of our 2-part series on recent research studies looking at EHR data quality. Part 1 presented 3 studies from 2025; Part 2 presents 2 more, plus a proposed technological shift that avoids the pitfalls of traditional data pipelines.

Study #4. Interoperability Remains a Pain Point

While previous papers focused on missing data or statistical imputation, this study by Everson et al. (2025) highlights a different dimension of data quality: interoperability and usability. The authors surveyed nearly 5,000 family physicians to understand how data from outside sources (other hospitals or clinics) integrates into their Electronic Health Record (EHR). The paper argues that the mere “movement” of data is not enough. For data to be high quality in a clinical sense, it must be integrated, meaning it is presented in a way that matches the clinician’s workflow without requiring them to hunt through a “data dump.”

Everson et al. (2025) show that family physicians still struggle with:

  • Missing records
  • Inconsistent formats
  • Failed data exchanges

Despite years of policy efforts, clinicians still lack complete patient information at the point of care.

The Solution: Moving data without a quality check contributes to physician burnout and “information silos.” A tool that “visits” the data before it reaches the physician can transform raw, fragmented records into actionable insights, ensuring that the clinician’s screen contains only what is necessary, accurate, and easy to find.
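
To make this concrete, here is a minimal sketch of what such a pre-ingestion quality check could look like. The field names, required-field list, and date rule below are illustrative assumptions, not the actual logic of FAIRlyz or any specific tool.

    # Minimal sketch of a pre-ingestion quality check on an incoming outside record.
    # Field names, the required-field list, and the date rule are illustrative assumptions.
    from dataclasses import dataclass, field

    REQUIRED_FIELDS = ["patient_id", "encounter_date", "diagnoses", "medications", "labs"]

    @dataclass
    class QualityReport:
        missing_fields: list = field(default_factory=list)
        format_issues: list = field(default_factory=list)

        @property
        def ready_for_clinician(self) -> bool:
            # Surface the record only if nothing critical is missing or malformed.
            return not self.missing_fields and not self.format_issues

    def check_incoming_record(record: dict) -> QualityReport:
        """Flag gaps and format problems before the record reaches the clinician's screen."""
        report = QualityReport()
        for name in REQUIRED_FIELDS:
            if record.get(name) in (None, "", []):
                report.missing_fields.append(name)
        # Example format check: encounter_date should be ISO 8601 (YYYY-MM-DD).
        date = record.get("encounter_date", "")
        if date and len(date.split("-")) != 3:
            report.format_issues.append("encounter_date not in YYYY-MM-DD format")
        return report

    outside_record = {"patient_id": "123", "encounter_date": "07/14/2025", "diagnoses": []}
    report = check_incoming_record(outside_record)
    print(report.missing_fields)       # ['diagnoses', 'medications', 'labs']
    print(report.format_issues)        # ['encounter_date not in YYYY-MM-DD format']
    print(report.ready_for_clinician)  # False

The point is not the specific checks, but where they run: before the record is surfaced, rather than after the clinician has already had to dig through it.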

Study #5. EHR Incompleteness Is a Workflow Failure

This 2025 paper from the University of Central Florida shifts the perspective from technical data errors to process-driven incompleteness. The authors argue that missing data in EHRs is a systemic failure caused by gaps in human workflows, patient engagement, and institutional policy, including:

  • Fragmented workflows
  • Documentation optimized for billing, not care
  • Overloaded clinicians
  • Poorly aligned incentives

This is a process problem, not a software glitch.
Key issues identified include:

  • The “Data Swamp” Risk: As health systems rely more on AI and machine learning, “invisible” missingness propagates bias and leads to inaccurate clinical decision-making.
  • The 30–40% Gap: In some clinical settings, 30–40% of the required variables (especially social determinants of health and lab data) are missing, yet researchers often treat the “data lake” as if it were complete.
  • The Process Paradox: Digital records were meant to improve accessibility, but poor usability and “provider fatigue” often lead clinicians to skip fields, creating a record that is structurally present but clinically hollow.

The Solution: A “360-Degree” View of Incompleteness

The authors propose moving away from just “fixing” data after it is collected. Instead, they suggest using a taxonomy of incompleteness and tools like the Record Strength Score to measure how “healthy” a patient’s data is before it is used for downstream analytics.
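
The paper frames the Record Strength Score as a way to quantify how complete a record is before it feeds analytics. The exact formula is not reproduced here; the sketch below is a simple weighted-completeness stand-in, with variable names and weights chosen purely for illustration.

    # Illustrative record strength score: weighted completeness over key variables.
    # Weights and variable names are assumptions for this sketch, not the paper's definition.
    WEIGHTS = {
        "demographics": 0.15,
        "diagnoses": 0.25,
        "medications": 0.20,
        "lab_results": 0.25,
        "social_determinants": 0.15,
    }

    def record_strength_score(record: dict) -> float:
        """Return a 0-1 score: the weighted share of key variables that are populated."""
        score = 0.0
        for variable, weight in WEIGHTS.items():
            if record.get(variable) not in (None, "", [], {}):
                score += weight
        return round(score, 2)

    # A record missing lab results and social determinants scores 0.6,
    # signalling it may be too "hollow" for downstream analytics.
    patient = {"demographics": {"age": 67}, "diagnoses": ["E11.9"], "medications": ["metformin"]}
    print(record_strength_score(patient))  # 0.6

A score like this does not fix the record; it makes the incompleteness visible so that workflow and policy fixes can be targeted where the gaps actually are.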

Why This Matters for the Future of Healthcare AI

The 2025 research consensus is clear:
EHR data quality is the single biggest bottleneck to trustworthy healthcare AI.

If:

  • 30% of data is invalid
  • 18% is missing
  • 66% of diagnoses are misclassified
  • Interoperability gaps persist

…then no amount of downstream cleaning will save us.

We need upstream quality control, and we need it before data moves.

FAIRlyz and similar data‑visiting tools represent a shift from reactive cleanup to proactive governance. They act as first responders, diagnosing and triaging data quality issues at the source.
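
As a rough illustration of what “triaging at the source” might mean, the sketch below ranks detected issues by severity before data is allowed to move downstream. The issue types and severity rules are assumptions for illustration, not FAIRlyz’s actual policy.

    # Illustrative triage step: rank detected data quality issues before data moves downstream.
    # Issue types and severity rules are assumptions, not any specific tool's policy.
    SEVERITY = {
        "missing_identifier": "block",       # cannot safely link the record to a patient
        "missing_required_field": "warn",    # usable, but flag the gap to the data owner
        "format_mismatch": "warn",
        "stale_timestamp": "info",
    }

    def triage(issues: list[str]) -> str:
        """Return the most severe action implied by a list of issue types."""
        order = ["info", "warn", "block"]
        worst = "info"
        for issue in issues:
            action = SEVERITY.get(issue, "warn")  # unknown issues default to a warning
            if order.index(action) > order.index(worst):
                worst = action
        return worst

    print(triage(["format_mismatch", "missing_required_field"]))  # warn
    print(triage(["missing_identifier"]))                         # block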

This is how we build a future where healthcare AI is safe, reliable, and grounded in reality.