
2025 · June
EHA 2025: AI-Augmented Fibrosis Grading Improves Pathologist Agreement in 1000-Patient Clinical Audit
Ground Truth Labs Team
At EHA 2025 in Milan, we presented results from the largest clinical evaluation of AI-augmented bone marrow fibrosis grading to date. Across 1000 real-world cases reviewed by 15 international hematopathologists, our Continuous Index of Fibrosis (CIF) significantly improved agreement between pathologists—more than doubling the odds of consensus when used alongside manual assessment.
The reproducibility challenge
Fibrosis grading matters. It drives classification and prognosis in myeloproliferative neoplasms, where the distinction between grades can influence treatment decisions. But manual grading is subjective, and variability is a known problem.
Our study quantified just how variable it is. When the same pathologist graded the same case twice, they assigned a different grade roughly one in three times: intraobserver agreement was just 66%. If pathologists don't agree with themselves, inter-rater agreement will always be limited.
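For readers who want to see what that 66% figure measures, here is a minimal sketch (not the study's actual analysis code) of how intraobserver agreement, together with the chance-corrected kappa statistic that often accompanies it, can be computed from repeat reads. The grade values and case counts below are purely hypothetical.

```python
# Minimal sketch of intraobserver agreement from repeat reads.
# Grades follow the standard MF-0 to MF-3 fibrosis scale; the
# example pairs below are illustrative, not study data.
from collections import Counter

def percent_agreement(first_reads, second_reads):
    """Fraction of cases assigned the same grade on both reads."""
    matches = sum(a == b for a, b in zip(first_reads, second_reads))
    return matches / len(first_reads)

def cohens_kappa(first_reads, second_reads):
    """Chance-corrected agreement between the two reads."""
    n = len(first_reads)
    p_observed = percent_agreement(first_reads, second_reads)
    first_counts = Counter(first_reads)
    second_counts = Counter(second_reads)
    grades = set(first_counts) | set(second_counts)
    # Expected agreement if the two reads were independent draws
    # from each read's own marginal grade distribution.
    p_expected = sum(
        (first_counts[g] / n) * (second_counts[g] / n) for g in grades
    )
    return (p_observed - p_expected) / (1 - p_expected)

# Illustrative repeat reads for six cases (hypothetical values).
read_1 = ["MF-0", "MF-1", "MF-2", "MF-2", "MF-3", "MF-1"]
read_2 = ["MF-0", "MF-2", "MF-2", "MF-1", "MF-3", "MF-1"]

print(f"Agreement: {percent_agreement(read_1, read_2):.0%}")  # 67%
print(f"Kappa:     {cohens_kappa(read_1, read_2):.2f}")
```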
What we tested
Working with Oxford University Hospitals, we collected 1000 sequential bone marrow trephine biopsies—an unselected, real-world clinical cohort. Fifteen hematopathologists from NHS, US, and European centers participated, ranging from registrars to consultants.
Each pathologist graded cases under three conditions:
- Manual grading (standard practice)
- CIF Sequential Read (grade manually first, then grade again with the CIF heatmap visible)
- CIF Concurrent Read (grade from the outset with the CIF heatmap visible)
What we found
CIF heatmaps significantly improved both agreement and accuracy. Relative to manual grading, the odds of pathologists agreeing on a case increased under both assisted conditions:
- Sequential read: OR 1.43 (95% CI 1.21–1.7, p=0.0001)
- Concurrent read: OR 2.41 (95% CI 2.02–2.88, p<0.0001)
The effect was consistent across experience levels. CIF-assisted grades were also closer to the expert consensus, suggesting the AI augmentation improved accuracy rather than merely pushing pathologists toward one another.
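To make those odds ratios concrete, here is a minimal sketch of how an odds ratio of agreement, with a Wald 95% confidence interval, can be computed from a simple 2x2 table of agree/disagree counts. The counts are invented for illustration and chosen only to land near the concurrent-read result; this is not the study's statistical analysis, which would also need to account for repeated measurements across pathologists and cases.

```python
# Illustrative odds ratio of agreement (assisted vs manual) from a 2x2 table,
# with a Wald 95% confidence interval. Counts below are hypothetical.
import math

def odds_ratio_with_ci(agree_assisted, disagree_assisted,
                       agree_manual, disagree_manual, z=1.96):
    """Odds ratio of agreement (assisted vs manual) with a Wald 95% CI."""
    or_ = (agree_assisted * disagree_manual) / (disagree_assisted * agree_manual)
    se_log_or = math.sqrt(
        1 / agree_assisted + 1 / disagree_assisted
        + 1 / agree_manual + 1 / disagree_manual
    )
    lo = math.exp(math.log(or_) - z * se_log_or)
    hi = math.exp(math.log(or_) + z * se_log_or)
    return or_, lo, hi

# Hypothetical counts of pathologist-pair comparisons per condition.
or_, lo, hi = odds_ratio_with_ci(
    agree_assisted=700, disagree_assisted=300,  # assisted condition
    agree_manual=490, disagree_manual=510,      # manual-grading condition
)
print(f"OR {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # OR 2.43 (95% CI 2.02-2.92)
```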
Why this matters
This study demonstrates that AI can improve reproducibility in a notoriously subjective assessment—without replacing pathologist judgment. The pathologist remains in control; CIF provides objective, quantitative support to inform their decision.
The scale matters too. Previous validation studies showed CIF correlates well with expert consensus. This study shows it works in clinical practice, across a large unselected cohort, with pathologists of varying experience.
What's next
These findings support broader clinical adoption of AI-augmented fibrosis assessment. We're continuing to work with clinical partners to integrate CIF into routine diagnostic workflows.