Sage Bionetworks, OregonHe
Abstract
Continuous and reliable access to curated biological data repositories is
indispensable for accelerating rigorous scientific inquiry and fostering
reproducible research. Centralized repositories, though widely used, are
vulnerable to single points of failure arising from cyberattacks, technical
faults, natural disasters, or funding and political uncertainties. This can
lead to widespread data unavailability, data loss, integrity compromises, and
substantial delays in critical research, ultimately impeding scientific
progress. Centralizing essential scientific resources in a single geopolitical
or institutional hub is inherently dangerous, as any disruption can paralyze
diverse ongoing research. The rapid acceleration of data generation, combined
with an increasingly volatile global landscape, necessitates a critical
re-evaluation of the sustainability of centralized models. Implementing
federated and decentralized architectures presents a compelling and
future-oriented pathway to substantially strengthen the resilience of
scientific data infrastructures, thereby mitigating vulnerabilities and
ensuring the long-term integrity of data. Here, we examine the structural
limitations of centralized repositories, evaluate federated and decentralized
models, and propose a hybrid framework for resilient, FAIR, and sustainable
scientific data stewardship. Such an approach significantly reduces exposure to
governance instability, infrastructural fragility, and funding volatility, while
also fostering fairness and global accessibility. The future of
open science depends on integrating these complementary approaches to establish
a globally distributed, economically sustainable, and institutionally robust
infrastructure that safeguards scientific data as a public good, ensuring its
continued accessibility, interoperability, and preservation for generations to
come.
AI Insights
- EOSC’s federated nodes already host 1 million genomes, a living model of distributed stewardship.
- ELIXIR’s COVID‑19 response proved community pipelines can scale to pandemic‑grade data volumes.
- The Global Biodata Coalition’s roadmap envisions a cross‑border mesh that outpaces single‑point failure risks.
- DeSci employs blockchain provenance to give researchers immutable audit trails for every dataset (a minimal sketch of the idea follows this list).
- NIH’s Final Data Policy now mandates FAIR compliance, nudging institutions toward hybrid decentralized architectures.
- DeSci still struggles with interoperability, as heterogeneous metadata schemas block seamless cross‑platform queries.
- Privacy‑by‑design in distributed repositories remains a top research gap, inviting novel cryptographic solutions.
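The blockchain-provenance point above is the most mechanism-like claim in this digest, so a brief illustration may help. The Python sketch below is a rough, hypothetical model of an immutable audit trail, not the API of any actual DeSci platform: each dataset registration is content-hashed and chained to the previous record, so tampering with any earlier entry breaks every later hash. The names (ProvenanceLedger, register, verify) are illustrative assumptions.

```python
import hashlib
import json
import time

def sha256_hex(data: bytes) -> str:
    # Content hash used as the dataset fingerprint.
    return hashlib.sha256(data).hexdigest()

class ProvenanceLedger:
    # Toy append-only ledger: each record commits to the previous one,
    # so editing any past entry invalidates all subsequent record hashes.

    def __init__(self):
        self.records = []

    def register(self, dataset_id: str, dataset_bytes: bytes, actor: str) -> dict:
        prev_hash = self.records[-1]["record_hash"] if self.records else "0" * 64
        record = {
            "dataset_id": dataset_id,
            "dataset_hash": sha256_hex(dataset_bytes),
            "actor": actor,
            "timestamp": time.time(),
            "prev_hash": prev_hash,
        }
        # Hash the record itself, including the link to its predecessor.
        record["record_hash"] = sha256_hex(
            json.dumps(record, sort_keys=True).encode()
        )
        self.records.append(record)
        return record

    def verify(self) -> bool:
        # Recompute every hash link; any edit to past records is detected.
        prev_hash = "0" * 64
        for record in self.records:
            body = {k: v for k, v in record.items() if k != "record_hash"}
            if record["prev_hash"] != prev_hash:
                return False
            if sha256_hex(json.dumps(body, sort_keys=True).encode()) != record["record_hash"]:
                return False
            prev_hash = record["record_hash"]
        return True

# Example: register two dataset versions and check the chain.
ledger = ProvenanceLedger()
ledger.register("genome-cohort-v1", b"...raw dataset bytes...", actor="lab-A")
ledger.register("genome-cohort-v2", b"...revised dataset bytes...", actor="lab-B")
assert ledger.verify()
```

Production systems would replace this single in-memory list with a distributed ledger and add identity and access control, but the hash-chaining principle behind the "immutable audit trail" claim is essentially the one shown here.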
Rochester Institute of Technology
Abstract
Qualitative research offers deep insights into human experiences, but its
processes, such as coding and thematic analysis, are time-intensive and
laborious. Recent advancements in qualitative data analysis (QDA) tools have
introduced AI capabilities, allowing researchers to handle large datasets and
automate labor-intensive tasks. However, qualitative researchers have expressed
concerns about AI's lack of contextual understanding and its potential to
overshadow the collaborative and interpretive nature of their work. This study
investigates researchers' preferences among three degrees of delegation of AI
in QDA (human-only, human-initiated, and AI-initiated coding) and explores
factors influencing these preferences. Through interviews with 16 qualitative
researchers, we identified efficiency, ownership, and trust as essential
factors in determining the desired degree of delegation. Our findings highlight
researchers' openness to AI as a supportive tool while emphasizing the
importance of human oversight and transparency in automation. Based on the
results, we discuss three factors of trust in AI for QDA and potential ways to
strengthen collaborative efforts in QDA and decrease bias during analysis.