Meeting new challenges in data science for health research and innovation

Considerations in expanding the capabilities of iCARE and WSIC

Sponsored by Snowflake

Written by iCARE

NHS NWL ICB

OneLondon

Over the past few years, the iCARE Secure Data Environment, including Imperial data and Whole Systems Integrated Care (WSIC) data, has continued to evolve, expanding its capabilities to support modern analytics, machine learning, and artificial intelligence. This evolution has been driven by increasing demand for more data, more advanced tools, and greater collaboration across multiple research domains. The journey of WSIC and iCARE to date has laid the foundations for scaling this model across the London region. We are all now founding partners in the One London Sub-National Secure Data Environment, and the blueprint created over the past few years will enable the pan-London architecture to bring researchers to data. This will provide an unprecedented dataset covering more than 10 million citizens, linking primary and secondary healthcare data for use in research.

A significant shift has been the growing need for high-end machine learning and AI-driven analytics. To accommodate this, the infrastructure can be scaled up or down to provide high-performance compute and GPUs, and access to an ever-increasing number of AI tools (including large language models) via new Azure and Snowflake tools. The number of databases integrated into the system has also expanded significantly, supporting not just research in north west London but also national and international studies, including rare disease research.

However, success has brought new challenges. The appetite for data has grown sharply, with researchers seeking access to a broader range of datasets, including imaging, omics, biomarkers, and digital pathology. The sheer scale of data curation required for these efforts has underscored the need for a sustainable model – both financially and environmentally. The traditional approach of relying on an ever-growing team of data engineers does not scale indefinitely, prompting efforts to automate data ingestion and management processes.

One major initiative has been the investment in a data ingestion framework, allowing data to be loaded more efficiently without extensive manual intervention. By leveraging automation, the team has been able to streamline processes, improving efficiency and productivity. Additionally, advancements in cost tracking and reporting have provided greater transparency into resource allocation, helping to optimise storage and processing expenditures.

Beyond infrastructure, the focus is shifting towards federated analysis, ensuring that secure data environments across the UK can link and analyse data without unnecessary duplication. Rather than moving large datasets between institutions, the goal is to develop architectures that enable researchers to run algorithms across multiple locations while preserving data security and integrity. This approach will not only enhance research collaboration but also address growing concerns about sustainability in data storage and processing.
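The idea of running an algorithm where the data sits, and sharing only the results, can be illustrated with a minimal Python sketch. It does not describe the iCARE or WSIC architecture: the function names, the `SiteSummary` type, and the measurements are all hypothetical. Each site computes an aggregate locally, and only those aggregates are combined centrally.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate a site releases; no row-level data leaves the site (hypothetical type)."""
    count: int
    total: float


def summarise_locally(values: Sequence[float]) -> SiteSummary:
    """Runs inside a site's own environment and returns only a count and a sum."""
    return SiteSummary(count=len(values), total=float(sum(values)))


def pooled_mean(summaries: Sequence[SiteSummary]) -> float:
    """Combines per-site aggregates into one overall mean."""
    n = sum(s.count for s in summaries)
    if n == 0:
        raise ValueError("no observations across sites")
    return sum(s.total for s in summaries) / n


# Made-up measurements held at two separate sites (illustrative only)
site_a = [5.0, 7.0, 9.0]
site_b = [4.0, 6.0]

summaries = [summarise_locally(site_a), summarise_locally(site_b)]
result = pooled_mean(summaries)
print(f"Pooled mean across sites: {result:.2f}")
```

The pooled mean equals the mean that would be obtained from the combined records, yet neither site's individual values are moved or copied. A real deployment would also need safeguards this sketch omits, such as disclosure controls on small counts and governance checks on what each site releases.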

Another crucial aspect of this evolution is the safe return of insights to clinical practice. From the outset, the iCARE Secure Data Environment was designed to allow for SAFE-return, whereby research findings can lead to direct clinical interventions. This has enabled the development of clinical decision support tools, such as automated alerts for prescribing risks and predictive models for patient deterioration. Importantly, mechanisms are in place to evaluate these interventions continuously, ensuring that AI-driven models remain effective, unbiased, and aligned with real-world clinical workflows.

As AI continues to evolve, so too must the governance frameworks supporting it. Unlike traditional medical devices, AI models require ongoing validation and oversight, as their outputs can shift over time based on new data inputs. This necessitates a continuous evaluation loop, monitoring how algorithms perform in practice, identifying potential biases, and assessing their actual impact on patient outcomes. The ability to track clinician interactions with dashboards and alerts has been instrumental in understanding user behaviour and refining AI-driven interventions.

Finally, the dual approach of using both identifiable and de-identified data has reinforced the effectiveness of iCARE and WSIC. Clinicians use identifiable data for direct patient interventions, while researchers analyse de-identified data to assess the broader impact of these interventions. This full-loop methodology ensures that innovations are both rigorously evaluated and safely reintroduced into clinical workflows, ultimately driving better patient outcomes and operational efficiency.

The story of iCARE and WSIC is far from over, and with ongoing advancements in automation, federated analysis, and AI-driven decision support, the focus remains on continuous improvement. By maintaining a balance between security, research access, and clinical application, the iCARE SDE will continue to serve as a model for integrating data science into healthcare in a responsible and impactful way.

Infrastructure support for this research was provided by the National Institute for Health Research (NIHR) Imperial Biomedical Research Centre (NIHR203323).