Linked datasets need not be the preserve of the rich

Linking data can offer unbridled benefits to healthcare systems, but at what financial cost? Richard Wood explores how in one healthcare system this has been achieved in little time and with a low price tag.

Population Health Management (PHM) is gaining increasing attention, and anyone with some awareness of it will know that linked data is the key enabler. While the precise definition of PHM is a little unclear (perhaps best left as the topic of another article), it broadly concerns a shift in perspective from thinking “in silos” to looking through a wider lens: understanding patient need and activity along various pathways and across distinct segments of the population.

Colossal data challenges

Acquiring a linked dataset had been a desire in the Bristol, North Somerset and South Gloucestershire (BNSSG) system since we first performed a decision tree segmentation of secondary care spend on age, sex and deprivation index back in summer 2017. Although these were the only variables available to us at the time, the exercise surfaced the value of segmentation. It also showed that, to better understand our population and their needs, we would have to cover more of the activity footprint and widen the base of explanatory patient attributes.

But any discernible movement towards a linked dataset was deterred by what appeared to be the colossal challenges involved. Many systems before us had run up costs of up to, and in some cases above, a million pounds in programmes spanning multiple years.

The five-fold principles

With renewed determination, our journey to linked data began in earnest in December 2018. Eight months later, in August 2019, we went live with the BNSSG system-wide dataset, which cost under six figures and brings together primary, secondary, mental health and community data at patient level across the population of one million.

We achieved this by following five guiding principles:

First, keep it simple. There is no denying that the vast linkages of efforts such as the Kent Integrated Dataset are powerful, but for the main staples of PHM – segmentation and risk stratification – all that is really needed is data relating to the attributes of individuals and any related healthcare activity.

The BNSSG system-wide dataset has just two tables corresponding to these dimensions. The “attributes table” simply represents individuals (in rows) and charts their demographic, socioeconomic, and clinical information (in columns). The “activity table” accounts for distinct patient-service contacts with fields specifying date(s), specialty, point of delivery, provider, and cost.
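As a rough illustration of how little structure this requires, the two tables can be sketched in Python as below. This is a minimal sketch and not the actual BNSSG schema: the field names, the `cost_by_segment` helper and the sample records are all assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AttributeRow:
    """One row per individual (hypothetical field names)."""
    patient_id: str            # pseudonymised identifier
    age: int
    sex: str
    deprivation_decile: int
    long_term_conditions: int


@dataclass(frozen=True)
class ActivityRow:
    """One row per patient-service contact (hypothetical field names)."""
    patient_id: str
    contact_date: date
    specialty: str
    point_of_delivery: str
    provider: str
    cost: float


def cost_by_segment(attributes, activity, segment_of):
    """Total activity cost per segment; segment_of maps an AttributeRow to a label."""
    segment_lookup = {row.patient_id: segment_of(row) for row in attributes}
    totals = defaultdict(float)
    for contact in activity:
        segment = segment_lookup.get(contact.patient_id)
        if segment is not None:  # skip contacts with no matching attributes row
            totals[segment] += contact.cost
    return dict(totals)


# Made-up illustrative records, not real data.
attributes = [
    AttributeRow("p1", 82, "F", 3, 4),
    AttributeRow("p2", 35, "M", 7, 0),
]
activity = [
    ActivityRow("p1", date(2019, 5, 1), "Geriatrics", "Non-elective", "Acute Trust A", 2400.0),
    ActivityRow("p1", date(2019, 6, 12), "General practice", "GP appointment", "Practice X", 30.0),
    ActivityRow("p2", date(2019, 6, 20), "General practice", "GP appointment", "Practice Y", 30.0),
]

totals = cost_by_segment(
    attributes,
    activity,
    segment_of=lambda row: "65+" if row.age >= 65 else "under 65",
)
```

Because every question reduces to joining activity to attributes on the patient identifier, a different segmentation (by deprivation or number of long-term conditions, say) only means passing in a different `segment_of` function.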

Second, engage system partners. If the resulting dataset is to be truly system-wide then it needs the input from all providers. As well as sourcing the data, this means sharing practical information around clinical coding conventions and things to look out for.

The focus should be on primary care data, not just in recognition of its pivotal importance (this is where all the patient attributes come from) but also because of the information governance and logistical complexities involved in arranging this crucial piece of the puzzle.

In the BNSSG system we were fortunate to work from the beginning with our GP consortium, One Care, which was able to pull together primary care data from across the patch and help frame the engagement with GPs on what was a new and unfamiliar endeavour.

Third, make use of national flows. Acute SUS (Secondary Uses Service) data goes back many years, but more recently NHS Digital has made available minimum datasets for mental health and community care. All of these come with a unique identifier (the pseudonymised NHS number) that can be used to link them to one another at patient level. Such datasets, which can typically be provided under CSU SLAs, bypass the need to set up local flows, which bring information governance and contractual overheads as well as bespoke file transfer mechanisms. Using them therefore keeps costs down.
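The linkage step itself can be very simple once every feed carries the same identifier. The sketch below is an illustration only: it assumes each feed has already been pseudonymised in the same way, and the `pseudo_id` key, the feed names and the sample records are hypothetical rather than taken from any national specification.

```python
from collections import defaultdict


def link_by_pseudo_id(flows):
    """Group records from every feed under the shared pseudonymised identifier."""
    by_patient = defaultdict(list)
    for source, records in flows.items():
        for record in records:
            # Tag each record with the feed it came from before pooling.
            by_patient[record["pseudo_id"]].append({**record, "source": source})
    for records in by_patient.values():
        # ISO-format date strings sort chronologically as plain text.
        records.sort(key=lambda r: r["date"])
    return dict(by_patient)


# Made-up illustrative records, not real data.
flows = {
    "acute": [{"pseudo_id": "a91f", "date": "2019-03-02", "cost": 1800.0}],
    "mental_health": [{"pseudo_id": "a91f", "date": "2019-01-15", "cost": 120.0}],
    "community": [{"pseudo_id": "c07b", "date": "2019-02-10", "cost": 60.0}],
}

linked = link_by_pseudo_id(flows)
```

Here `linked["a91f"]` holds that patient's mental health contact followed by their acute contact, which is the cross-setting view that no single feed can give on its own.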

Fourth, don’t get hung up on data quality. Sure, good-quality data is important, but don’t let the search for perfection jeopardise findings that can still be useful. Many insights have a shelf life, and findings will need to be shared early on to promote interest and secure buy-in to further work (appropriately caveating known issues, of course).

Fifth, capture hearts and minds. When all parts of a system come together, this can prove far more effective than what can be achieved through money alone, especially if it is not totally clear what the money is being thrown at. But getting a consistent vision across is not always straightforward.

Financial and practical reach

While there are plenty of examples of linked data outputs, stakeholders love to see analysis based on their own data. In the BNSSG system we were able to use the results of a small pilot study – involving the linking of primary and secondary care data from five GP practices – to showcase the benefits and catalyse interest early on.

In conclusion, linked data can be a fantastic resource. Since developing our dataset we have unearthed numerous findings of interest: we know that 1 per cent of our population consume as much urgent care resource as the remaining 99 per cent; we know the duration and composition of system activity along common clinical pathways; and we know how the profile of long-term conditions evolves as we age and how this differs by locality.

Our rapid progress has been recognised centrally with our admission to the second wave of the national PHM Programme. Our experiences show that, with the right will and determination, linked data can be within financial and practical reach of any system, and need not be the preserve of the rich.