TDWI Articles

Overcoming the Hurdles of Health Data Analytics

What's holding us back from analyzing more healthcare data?

Chronic diseases account for more than half of all deaths around the world, and they are the single largest cause of rising healthcare spending. Population health analytics tools can provide unmatched insight into high-risk patient populations. By aggregating data on costs, quality, and efficiency measures across multiple sources, we can identify gaps in care. We can analyze which population segments are at risk of developing chronic diseases. We can determine why some of these patients are readmitted within 30 days of discharge from the hospital.

With so much to gain, what is holding us back from performing such analytics, and how do we get past these hurdles?

Fundamentally, three major barriers have hindered meaningful analytics of health information.

Barrier #1: Lack of adoption of electronic health record systems

Several surveys about adoption of electronic health record (EHR) systems have shown that although EHR adoption has doubled in the last decade, there is still a long way to go. There is some good news. Several countries (including Australia, Canada, Denmark, Finland, UAE, Sweden, Estonia, and the Netherlands) are making rapid strides in the move towards value-based care.

In Denmark, about 98 percent of primary care physicians, along with all hospital physicians and all pharmacists, have access to citizens' centralized electronic records. The U.S. government has launched several programs (such as EHR certification and incentive payments) that have helped increase the EHR adoption rate. However, much still needs to be done to reach 100 percent adoption, both in the U.S. and across the world.

Barrier #2: Lack of interoperability

Developing the right healthcare big data analytics infrastructure -- one that embraces the most forward-thinking work on EHR interoperability and health data standards -- is one of the most challenging pieces of the puzzle and one that continues to worry providers. At the same time, there is increasing pressure to share more healthcare information across organizations and disciplines via mobile and cloud-based applications, and to share it faster, with integration expected in a matter of days rather than months or years.

Existing standards (such as CDA and HL7 v3) attempted to address these challenges but were hard to implement. The need to share data in real time and provide an alternative to the document-centric approach to data sharing led to the genesis of Fast Healthcare Interoperability Resources (FHIR). This new and important health standard is rapidly changing how health data is exchanged.

FHIR builds on previous data standards (such as HL7 v2, HL7 v3, and CDA) but uses a modern, Web-based suite of API technology, including an HTTP-based RESTful protocol, HTML and Cascading Style Sheets for user interface integration, a choice of JSON or XML for data representation, OAuth for authorization, and Atom for results. FHIR facilitates integration between legacy healthcare systems and makes it easy to provide healthcare information to providers and individuals on a wide variety of devices.
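To make the RESTful, JSON-based style concrete, here is a minimal sketch of building a FHIR search URL and reading a Patient resource. The server base URL is hypothetical and the sample resource is hand-written, not real data. Field names follow the R4 version of the Patient resource, which is an assumption about which FHIR release a given server speaks.

```python
import json
from urllib.parse import urlencode

# Hypothetical server base URL; a real deployment publishes its own.
FHIR_BASE = "https://fhir.example.org/baseR4"


def patient_search_url(base, **params):
    """Build a FHIR search URL of the form [base]/Patient?name=value."""
    return f"{base}/Patient?{urlencode(params)}"


# Hand-written sample shaped like an R4 Patient resource (not real data).
SAMPLE_PATIENT_JSON = """
{
  "resourceType": "Patient",
  "id": "example",
  "name": [{"family": "Doe", "given": ["Jane"]}],
  "birthDate": "1974-12-25"
}
"""


def display_name(patient):
    """Return 'Given Family' from the first name entry of a Patient."""
    name = patient["name"][0]
    parts = name.get("given", []) + [name.get("family", "")]
    return " ".join(parts).strip()


patient = json.loads(SAMPLE_PATIENT_JSON)
print(patient_search_url(FHIR_BASE, family="Doe", birthdate="1974-12-25"))
print(patient["resourceType"], display_name(patient))
```

Because every resource type is addressed the same way, the same pattern would apply to observations, conditions, or medications by swapping the resource name in the URL.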

It also allows third-party application developers to provide medical applications that can be integrated into existing systems. This integrated data can be streamed into a data store where it can be correlated with other informatics data. This could open up possibilities such as tracking epidemics, analyzing emergency room waiting times, and understanding trends.

FHIR holds the promise of making seamless population healthcare a reality -- several public health agencies are already embracing its potential.

Barrier #3: Challenges in health big data and analytics

Once we have aggregated data from the actual care of millions of patients, we turn our attention to gaining new medical knowledge by leveraging big data analytics. With the right analytics solutions, we can analyze vast amounts of incredibly complex data across multiple sources to produce clear insights that providers can act on.

Big health data consists of claims data, clinical data, genomic data, and self-quantified patient data. Of these, claims data is the most reliable and readily available, so most analytics experience is with this data. However, it has only moderate influence in predictive analytics and lacks clinical detail. Genomic data, on the other hand, has not been as readily available but is very reliable and can have a strong impact on predictive analytics.

There are four secondary challenges to using this data for analytics, but advances in analytics are mitigating each one.

Data may not be usable for analytics or its use may not be efficient. Improper use or poor design of EHR systems can sometimes lead to patient safety issues. In one case, for example, records from a poorly designed EHR system had a type 2 diabetes marker tagged to the child instead of the mother, and the system did not have the intelligence to flag the error.

Analytics has helped solve some of these issues. Intelligent health record systems have been developed that establish links between the patient's conditions, observations, medications, interventions, and patient goals (for example, hypertension (condition) is related to high blood pressure (observation)).

Another method is integrating some form of decision support software into clinical systems. These could facilitate drug interaction checking and prescription safety checks, suggest commonly missed diagnostic data interpretations, and perform patient surveillance for early warning of deteriorating patient health.
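As a rough illustration of the drug interaction checking mentioned above, the following sketch looks up a newly prescribed drug against a patient's current medications. The interaction table, drug names, and warning text are all invented placeholders with no clinical meaning. A real decision support system would draw on a maintained clinical knowledge base.

```python
# Placeholder interaction table: drug names and warnings are invented
# for illustration only and carry no clinical meaning.
INTERACTIONS = {
    frozenset({"drug_a", "drug_b"}): "increased bleeding risk",
    frozenset({"drug_c", "drug_d"}): "reduced effectiveness of drug_d",
}


def check_prescription(current_meds, new_drug):
    """Return a warning for each current medication that interacts with new_drug."""
    warnings = []
    for med in sorted(set(current_meds)):
        note = INTERACTIONS.get(frozenset({med, new_drug}))
        if note:
            warnings.append(f"{new_drug} + {med}: {note}")
    return warnings


print(check_prescription(["drug_a", "drug_x"], "drug_b"))
print(check_prescription(["drug_x"], "drug_b"))
```

Keying the table on unordered pairs means the check gives the same answer regardless of which of the two drugs was prescribed first.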

Lack of process- and workflow-improvement initiatives in healthcare. Unlike the manufacturing industry, where sensor data from multiple sources is brought together to understand the process on the floor, healthcare lacks sensor data in many places, so studying the healthcare process proves to be a challenge.

Analytics can help address this challenge and provide recommendations to optimize the process. For example, a hospital may want to study the gynecologic oncology process to determine conformance to standards. The process typically involves multiple departments such as gynecology and radiology as well as several labs.

The event data in this case is used to derive a process model, which is then analyzed. Such analysis takes process mining beyond a static diagram (model-based analysis): it visualizes the process using dynamic data in ways that help people understand their current care processes, how they compare to an established standard, and ultimately how to create better care processes.
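One simple way to derive a process model from event data is to count which activity directly follows which within each patient case, then compare those transitions against a reference pathway. The sketch below assumes a hypothetical event log and a hypothetical reference pathway. The activity names and timestamps are invented, and real process-mining tools use considerably richer models.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case id, activity, ISO timestamp). Not real data.
EVENT_LOG = [
    ("case1", "gynecology consult", "2024-01-02T09:00"),
    ("case1", "radiology scan", "2024-01-03T10:00"),
    ("case1", "lab test", "2024-01-04T08:30"),
    ("case2", "gynecology consult", "2024-01-05T09:00"),
    ("case2", "lab test", "2024-01-05T13:00"),
    ("case2", "radiology scan", "2024-01-06T11:00"),
]

# Hypothetical reference pathway, expressed as allowed transitions.
REFERENCE = {
    ("gynecology consult", "radiology scan"),
    ("radiology scan", "lab test"),
}


def directly_follows(events):
    """Count activity-to-activity transitions within each case, in time order."""
    traces = defaultdict(list)
    # ISO timestamps in a uniform format sort correctly as strings.
    for case, activity, _ts in sorted(events, key=lambda e: (e[0], e[2])):
        traces[case].append(activity)
    counts = Counter()
    for trace in traces.values():
        counts.update(zip(trace, trace[1:]))
    return counts


def deviations(counts, allowed):
    """Return the observed transitions that the reference pathway does not allow."""
    return {pair: n for pair, n in counts.items() if pair not in allowed}


graph = directly_follows(EVENT_LOG)
print(deviations(graph, REFERENCE))
```

Here the second case runs the lab test before the radiology scan, so both of its transitions show up as deviations from the reference pathway.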

Privacy and security issues. Health data is classified as either protected health information (fully identified; sharing is restricted under HIPAA in the U.S.), de-identified health information (cannot be traced back to a specific patient), or synthetic health information (statistically generated data). Patients are always concerned about the security of their health data, so how do we leverage or exchange this data for analysis?

Analytics tools have been developed to transform protected health information into de-identified data using expert determination methods. These applications generate various masked scenarios using statistical or scientific principles so that the expert can choose the de-identification approach that offers a good balance between the risk of disclosure and the utility of the data.
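The trade-off between disclosure risk and utility can be sketched by masking the same records at several levels of generalization and measuring the smallest group of patients who share the same masked attributes. This uses a k-anonymity-style measure as a stand-in, which is an assumption; the tools described above are not stated to work this way. The records are invented.

```python
from collections import Counter

# Invented records: (birth_year, zip_code, diagnosis). Not real patient data.
RECORDS = [
    (1950, "02139", "hypertension"),
    (1953, "02138", "diabetes"),
    (1957, "02141", "asthma"),
    (1981, "94110", "asthma"),
    (1984, "94112", "hypertension"),
    (1988, "94114", "diabetes"),
]


def mask(record, year_band, zip_digits):
    """Round birth year down to a band and keep only the leading ZIP digits."""
    year, zip_code, diagnosis = record
    band_start = year - year % year_band
    masked_zip = zip_code[:zip_digits] + "*" * (len(zip_code) - zip_digits)
    return (band_start, masked_zip, diagnosis)


def smallest_group(records, year_band, zip_digits):
    """Size of the smallest set of records sharing the same masked year and ZIP."""
    groups = Counter(mask(r, year_band, zip_digits)[:2] for r in records)
    return min(groups.values())


# Each scenario is (year band width, ZIP digits kept).
for year_band, zip_digits in [(1, 5), (5, 4), (10, 3)]:
    k = smallest_group(RECORDS, year_band, zip_digits)
    print(f"band={year_band}y zip_digits={zip_digits} smallest_group={k}")
```

A larger smallest group means each patient is harder to single out, but coarser bands also leave less detail for analysis, which is the balance the expert has to judge.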

Another issue is obtaining patient consent to the use of their data in an electronic health exchange. In the opt-in model, patients must agree to have their data shared in a health information exchange. In the opt-out model, the default is reversed, and the data is shared unless the patient specifically says they don't want it to be. Many countries (such as Australia) are moving from an opt-in model to an opt-out model.
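The difference between the two consent models comes down to what happens when a patient has expressed no preference. A minimal sketch, with hypothetical names throughout:

```python
from enum import Enum
from typing import Optional


class ConsentModel(Enum):
    OPT_IN = "opt-in"
    OPT_OUT = "opt-out"


def may_share(model: ConsentModel, patient_choice: Optional[bool]) -> bool:
    """Decide whether a record may be shared in the exchange.

    patient_choice is True (agreed), False (refused), or None (no answer).
    An explicit choice always wins; the model only sets the default.
    """
    if patient_choice is not None:
        return patient_choice
    return model is ConsentModel.OPT_OUT


print(may_share(ConsentModel.OPT_IN, None))
print(may_share(ConsentModel.OPT_OUT, None))
print(may_share(ConsentModel.OPT_OUT, False))
```

Only the default for a silent patient differs between the two models, which is why a move from opt-in to opt-out can change how much data is shared without changing any patient's stated wishes.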

Data quality. A variety of data quality issues inhibit analytics, including heterogeneous data, missing or inaccurate values, dynamic or evolving data, and inaccurate or missing timestamps. At one U.S.-based provider, a common problem of missing or inaccurate values was resolved using an analytics tool that provided accurate problem lists and suggestions and learned about relationships. The relationship information was captured as a FHIR resource and leveraged in the intelligent analytics system.
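Two of the issues listed above, missing values and bad timestamps, are straightforward to screen for before analysis begins. The sketch below assumes a hypothetical row layout with invented field names and values. It only detects problems and does not attempt the kind of learned correction the provider example describes.

```python
from datetime import datetime

# Invented rows with a hypothetical layout. Not real patient data.
ROWS = [
    {"patient_id": "p1", "systolic_bp": "128", "recorded_at": "2024-03-01T08:15:00"},
    {"patient_id": "p2", "systolic_bp": "", "recorded_at": "2024-03-01T09:00:00"},
    {"patient_id": "p3", "systolic_bp": "131", "recorded_at": ""},
    {"patient_id": "p4", "systolic_bp": "119", "recorded_at": "2024-13-01T10:00:00"},
]


def quality_issues(row):
    """List missing fields and unparseable timestamps for one row."""
    issues = []
    for field, value in row.items():
        if value is None or str(value).strip() == "":
            issues.append(f"missing {field}")
    timestamp = row.get("recorded_at")
    if timestamp:
        try:
            datetime.fromisoformat(timestamp)
        except ValueError:
            issues.append("invalid recorded_at")
    return issues


for row in ROWS:
    print(row["patient_id"], quality_issues(row))
```

Running a screen like this per source also gives a simple measure of how heterogeneous the incoming data is before any correction is attempted.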

A Final Word

The niche is open now and both payers and providers are working together with data scientists and vendor communities to address these challenges so that population health management can truly flourish.

For more information:

In Denmark's Electronic Health Records Program, A Lesson for the U.S.

FHIR Overview

Quick Stats (Data visualizations of key data and statistics provide quick access to the latest facts and figures about health IT)

Anderson G and Horvath J 2004. The Growing Burden of Chronic Disease in America. Public Health Reports, May–June, 119:263-270.
