Guest Column | May 31, 2016

Dissecting Data Infrastructures That Meet Healthcare's Analytics Needs


By John Schneider, CTO, Apixio

Business software systems still largely occupy a world that Oracle built, and those used in the healthcare industry are no different. This world is full of relational databases that are either transactional or configured as data warehouses. However, the world that artificial intelligence (AI), machine learning, and cognitive computing occupy looks a lot like the infrastructure of the web. To make healthcare data more usable in that world, we need to transition to a different infrastructure.

Before jumping into the details of what all of this might look like, we need to ask two questions: Why does the infrastructure matter, and why do we want AI, machine learning, and cognitive computing?

Quite simply, these technologies work really well, especially for the needs of healthcare organizations. Given questions like “What conditions is the doctor treating the patient for?” or “What is the best course of therapy for a patient like this?”, systems can learn from data and be trained to produce extremely accurate answers. Given the high cost of healthcare and the relative scarcity of physicians and other highly trained individuals in the healthcare system, leveraging these systems can improve both throughput and quality of care.

As for why infrastructure matters, it is a bit like asking, “Why wasn’t the web built on the existing business infrastructure?” Well, it did start out that way, and it didn’t work that well. Most of what we call Big Data today is technology and infrastructure developed by companies like Google, Yahoo, Facebook, and Amazon to solve the problems they were facing: wildly heterogeneous data sources, largely unstructured data, and sheer volume.

The same story is playing out among EHRs, the data in them, and the demands placed on that data to enable the analytics we need. EHR systems are trying to solve these issues, but they will inevitably fail to provide the models and capabilities that would let healthcare take advantage of AI and machine learning to the same degree that other industries have. We want and need software that can read patient charts and answer questions such as, “Did this patient receive the standard of care called for by quality metrics or not?” Answers to these questions matter to healthcare providers, who are expected to provide this standard of care, and to patients, who will get better care because they are no longer invisible. To get there, though, we need to build a parallel infrastructure that is capable of servicing these needs.

The basic infrastructure that can solve this problem will look something like the following:

  • At its center is a patient-centric model, supported by clinical models (like HL7’s Clinical Document Architecture) and augmented by formal ontologies that describe things like diseases, treatments, and drugs (UMLS, SNOMED, LOINC, ICD, HCC, RxNorm, etc.).
  • This model will be highly structured, efficiently stored, permit additions and be directly addressable.
  • The infrastructure will have the ability to reason over an entire patient record and all of the entities it references.
  • The infrastructure will let programs access data through application programming interfaces (APIs) or via high-throughput computing infrastructures like Apache Hadoop.
  • The infrastructure will provide the ability to query over all patients as easily as one.
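
To make these properties concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not how any particular product is built. Every name in it (`CodedFact`, `PatientRecord`, `PatientStore`, `has_code`), the address format, and the sample patients and document identifiers are hypothetical. The ontology names are plain labels and the codes are not validated against any terminology service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class CodedFact:
    """One clinical assertion tied to a code in a named terminology."""
    system: str      # terminology label, e.g. "ICD-10-CM" (not validated here)
    code: str        # code within that terminology
    kind: str        # e.g. "condition" or "medication"
    source_doc: str  # hypothetical identifier of the originating document


@dataclass
class PatientRecord:
    """Patient-centric container: every fact hangs off one patient."""
    patient_id: str
    facts: List[CodedFact] = field(default_factory=list)

    def add(self, fact: CodedFact) -> str:
        """Append a fact and return an address that can be resolved later."""
        self.facts.append(fact)
        return f"{self.patient_id}/facts/{len(self.facts) - 1}"


Predicate = Callable[[PatientRecord], bool]


class PatientStore:
    """Holds all records and runs the same query over one patient or all."""

    def __init__(self) -> None:
        self._records: Dict[str, PatientRecord] = {}

    def record(self, patient_id: str) -> PatientRecord:
        """Return the record for a patient, creating it on first use."""
        if patient_id not in self._records:
            self._records[patient_id] = PatientRecord(patient_id)
        return self._records[patient_id]

    def resolve(self, address: str) -> CodedFact:
        """Look up a single fact by the address returned from add()."""
        patient_id, _, index = address.rsplit("/", 2)
        return self._records[patient_id].facts[int(index)]

    def query(self, predicate: Predicate,
              patient_id: Optional[str] = None) -> List[str]:
        """Return ids of matching patients; omit patient_id to scan all."""
        if patient_id is None:
            records = list(self._records.values())
        else:
            records = [self._records[patient_id]]
        return [r.patient_id for r in records if predicate(r)]


def has_code(system: str, code: str) -> Predicate:
    """Build a predicate that looks across a patient's entire record."""
    def predicate(record: PatientRecord) -> bool:
        return any(f.system == system and f.code == code
                   for f in record.facts)
    return predicate


# Made-up sample data for illustration only.
store = PatientStore()
address = store.record("p1").add(
    CodedFact("ICD-10-CM", "E11.9", "condition", "doc-17"))
store.record("p2").add(
    CodedFact("ICD-10-CM", "I10", "condition", "doc-42"))

has_diabetes = has_code("ICD-10-CM", "E11.9")
print(store.resolve(address).code)                 # E11.9
print(store.query(has_diabetes))                   # ['p1']
print(store.query(has_diabetes, patient_id="p2"))  # []
```

The same `query` call serves one patient or the whole population, which is the point of the last bullet. A real system would swap the in-memory dictionary for distributed storage and a batch framework, but the shape of the model would stay the same.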

In the healthcare ecosystem, there are small islands of expertise sufficient to crack this problem in a local context, such as a single large provider or academic setting. However, such teams are rare and costly to assemble, and they lack large quantities of data because they serve small populations of users. Commercial entities with access to larger data sets are also working to solve this problem. Their strategy is to provide Software as a Service (SaaS) and Platform as a Service (PaaS) products that are capable of ingesting clinical documents and building appropriate models to deliver specific solutions for needs like risk adjustment and quality measures.

Healthcare hasn’t yet been able to take part in the data and analytics revolution that is transforming other industries. This doesn’t need to be the case, though. It is possible to build infrastructure that can strengthen the healthcare industry as a whole. As the essayist William Gibson would say, “The future is already here, it’s just not evenly distributed yet.”