Case Study

How The Biocomplexity Institute Of Virginia Tech Is Leveraging High-Performance Analytics For Public Health

Predictive Analytics

By Harshal Shah, Senior Director, Healthcare, Persistent Systems

The recent outbreak of the Zika virus and its impact highlight the need for the world’s leading health organizations and governments to innovate, using emerging technologies to track and prevent epidemics.

To accomplish this lofty goal, public health organizations need a system that can gather data in real time and detect trends that could develop into epidemics. The Biocomplexity Institute of Virginia Tech is on the right track to meeting this challenge.

Tracking The Spread Of Epidemics In Real-Time
In 2005, the Network Dynamics and Simulation Science Laboratory (NDSSL) at the Biocomplexity Institute of Virginia Tech (Biocomplexity Institute) developed a workflow, based on graph-based modeling and high-performance computer simulation, to analyze and predict the spread of communicable diseases and the likely impact of containment steps. This system was called the Comprehensive National Incident Management System (CNIMS).

Before running simulations, the CNIMS typically gathered census data, land-use information and Geographic Information System (GIS) data. Together, these sources formed the synthetic population used to model population densities and simulate human interaction patterns (and the transmission of disease) within cities, states, regions and countries.

Epidemiologists then developed algorithms based on this data to simulate epidemics. Due to the iterative, coding-intensive nature of the analytical work handled by the CNIMS, each simulation took a long time; sometimes a job had to be rerun manually, iteration after iteration, as many as 50 times.
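The workflow described above can be illustrated with a minimal sketch. It is not the CNIMS code: the random contact network stands in for a real synthetic population, and the function names, the simple susceptible-infectious-recovered (SIR) model and every parameter value are assumptions chosen only for illustration.

```python
import random


def build_contact_network(n_people, contacts_per_person, rng):
    """Toy stand-in for a synthetic population: a random, undirected contact graph."""
    neighbors = {p: set() for p in range(n_people)}
    for p in range(n_people):
        others = [q for q in range(n_people) if q != p]
        for q in rng.sample(others, contacts_per_person):
            neighbors[p].add(q)
            neighbors[q].add(p)
    return neighbors


def simulate_sir(neighbors, seeds, p_transmit, days_infectious, n_days, rng):
    """Discrete-time SIR on the contact graph.

    Returns (new infections per day, final state of every person).
    """
    state = {p: "S" for p in neighbors}
    days_left = {}  # infectious person -> remaining infectious days
    for s in seeds:
        state[s] = "I"
        days_left[s] = days_infectious

    new_cases_per_day = []
    for _ in range(n_days):
        # Each infectious person may infect each susceptible contact.
        newly_infected = set()
        for p in sorted(days_left):
            for q in sorted(neighbors[p]):
                if state[q] == "S" and rng.random() < p_transmit:
                    newly_infected.add(q)
        # People who have used up their infectious days recover.
        for p in list(days_left):
            days_left[p] -= 1
            if days_left[p] == 0:
                state[p] = "R"
                del days_left[p]
        # Today's new infections become infectious from tomorrow.
        for q in newly_infected:
            state[q] = "I"
            days_left[q] = days_infectious
        new_cases_per_day.append(len(newly_infected))
    return new_cases_per_day, state


def attack_rate(p_transmit, seed):
    """Fraction of a hypothetical 500-person population ever infected in one run."""
    rng = random.Random(seed)
    network = build_contact_network(500, 4, rng)
    _, state = simulate_sir(network, seeds=[0, 1, 2], p_transmit=p_transmit,
                            days_infectious=3, n_days=60, rng=rng)
    return sum(1 for s in state.values() if s != "S") / len(state)


if __name__ == "__main__":
    # Repeat each scenario several times, since every run is stochastic.
    for label, p in [("baseline", 0.10), ("with containment", 0.05)]:
        runs = [attack_rate(p, seed) for seed in range(5)]
        print(label, sum(runs) / len(runs))
```

Lowering `p_transmit` is one crude way to represent a containment step. Because each run is random, scenarios are compared over repeated runs, which hints at why the manual, rerun-heavy process was so slow.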

In 2009, with the fast-paced outbreak of H1N1 flu, things changed. Leaders at the Biocomplexity Institute realized the limitations of the existing system in handling big data in real time.

“We realized that we could not simply wait for information about infections from government agencies alone because that data comes in too slowly,” said Professor and Director Madhav Marathe, NDSSL. “We saw the need for real-time tools so we could give agencies some sense of what might have to be done as a disease spreads.” They also wanted to allow a larger number of users, both technical and non-technical analysts, to access the information and analyses, within and outside the Biocomplexity Institute.

The Biocomplexity Institute partnered with Persistent Systems to design a cloud-friendly middleware architecture comprising a message queue, scheduler and subscription system to allow organizations to submit new jobs easily and complete existing ones more collaboratively.
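A minimal, in-process sketch can show how the three components named above fit together. It is not the CNIMS middleware: the `JobBroker` class, its method names and the priority scheme are hypothetical, and a production system would use a networked message broker and distributed workers instead of a single Python process.

```python
import heapq
import itertools
from collections import defaultdict


class JobBroker:
    """Hypothetical toy middleware: a job queue, a scheduler and subscriptions."""

    def __init__(self):
        self._queue = []                      # heap of (priority, seq, job_id, payload)
        self._seq = itertools.count()         # tie-breaker keeps FIFO order per priority
        self._job_ids = itertools.count(1)
        self._subscribers = defaultdict(list)  # job_id -> list of callbacks

    def submit(self, payload, priority=10):
        """Message queue: accept a job and return its id (lower priority runs first)."""
        job_id = next(self._job_ids)
        heapq.heappush(self._queue, (priority, next(self._seq), job_id, payload))
        return job_id

    def subscribe(self, job_id, callback):
        """Subscription system: call callback(job_id, status, result) on updates."""
        self._subscribers[job_id].append(callback)

    def _publish(self, job_id, status, result):
        for callback in self._subscribers[job_id]:
            callback(job_id, status, result)

    def run_pending(self, worker, max_jobs=None):
        """Scheduler: run queued jobs in priority order and notify subscribers."""
        completed = 0
        while self._queue and (max_jobs is None or completed < max_jobs):
            _, _, job_id, payload = heapq.heappop(self._queue)
            self._publish(job_id, "running", None)
            try:
                result = worker(payload)
            except Exception as exc:  # report the failure instead of stopping the loop
                self._publish(job_id, "failed", repr(exc))
            else:
                self._publish(job_id, "done", result)
            completed += 1
        return completed


if __name__ == "__main__":
    broker = JobBroker()
    routine = broker.submit({"region": "state-wide"}, priority=20)
    urgent = broker.submit({"region": "county"}, priority=1)
    for job in (routine, urgent):
        broker.subscribe(job, lambda job_id, status, result: print(job_id, status, result))
    # The worker here is a placeholder for launching a simulation.
    broker.run_pending(lambda payload: "simulated " + payload["region"])
```

Separating submission from execution means users hand in jobs without waiting for compute capacity, and anyone subscribed to a job learns of its progress without polling.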

These recent advances in CNIMS middleware helped the Biocomplexity Institute deliver real-time actionable insights for tracking and containing the spread of epidemics. Using machine-learning algorithms and statistical analysis of data loads, the Biocomplexity Institute has developed a dynamic set of rules that lets CNIMS automate increasingly intelligent decisions about systems capacity and job scheduling, optimizing the use of its high-performance computing architecture.

Creating Access And Delivering Benefits To A Broad Set Of Users
To allow a much broader community of users to access this data and analyze simulations, the Institute developed two core graphical user interface applications – Synthetic Information-Based Epidemiological Laboratory (SIBEL) and EpiCaster. SIBEL is meant for advanced users such as researchers who design experiments and create epidemiological studies based on social network simulations, without needing to write code or carry out iterative analysis by hand.

“Using the graphical interface has made my life a lot easier. It used to take sometimes weeks to run up to ten simple simulations that now I can do in an afternoon,” said Brian Lewis, Computational Epidemiologist, Biocomplexity Institute.

EpiCaster, a visual disease tracking tool, was designed to be intuitive and enables access to the CNIMS for the public at large. According to Mandy Wilson, Software Development Lead, Biocomplexity Institute, “When users go into the application, they can drill down on geographical areas and see that Indiana, for example, is a hub of flu right now. They can also further drill down and see that Marion County in particular is a hotbed of flu right now.”

By engineering the CNIMS into a streamlined, reliable and resilient production system with greater real-time capability, the Biocomplexity Institute has increased job-processing throughput, data scale, accuracy and the flexibility to introduce new types of analyses.

According to Mandy Wilson, “When we are talking about millions and millions of rows, and the fact that we can clear that quickly and we can store that on a web interface for users to drill down, that is pretty impressive.”

The initiative undertaken at the Biocomplexity Institute of Virginia Tech demonstrates how innovations in analytics are enabling a more insightful understanding of disease control and more timely advice to US governmental agencies and other groups combating the spread of these diseases.

Such analytics are democratizing access to information and enabling everyone to analyze simulations contextually, a major leap forward in containing future epidemics.

About The Author
Harshal Shah, Senior Director, Healthcare, Persistent Systems, has spent more than a decade in the healthcare industry, working to define strategies and design informatics solutions for digital health enterprises focused on next-generation informatics.