Penn Medicine’s big data system triggers early detection of life-threatening infections

Insights drawn from a massively parallel computer cluster loaded with vast amounts of data help doctors develop new approaches to patient care

Healthcare organizations are latching onto big data for everything from population health management to genomic research. For Penn Medicine, the healthcare system and medical school affiliated with the University of Pennsylvania in Philadelphia, those technology advances are already changing how patients are cared for.

A team of clinicians and medical informatics experts at Penn Medicine is using big data techniques to power innovations in clinical quality improvement, genomic research, data visualization, diagnostic apps and even the study of social media.

This big data approach has already produced one significant success: The Penn team has improved the ability of clinicians to predict which patients are at risk of developing sepsis, a life-threatening complication of an infection. Those patients can now be identified 24 hours earlier than before the algorithm was introduced.

Led by Mike Draugelis, chief data scientist at Penn Medicine, the team uses insights drawn from a massively parallel computer cluster that stores a huge volume of data to build prototypes of new care pathways. Those pathways are then tested with patients and the results fed back into algorithms so that the computer can learn from its mistakes. Penn Medicine, which opened the nation's first school of medicine in 1765, is among the winners of IDGE's Digital Edge 25 awards for its big data project.

Draugelis and his colleagues work in the Hospital of the University of Pennsylvania. On the academic research side, the university's medical school has launched an Institute for Biomedical Informatics (IBI) to conduct basic research using big data techniques. Announced in 2013, IBI is now taking shape, a few months after naming Jason Moore, who founded a similar institute at Dartmouth, as its director. IBI will focus its efforts on precision medicine, a hot field that is starting to take off as genomic sequencing costs drop.

The effort to link genomic differences with "phenotypes" — the variations in patients' characteristics and diseases — has been underway for five years, notes William Hanson, M.D., who is chief medical information officer and vice president of Penn Medicine, and serves as a member of IBI. But he says he sees this kind of research quickly accelerating.

Steven Steinhubl, M.D., director of digital medicine at the Scripps Translational Science Institute (STSI) in La Jolla, Calif., agrees. "We're still on the rising part of the curve of what we're going to learn from big data," he says. "It's rapidly growing, but it will accelerate even more as large medical centers like UPenn take advantage of the data they're already collecting and add genomics on top of that."

Changing clinical pathways

Draugelis' team at Penn Medicine is using algorithms to tweak the guidelines that doctors and nurses follow in diagnosing and treating particular conditions. When a protocol changes, Draugelis explains, the clinical team must develop a new care pathway that specifies each step in the clinicians' workflow. It's intensive work, and so is coding the algorithm changes that respond to feedback from the front line of patient care.

The team builds a prototype of a new pathway for a particular condition about once every six months. Currently, it is focusing on finding a better way to predict which patients have congestive heart failure and which are likely to be readmitted after discharge from the hospital. In addition, the team is working on care pathways for acute conditions such as maternal deterioration after delivery and severe sepsis.

"We're creating machine-learning predictive models based on thousands of variables," Draugelis says. "We look at them in real time, but we train them up over millions of patient records."
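Draugelis does not describe the models' internals. As a rough illustration of the split he mentions, training offline on a large archive and then scoring new data in real time, the Python sketch below fits a logistic regression by stochastic gradient descent and then scores fresh readings. Everything in it is an assumption: the three standardized features (think of hypothetical inputs such as heart rate, temperature and lactate), the synthetic labels and the choice of logistic regression all stand in for Penn's models, which use thousands of variables and millions of records.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Map a linear risk score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(records, labels, epochs=50, lr=0.05):
    """Offline step: fit logistic-regression weights by stochastic gradient descent."""
    weights = [0.0] * len(records[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(records, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log-loss with respect to the linear score
            bias -= lr * err
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights, bias


def score(weights, bias, x):
    """Real-time step: scoring one new set of readings is a single weighted sum."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Synthetic, hypothetical training data: three standardized features per patient,
# labeled 1 when their noisy sum crosses an arbitrary threshold.
random.seed(0)
records = [[random.gauss(0, 1) for _ in range(3)] for _ in range(500)]
labels = [1 if sum(x) + random.gauss(0, 0.5) > 1.5 else 0 for x in records]

weights, bias = train(records, labels)
print("high-risk reading:", round(score(weights, bias, [2.0, 2.0, 2.0]), 3))
print("low-risk reading:", round(score(weights, bias, [-1.0, -1.0, -1.0]), 3))
```

Training loops over every record many times, while scoring a patient's latest readings costs only one weighted sum; that asymmetry is what makes it practical to train on a large archive and score continuously.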

The care pathways team uses a NoSQL back end "that allows us to process the data through a pipeline, to use clinical notes, vital signs, labs and imaging data from the [electronic health record (EHR)] in a way that can be quantified," Draugelis says. "A lot of work has to be applied to this variety of data to push it through to the algorithm that will provide the insights."

Draugelis emphasizes the importance of working closely with clinical teams to integrate the big data insights into the care process. "We're working in two-week sprints, where the clinicians adjust their pathways, and we adjust the algorithms to their needs," he says. "That feedback is really important as you explore the new solutions. A black box from a vendor can be disruptive or dangerous, because the answers may be not as focused as expected, and they're not going to fit the pathway that currently exists."

Humans plus computers

Dean Sittig, a professor in the School of Biomedical Informatics at the University of Texas Health Science Center at Houston, says he likes the idea of continuous monitoring and feeding data into computer algorithms because the computer can be more vigilant than a hospital nurse who is caring for five patients at any given time. To make the decision-support alerts useful, however, the staff has to be ready to spring into action, especially with a condition like sepsis, Sittig says.

The alerts that the algorithm triggers must also be fairly accurate. "As a rule of thumb, if the computer is right more than half the time — especially with something serious like sepsis — clinicians will pay attention to it. But if it's only right 10% of the time, it starts to be a bother," he says.
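Sittig's "right more than half the time" corresponds to what statisticians call positive predictive value (PPV): the share of fired alerts that turn out to be real cases. The Python sketch below computes it two ways, from raw alert counts and, via Bayes' rule, from a model's sensitivity, specificity and the condition's prevalence, then applies his rule of thumb. All numbers are hypothetical, and the 50% and 10% cutoffs are simply his quoted figures, not a published standard.

```python
def ppv_from_counts(true_alerts: int, false_alerts: int) -> float:
    """Share of fired alerts that were correct (positive predictive value)."""
    fired = true_alerts + false_alerts
    if fired == 0:
        raise ValueError("no alerts were fired")
    return true_alerts / fired


def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV implied by a model's sensitivity and specificity at a given prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def clinician_reaction(ppv: float) -> str:
    """Sittig's rule of thumb: above 50% gets attention, around 10% is a bother."""
    if ppv > 0.5:
        return "likely to get attention"
    if ppv <= 0.1:
        return "likely a bother"
    return "in between; the rule of thumb does not say"


if __name__ == "__main__":
    # Hypothetical tallies from two alert systems, 100 alerts each.
    for label, (tp, fp) in {"system A": (60, 40), "system B": (10, 90)}.items():
        ppv = ppv_from_counts(tp, fp)
        print(f"{label}: PPV {ppv:.0%}, {clinician_reaction(ppv)}")

    # Hypothetical model: 90% sensitive, 90% specific, 2% of monitored patients septic.
    ppv = ppv_from_rates(0.90, 0.90, 0.02)
    print(f"rate-based: PPV {ppv:.0%}, {clinician_reaction(ppv)}")
```

The rate-based case shows why alerts for a relatively uncommon condition can disappoint: under these assumed numbers, a model that is 90% sensitive and 90% specific is right only about 16% of the time it fires.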

Neither humans nor computers can be trusted entirely, Sittig says, but a human plus a computer ought to be better than a human alone. "Current big data algorithms are not trying to replace humans, they're trying to augment humans. If you have a well-trained staff that believes in the value of the computer, that team can be very effective, and I think we can improve the quality and the safety of healthcare."

Precision medicine

Two important developments have come together to make possible the kind of precision medicine research being done at Penn Medicine's IBI. First, EHRs have become widespread in the past few years: Most hospitals and more than 80% of physicians now have these systems. Second, the cost of genomic sequencing has dropped to around $1,000 for a complete genome. The cost of partial genome or exome sequencing is less than that. As a result of these trends, the idea of correlating genotypic and phenotypic variants to discover individual responses to diseases and drugs is now feasible.

To perform this kind of research, Penn Medicine has created a specialized "bio-bank" that, so far, has stored about 20,000 genomic samples with patients' permission, says Brian Wells, associate vice president of health technology and academic computing for the healthcare system. A separate center for personalized diagnostics has sequenced tumor genomes for more than 5,000 patients, he notes.

The sheer volume of genomic data is staggering. Penn Medicine already has two petabytes of disk space in its high-performance computing cluster, and it plans to add more, says Wells.

"One researcher told us that in the next few years, he might go from five to 30 petabytes of space related to neuroscience sequencing. So we're prepared to add to that as we need to," he notes.

Social media experiments

Penn is tackling several other dimensions of big data in its research efforts. For example, one of its teams used natural language processing (NLP), a field of computing that applies techniques such as machine learning to understanding human language, to capture negative emotions in tweets. The team is looking for correlations between the proportion of negative tweets in a community and mortality data for that community.

Big data analysis is also expected to reveal new correlations among many different kinds of data on a large scale. IBI will capitalize on that potential by using NLP to parse "hundreds of millions" of healthcare documents that contain unstructured data, Wells says. Moreover, IBI has nearly completed a "visualization lab" that will allow researchers to display and analyze information from large, multidimensional data sets in new ways, he notes. The hope is that they will be able to see valuable connections that no one realized were there.

STSI's Steinhubl says such correlations could help lead to breakthroughs. Today, he notes, physicians are puzzled by "outliers," such as people who survive a dangerous cancer or get heart disease at an early age and live for 50 more years. "Once you have large data sets that allow you to look at entire healthcare systems or payer databases, you can begin to identify groups of these outliers. These things will help us learn a lot about mechanisms of disease."

Tide of data coming

Much more data is coming to feed the big data machines. The information in EHRs is just the beginning, says Penn Medicine's Hanson. "As you span the continuum of medical data, EHR data is pretty sparse. But physiologic data gets to be potentially richer, and genomic data is quite rich."

The physiologic data he refers to includes the data from mobile devices and wearable sensors that will soon be pouring into healthcare providers' databases. But that data will have to be prescreened to be usable in patient care, he cautions.

Steinhubl is very excited about the promise of big data in fields like precision medicine and clinical quality improvement. "Eventually, it's going to completely change medicine and the way we treat common chronic conditions," he says.

For example, he notes, patients with hypertension are typically all treated as having a single disease. "So we put them all in one basket and treat them the same way. With these tools, we'll be able to refine their phenotype and their genotype, and better treat these individuals. Right now, it's mostly trial and error," says Steinhubl.

While oncologists are increasingly using information about the differences among individual cancer patients, it will be a while, Hanson says, before this approach filters down to primary care physicians. However, precision medicine research is moving fast at Penn Medicine and other leading academic medical centers. "We're on the verge of an explosive development," he says.

Copyright © 2015 IDG Communications, Inc.
