After 9/11, health officials and the U.S. government became deeply concerned about the possibility of bioterrorism and went looking for ways to identify possible bioterror incidents early enough to react and contain them, says Parsa Mirhaji. "People were talking about all kinds of wild things, like tracking fruit juice and handkerchief sales to identify a flu-like outbreak that can indicate a potential anthrax attack," says the assistant professor of medicine and director of the Center for Biosecurity and Public Health Informatics Research at the University of Texas in Houston.
The problem was not a lack of data but almost too much of it. "We have information from hospital emergency departments, community clinics, pharmacy prescription drug sales, clinical laboratories, environmental safety commissions, pollution exposure in air and water." All of this data is complex, and to do any good it needs to be analyzed quickly: first for normal patterns, to provide a baseline for comparison, and then for deviations from those patterns that might indicate a natural disease outbreak or a bioterror incident.
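As a toy illustration of that baseline-then-deviation idea (a generic sketch, not Sapphire's actual method), a detector can learn the normal range of, say, daily respiratory complaints from recent history and flag days that fall far outside it:

```python
# Illustrative baseline-and-deviation check; the window and threshold
# are arbitrary choices, not parameters taken from Sapphire.
from statistics import mean, stdev

def flag_anomalies(daily_counts, window=28, threshold=3.0):
    """Yield (day_index, count) pairs that deviate sharply from the trailing baseline."""
    for i in range(window, len(daily_counts)):
        baseline = daily_counts[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(daily_counts[i] - mu) > threshold * sigma:
            yield i, daily_counts[i]

# Four quiet weeks of ER respiratory-complaint counts, then a sudden spike.
counts = [12, 14, 11, 13, 15, 12, 10] * 4 + [38]
print(list(flag_anomalies(counts)))  # [(28, 38)]
```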
This data comes from multiple sources and systems that use different, often incompatible schemas. It arrives fast, in near real time, and to be of use it needs to be analyzed just as quickly. Conventional analysis methods cannot keep pace; in an outbreak, officials depending on them would constantly be behind the spread of an infectious disease.
To solve that problem, Mirhaji's team turned to Semantic Web technology, a term that is beginning to appear in conversations among those working with the very large data sets involved in scientific and medical research.
Web of meaning and connectivity
Semantic Web refers to the web of meaning and connectivity in large, complex data sets accessible from a distributed network, whether the World Wide Web, a network of collaborators and trusted parties or the systems within an organization's boundaries. It is a way to organize complex data meaningfully by assigning a formal meaning to each element of data, making all data explicit and unambiguous so that machines and humans interpret it identically.
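In practice, that formal meaning usually takes the shape of subject-predicate-object statements against a shared vocabulary. A minimal sketch of the idea, using Python's rdflib library, might look like the following; the namespace and property names here are invented for illustration and are not Sapphire's actual schema:

```python
# Minimal sketch: each fact becomes a subject-predicate-object triple
# built from formally defined terms, so any machine that knows the
# vocabulary interprets the data exactly as a human would.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/publichealth#")  # hypothetical vocabulary

g = Graph()
g.bind("ex", EX)

visit = EX["ervisit/1001"]
g.add((visit, RDF.type, EX.EmergencyRoomVisit))
g.add((visit, EX.chiefComplaint, Literal("respiratory distress")))
g.add((visit, EX.patientZip, Literal("77002")))

print(g.serialize(format="turtle"))
```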
A simple example might be cell phone numbers. Presume, for instance, that a company issues cell phones with a consecutive set of numbers to all its salespeople. If you know this, then whenever you see one of the numbers in that set, you know that the person holding it is a sales professional who works for that company.
Suppose you are at a conference in Japan and someone gives you a business card written in Japanese, which you cannot read. But you see that the cell number is part of the series issued to this company's salespeople. Immediately, you know something about the person who just gave you his card, even if you cannot talk to him.
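The same kind of inference is easy to express as an explicit, machine-readable rule. The sketch below is a deliberately simple Python version, with a hypothetical company and number block, just to show that once the rule is written down, software can draw the same conclusion a human would:

```python
# Toy inference rule: any number in the block a (hypothetical) company
# issued to its sales force identifies its holder as a salesperson.
SALES_BLOCK = range(5550100, 5550200)  # block issued to Acme's salespeople

def classify(phone_number: int) -> str:
    """Infer what we can about a phone's holder from the number alone."""
    if phone_number in SALES_BLOCK:
        return "salesperson at Acme Corp"
    return "unknown"

print(classify(5550142))  # salesperson at Acme Corp
```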
Semantic Web technology lets researchers create applications that use these formal tags to organize data from multiple sources into complex interrelationships. Based on this technology, Mirhaji's team developed a complex analysis engine called Situational Awareness and Preparedness for Public Health Incidents using Reasoning Engines (Sapphire).
Flu or pollution?
For example, Mirhaji says, several community hospitals in the Houston area send data on emergency room visits to his organization in near real time for analysis. Suppose some of those hospitals begin reporting an influx of patients complaining of respiratory distress on a Wednesday afternoon.
The Semantic Web analysis shows that these patients all live in the same area of the city, downwind of a particular oil refinery. It combines this data with data from the Texas Commission on Environmental Quality showing that on Wednesday afternoons this refinery releases fine particulate matter that can cause respiratory problems and asthma attacks. That makes it probable that the influx of patients at the community hospitals is a reaction to air pollution rather than a flu outbreak.
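A hedged sketch of how such a cross-source join might be expressed in SPARQL, run here with rdflib, appears below. The graph, vocabulary and data are invented for illustration; Sapphire's actual models are far richer:

```python
# Two facts from two sources, linked through a shared area term, then a
# SPARQL query that joins them. All names here are illustrative.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/publichealth#")
g = Graph()
g.bind("ex", EX)

# An ER visit reported by a hospital ...
visit = EX["ervisit/2001"]
g.add((visit, RDF.type, EX.EmergencyRoomVisit))
g.add((visit, EX.symptom, Literal("respiratory distress")))
g.add((visit, EX.area, EX["area/downwind-of-refinery-7"]))

# ... and an emission event reported by the environmental agency.
event = EX["emission/301"]
g.add((event, RDF.type, EX.EmissionEvent))
g.add((event, EX.pollutant, Literal("fine particulate matter")))
g.add((event, EX.area, EX["area/downwind-of-refinery-7"]))

# Find ER visits that share an area with a reported emission event.
query = """
PREFIX ex: <http://example.org/publichealth#>
SELECT ?visit ?pollutant WHERE {
    ?visit a ex:EmergencyRoomVisit ; ex:area ?area .
    ?event a ex:EmissionEvent ; ex:area ?area ; ex:pollutant ?pollutant .
}
"""
for row in g.query(query):
    print(row.visit, "plausibly explained by", row.pollutant)
```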
Sapphire got its trial by fire two summers ago in the wake of Hurricane Katrina, when huge numbers of refugees from New Orleans were brought to Houston and housed in any available large spaces. Such large congregations of people, many of them arriving with injuries, chronic diseases, stress-weakened immune systems and possible infections, are near-perfect environments for the spread of disease. An outbreak there can infect local volunteers as well as other refugees, then move into the general city population as those volunteers bring the disease home.
The University of Texas established health clinics in the refugee areas and sent data from them to the center, but not everyone in need of medical attention came to those clinics. So the center equipped researchers with handheld computers carrying a standard health and needs assessment survey to record the current physical and mental health status of all evacuees. By combining this information with the data from the clinics, the center created the only available comprehensive source of health information on the refugee population.
Fortunately, the evacuation didn't produce any major, widespread disease outbreaks. Sapphire, however, identified several respiratory and gastrointestinal infections early enough that health officials could head them off and prevent their spread beyond small initial populations.
Hurricane season
Since then, the Center for Biosecurity and Public Health Informatics Research has continued to develop Sapphire and add new sources of public health information, expanding its geographical coverage area. With another hurricane season upon the Gulf Coast, it's not hard to imagine that the system may be put to practical use again.
In the meantime, Mirhaji says, Sapphire has become an important tool for analysis of many kinds of public health problems, from infectious disease, to the impact of air and water pollution on the city population, to social problems such as domestic violence.
Overall, he says, Sapphire's development was greatly aided by tools from the center's technology partners, such as Oracle and TopQuadrant, and particularly by TopBraid Composer, a pioneering Semantic Web modeling and authoring tool.
Initially, the center's Semantic Web modeling was based on Protégé, a free, open-source tool developed at Stanford University. The center continues to use Protégé to teach students the basic concepts of the Semantic Web and modeling. However, "for actual production you need something better integrated [with] your programming language and test environment and [that] can support the life cycle of development," he says.
"Top Braid Composer lets us build models, test them in a production environment, redesign and redeploy, which we could not do with Protege. We have been very happy with TopQuadrant's tools, which really have moved Semantic Web from being a bleeding-edge academic concept to a practical reality."
Bert Latamore is a journalist with 10 years' experience in daily newspapers and 25 in the computer industry. He has written for several computer industry and consumer publications. He lives in Linden, Va., with his wife, two parrots and a cat.