If anybody has big data, it's the federal government. Data about government grants. Records of payments to Medicare providers. Information about workers' compensation claims. Financial data on public companies. Demographic data from the U.S. Census.
The possibilities for combining various data in ways that could streamline processes and save taxpayer money seem endless. And yet federal agencies have just started scratching the surface of what may be possible.
There are analytics projects scattered throughout the federal government, but they are limited in scope, and the lessons learned are rarely shared with other agencies. (For the reasons why, see "Wanted (desperately): Standards for government data.")
Nevertheless, some federal agencies are forging ahead with internal analytics projects. So far, the priority is trying to prevent fraud and improper payments -- when the government mistakenly pays too much or pays for something that it should not have.
That's not surprising, given the emphasis that the Obama administration has put on reducing government fraud and waste (see sidebar, below), according to a recent report (pdf) on government analytics projects by the Association of Government Accountants (AGA).
Although their projects target different types of fraud, the agencies are taking a similar approach: developing models that can identify abnormalities and flag potentially fraudulent claims before they're paid, rather than relying on the traditional "pay and chase" method, in which audits identify fraud only after the fact. As potential fraud is identified and confirmed, that information is fed back into the system, providing more detail that further fine-tunes the predictive algorithms.
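To make the "flag before paying, then feed results back" idea concrete, here is a minimal Python sketch. It is not any agency's actual model. The class name ClaimScreen, the z-score test, the cutoff of 3.0 and the sample figures are all assumptions chosen for illustration. A claim is held if its amount is far outside the amounts seen so far or if the payee has a confirmed fraud on record, and each investigated outcome updates what the screen knows.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class ClaimScreen:
    """Hypothetical pre-payment screen (illustrative only, not a real agency system)."""

    history: list[float]                      # amounts of past legitimate claims (needs 2+)
    threshold: float = 3.0                    # assumed z-score cutoff; a tuning knob
    confirmed_fraud_payees: set[str] = field(default_factory=set)

    def is_flagged(self, payee: str, amount: float) -> bool:
        """Return True if the claim should be held for review before payment."""
        # Feedback from earlier investigations: known bad payees are always held.
        if payee in self.confirmed_fraud_payees:
            return True
        mu, sigma = mean(self.history), stdev(self.history)
        if sigma == 0:
            return amount != mu
        # Abnormality test: how many standard deviations from the usual amount?
        return abs(amount - mu) / sigma > self.threshold

    def record_outcome(self, payee: str, amount: float, fraud_confirmed: bool) -> None:
        """Feed a reviewed claim's outcome back into the screen."""
        if fraud_confirmed:
            self.confirmed_fraud_payees.add(payee)
        else:
            # Legitimate claims refine the picture of what "normal" looks like.
            self.history.append(amount)


# Made-up sample data.
screen = ClaimScreen(history=[100.0, 110.0, 95.0, 105.0, 90.0])
typical_flagged = screen.is_flagged("Payee A", 102.0)
outlier_flagged = screen.is_flagged("Payee B", 500.0)
screen.record_outcome("Payee B", 500.0, fraud_confirmed=True)
repeat_flagged = screen.is_flagged("Payee B", 102.0)
```

In this sketch the ordinary claim passes, the outsized claim is held, and once fraud is confirmed the same payee is held even for an ordinary-looking amount. A production model would weigh many more signals than a single dollar figure.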
Shutting down stimulus fraud
One high-profile project was developed by the Recovery Accountability and Transparency Board (RATB). The RATB was created by the American Recovery and Reinvestment Act (ARRA) of 2009, better known as the stimulus program.
Its mission is twofold: to publish how, when and where some $800 billion in stimulus funds are being spent (which it does on Recovery.gov) and to prevent fraud and improper payments of the $283 billion reserved for contracts, grants and loans.
That second mission is where the predictive analytics comes in. The board established a data analytics team, called the Recovery Operations Center (ROC), which designed a system to compare the information reported by grant recipients against more than 22 different data sets.
Some are government data, such as lists of organizations that have been suspended from government contracts or debarred from doing business with the government because of problems ranging from fraud to making false statements to poor performance.
Other data sets are commercial or open source, such as data from Dun & Bradstreet, Lexis-Nexis, GPS sources and even information from social media such as Facebook and Twitter, says Mike Wood, executive director of the RATB.
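The article does not describe how the ROC actually matches records, but the basic idea of checking grant recipients against reference lists can be sketched in a few lines of Python. Everything here is an assumption for illustration: the function names normalize and screen_recipients, the list labels and the organization names are all made up, and a real system would need far more robust matching than exact comparison of cleaned-up names.

```python
def normalize(name: str) -> str:
    """Lowercase a name and drop punctuation and extra spaces so that
    trivially different spellings compare equal."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def screen_recipients(recipients: list[str],
                      reference_lists: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return each recipient that appears on a reference list, mapped to
    the labels of the lists it appears on."""
    normalized_lists = {
        label: {normalize(entry) for entry in entries}
        for label, entries in reference_lists.items()
    }
    hits: dict[str, list[str]] = {}
    for recipient in recipients:
        key = normalize(recipient)
        matched = [label for label, names in normalized_lists.items() if key in names]
        if matched:
            hits[recipient] = matched
    return hits


# Hypothetical sample data; none of these names come from a real list.
recipients = ["Acme Builders, Inc.", "Honest Paving LLC"]
reference_lists = {
    "suspended": ["ACME BUILDERS INC"],
    "debarred": ["Shady Supply Co."],
}
hits = screen_recipients(recipients, reference_lists)
```

Here the first recipient is matched to the hypothetical "suspended" list despite differences in capitalization and punctuation, while the second produces no hit. Adding another data set means adding another entry to reference_lists.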
"We weren't looking to develop just another 'pay and chase' oversight program that would detect fraud after all the money went out the door," Wood says. The focus was "to prevent criminals and other bad actors from ever getting their hands on the recovery funds."