Anticipation Game

Text mining and real-time applications have improved the accuracy and timeliness of predictive analytics, making it a better bet for businesses.

In the movie Minority Report, Tom Cruise's character relies on visions from "precogs," people who can predict crimes, to catch criminals before they can act. While the film takes place in the future, the predictive analytics tool sets available to businesses today are bringing similar scenarios to life.

For example, LoanPerformance uses such tools to help its clients predict which of their customers will be late with payments, which will be lying when they say the check is in the mail and which will be likely to default altogether. The San Francisco-based firm operates a cooperative database of loan payment information for financial institutions. Richard Harmon, senior vice president of scoring and analytic services at LoanPerformance, says its customers, which include mortgage servicers, use the data to encourage on-time payments or to put delinquent accounts on the fast track to foreclosure.

Predictive analytics tools are also used to spot outright fraud. For example, at health insurer Highmark Inc. in Pittsburgh, such systems are set up to anticipate and block fraudulent claims.

The adoption of predictive analytics systems is on the upswing, driven by technology advances and the potential for large bottom-line benefits. The number of preconfigured and proven models available for specific industries and applications is increasing, while the model-creation process is more automated than it once was. That means analysts can build models faster and refresh them more frequently in response to changing business needs.

Successful models can pay off big. At LoanPerformance, a model that predicts which accounts that are 90 days in arrears will default saved one client $2 million in six months. The total cost of deployment was $400,000. Those types of returns are one reason why IDC research shows the sale of predictive analytics tools growing to $3 billion by 2008, which would be a nearly 40% increase from 2004. Such tools make up 25% of the business intelligence market.
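The arithmetic behind those figures is easy to check. What follows is a minimal sketch in Python, not a calculation published by LoanPerformance or IDC. The function names are hypothetical, and the $2.2 billion base for 2004 and the 8% compound annual growth rate come from the IDC figures listed at the end of this article. Applying that rate to the four years from 2004 to 2008 is an assumption, since IDC states it for 2003-2008.

```python
def simple_roi(savings, cost):
    """Net gain divided by cost, as a fraction (4.0 means 400%)."""
    return (savings - cost) / cost


def project(base, annual_rate, years):
    """Compound a base figure forward at a fixed annual growth rate."""
    return base * (1 + annual_rate) ** years


# LoanPerformance client: $2 million saved in six months, $400,000 to deploy.
roi = simple_roi(2_000_000, 400_000)

# IDC figures: $2.2B in 2004, assumed to compound at 8% a year through 2008.
sales_2008 = project(2.2, 0.08, 4)
increase = sales_2008 / 2.2 - 1

print(f"Six-month ROI: {roi:.0%}")
print(f"Projected 2008 sales: ${sales_2008:.2f}B")
print(f"Increase over 2004: {increase:.0%}")
```

Under those assumptions, the net savings come to four times the deployment cost, and the projection lands just under $3 billion, an increase of roughly 36% over 2004.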

As the volumes of business data have increased, the desire to extract value from that information has intensified. Fortunately, predictive analytics tools have become easier to use, says Harmon, allowing more streamlined model-building workflows and enabling analysts steeped in business issues to do more without the involvement of statisticians. "This is where the future lies," he says. "The tools are being automated."

The biggest benefits, however, are coming on two fronts: the inclusion of unstructured data into the predictive modeling process to improve accuracy and a push to execute predictive analytics and present results in real time.

Predictive analytics involves several steps, ranging from identifying and preparing target data to developing a statistical model, testing it on a sample for accuracy and then running it against the full data set. Results are sent to front-office systems, where business logic is used to, for example, cross-sell a customer a different product or flag an insurance claim as potentially fraudulent. While most organizations customize predictive models to their customer bases and business challenges, many processes for finding models have been automated.
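The steps above can be sketched end to end in a few lines of Python. This is an illustration under stated assumptions, not any vendor's workflow: the claims data is synthetic, a single dollar cutoff stands in for the statistical model, and every function name is hypothetical.

```python
import random


def make_records(n, seed=0):
    """Synthetic claims as (amount, is_fraud) pairs. Illustrative data only."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        is_fraud = rng.random() < 0.2
        amount = rng.gauss(900, 200) if is_fraud else rng.gauss(400, 150)
        records.append((amount, is_fraud))
    return records


def accuracy(threshold, rows):
    """Share of rows where 'amount >= threshold' matches the fraud label."""
    hits = sum((amount >= threshold) == label for amount, label in rows)
    return hits / len(rows)


def fit_threshold(train):
    """Stand-in 'model': the cutoff with the best accuracy on the sample."""
    candidates = sorted(amount for amount, _ in train)
    return max(candidates, key=lambda t: accuracy(t, train))


def flag_claims(threshold, rows):
    """Front-office business logic: flag claims the model scores as risky."""
    return [amount for amount, _ in rows if amount >= threshold]


records = make_records(1000)                  # 1. identify and prepare data
train, holdout = records[:700], records[700:]
threshold = fit_threshold(train)              # 2. develop the model
holdout_acc = accuracy(threshold, holdout)    # 3. test it on a held-out sample
flagged = flag_claims(threshold, records)     # 4. run it on the full data set

print(f"Cutoff: {threshold:.0f}, holdout accuracy: {holdout_acc:.1%}")
print(f"Claims flagged for review: {len(flagged)} of {len(records)}")
```

A production model would use many more variables and a proper statistical technique, but the sequence of prepare, fit, test, score and act is the same.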

More challenging are efforts to achieve real-time results. They fall into three categories: enabling real-time scoring on the front end when, say, a new loan application comes in; updating the back-end databases; and accelerating the pace at which models can be refreshed to deal with changing scenarios. Faster refreshes are especially helpful in areas such as fraud, where criminals are constantly devising new schemes.

Texting It Up

Harmon says he was surprised at how much text mining increased the accuracy of his predictive models. The previous model included structured information such as loan histories, credit reports and demographics. He added textual notes entered by call center staffers as they spoke with customers.

"That information tends to be very, very rich, despite the fact that it tends to be very noisy," Harmon says. He used tools from Intelligent Results Inc. in Bellevue, Wash., to analyze linguistic data and identify when someone may be lying. For example, if someone says, "The check is in the mail," that might be one indicator. "What we're looking for is not just the words, but the patterns that lead to an event," says Harmon.

"The text-alone models worked better than our standard models," he says. When Harmon mixed the text with structured data, accuracy improved by 18% over his original model.
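To make the idea of mixing text with structured data concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the Intelligent Results method, which the article does not describe in detail. The phrase list, field order and function names are all hypothetical. Each call center note is reduced to simple phrase indicators, which are appended to the structured values before modeling.

```python
# Hypothetical phrases an analyst might watch for in call center notes.
PHRASES = (
    "check is in the mail",
    "lost my job",
    "will pay next week",
)


def text_features(note):
    """Return one 0/1 indicator per watched phrase found in the note."""
    lowered = note.lower()
    return [1 if phrase in lowered else 0 for phrase in PHRASES]


def combine(structured, note):
    """Append the text indicators to the structured feature values."""
    return list(structured) + text_features(note)


# Structured inputs (hypothetical): days in arrears, loan-to-value ratio.
row = combine([90, 0.4], "Customer says the CHECK is in the mail.")
print(row)
```

Real text mining tools look at patterns across many notes rather than fixed phrases, as Harmon points out, but the combined row shows how text-derived signals can sit alongside loan histories and demographics in one model input.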

J.D. Power and Associates is in the early phases of testing text mining. The Westlake Village, Calif.-based customer research firm wants to use verbatim comments from surveys to create an early warning system that predicts warranty problems for automobile manufacturers.

J.D. Power is currently experimenting with a tool from ClearForest Corp. in Waltham, Mass. Preliminary testing has shown that written responses are more useful in predicting the nature of a given problem than are structured, check-box answers, says Joe Ivers, executive director of quality and customer satisfaction research.

While written comments are provided to J.D. Power's customers, the volume of surveys makes it hard for the automakers to identify unforeseen problems with vehicles. The manufacturers want to catch such problems before large volumes of new vehicles have shipped. "By the time something appears frequently enough to appear to the unaided eye, it's too late," Ivers says.

Nextel Communications Inc. in Reston, Va., uses Enterprise Miner from SAS Institute Inc. in Cary, N.C., to make predictions based on text captured in call center dialogues.

Scott Radcliffe, director of decision sciences, says the telecommunications company relates "key phrases that occur during customer interactions" to future customer churn. As a result, it has been able to reach out to those customers before they actually leave, an important advantage because churn is a big concern in the highly competitive telecommunications market.

For I4 Commerce Inc., which must approve or deny an online transaction request in under four seconds, real-time analytics is the name of the game. Merchants use the company's "Bill Me Later" service to offer credit to their customers, who don't need to provide credit card information over the phone or the Internet. Tom Keithly, vice president of credit and integration at I4, says his staff used a predictive analytics workbench from Toronto-based Angoss Software Corp. to develop a model that can score each request to identify fraudulent transactions.

"Our credit decision occurs in real time, and each database we go to is maintained in real time," Keithly says. Inputs include credit reports, demographics, telephone number verification and the vendor's own internal customer histories. As soon as a customer completes a transaction, the system updates that customer's risk score. To do that, Timonium, Md.-based I4 pulls data from its live Oracle database rather than using its data warehouse. "We only use the data warehouse to develop new versions of the [model]," he says.

Although he could use tools like SPSS Inc.'s Clementine scoring engine to download data and deliver the resulting scores, Keithly says that approach would have introduced too much latency for the response time he required. Instead, he took the algorithms built by the modeling system, compiled them in Java and now runs them on I4's production servers. "It's just pure math. It operates as logic in the production system," he says.
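What "pure math" in production might look like can be sketched briefly. The example below is written in Python for readability, whereas Keithly's team compiled its algorithms in Java. The logistic form, the coefficients, the input names and the cutoff are all hypothetical assumptions, since the article does not disclose I4's actual model.

```python
import math

# Hypothetical coefficients exported from a modeling workbench.
COEFFICIENTS = {
    "order_amount_hundreds": 0.4,
    "phone_verified": -1.5,
    "prior_late_payments": 0.8,
}
INTERCEPT = -3.0


def fraud_score(features):
    """Logistic score between 0 and 1; higher means riskier."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def decide(features, cutoff=0.5):
    """Approve when the score falls below an assumed cutoff."""
    return "approve" if fraud_score(features) < cutoff else "decline"


low_risk = {"order_amount_hundreds": 2, "phone_verified": 1,
            "prior_late_payments": 0}
high_risk = {"order_amount_hundreds": 10, "phone_verified": 0,
             "prior_late_payments": 2}

print(decide(low_risk), decide(high_risk))
```

Because scoring reduces to a handful of multiplications and one exponential, it adds very little latency, which fits the four-second decision window. The heavier work of refitting the coefficients can happen offline against the data warehouse.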

Real Deal

A critical difference in using predictive analytics is the speed at which models can be refreshed, Keithly says. While the mainframe systems he used years ago only allowed model development every two years, his current tool set allows him to refresh the model every 90 days. But that's still not real time. For most applications, the ability to refresh the model every quarter is adequate, says Keithly. However, he sees areas in which real-time models would be useful, such as fraud, where assumptions must be changed in response to changing perpetrator tactics. Keithly expects to see real-time modeling in the next decade. "It will be worth it as long as it doesn't take a massive investment to make it work," he says.

But a massive investment is often required for organizations to provide real-time access to data. I4 is relatively small and built its IT systems from the ground up in 2001 using state-of-the-art technology, including Solaris servers and Oracle databases. For large companies with older equipment and databases, that's more of a challenge.

"If data is divergent across multiple sources and you need to bring a data warehouse together, that's considerably more money," says Christopher Scheib, manager of decision support at Highmark.

Peter Heijt, vice president of marketing and sales at Fortis Banque SA/NV in Utrecht, Netherlands, wants to provide real-time access to data for predictive analytics applications that will improve the success rate of sales campaigns. "The investment is more or less double the cost of the data structure we have now in data warehouse, data mart and CRM. So the payoff has to be big. We're looking for a 40% increase in sales effectiveness," he says. Heijt is experimenting with a small part of his CRM database to see if the investment is justified.

Scheib says he needs access to outside data in real time to facilitate decisions on how to price policies. "Prescription information we can get in very close to real time, and we can use that to make predictions about health risks," he says. "That's useful for actuaries who are trying to price clients in as near to real time as they can get."

While predictive analytics tools have gotten easier to use, successful enterprise implementations still require collaboration among business analysts, statistics experts and database administrators, say users. "Data preparation can be 60% of the effort," says Lou Agosta, an independent technology analyst in Chicago.

But the biggest challenge may be in learning how to take full advantage of the opportunities that predictive analytics can provide. Developing the right responses is what takes the most time, says Harmon. "Having better predictive models has allowed everyone to re-evaluate their strategies. That's where the intellectual capital is spent," he says.



Predictive Analytics Tools Market

Sales in 2004: $2.2B
Share of total core analytics market: 25%
Projected growth rate through 2008: 8%*

*Compound annual growth 2003-2008

Source: IDC, January 2004

Copyright © 2005 IDG Communications, Inc.
