The Forecast Is Clear

The potential for mining cost-saving and revenue-boosting ideas from data is increasing as companies build bigger data warehouses, applications become more integrated, computers grow more powerful and vendors of analytic software introduce products that are easier to use.

But many companies that have made huge investments in terabyte-size data stores aren't using them effectively to forecast the future -- to predict, for example, which customers are likely to leave, which ones will probably respond to the next promotion, which ones are ripe for cross-selling and what will happen to sales if prices are increased by 5%.

While many of the products that can answer those questions use esoteric techniques such as neural networks, logistic regression and support-vector machines, they don't require a Ph.D. in math, users say. Indeed, the biggest stumbling block to using "predictive analytics" is getting the data, not analyzing it, they say.

That has been the case so far at BankFinancial Corp. in Chicago. It uses the Clementine data mining "workbench" from SPSS Inc., also in Chicago, to develop models that predict customer behavior so the bank can, for example, more accurately target promotions to customers and prospects.

The bank uses Clementine's neural network and regression routines for these models. It's also beginning to use PredictiveMarketing, SPSS's new package of "best-practice templates" for helping users set up predictive models.

Models Easy, Data Hard

PredictiveMarketing will reduce the time it takes the bank to develop a model by 50% to 75%, says William Connerty, assistant vice president of market research. The first major application is a model to predict customer "churn," the rate at which customers come and go. It will be used to identify the customers most likely to leave the bank during the coming month.

The problem is, the model has access only to account information prepared from weekly and monthly summaries, not to the daily customer activity that would make it more timely. "The biggest obstacle is getting transaction data and dealing with disparate data sources," Connerty says.

The data that BankFinancial needs in order to assess customer loyalty comes from several bank systems and unintegrated customer survey databases. A lot of systems integration and interface work needs to be done before the bank will see the full fruits of its modeling tools, Connerty says.

"We need to increase our efficiency, our ability to deliver actionable information to decision-makers," he says. "I'm under a lot of pressure to deliver."

KXEN Inc. (Knowledge Extraction Engines), an analytic software company in San Francisco, is another vendor that has heard users' cries for easier modeling. It claims that its Analytic Framework product can greatly reduce the time it takes to define, develop and run a model. For example, KXEN's Consistent Coder module automatically transforms raw, inconsistent data into clean, uniformly formatted data that's ready for modeling.

"The big sweet spot for KXEN is it cuts data preparation time in half," says KXEN user Seymour Douglas, director of CRM and database marketing at Cox Communications Inc. in Atlanta. It also masks complexity, he says, "so you don't need a big-dollar statistician; you can put someone at a more junior level, because a lot of heavy lifting is done by the tool."

Cox, a cable services provider, uses KXEN's Analytic Framework to identify its most loyal and profitable customers, predict churn and forecast who might be most receptive to cross-selling pitches.

One model revealed that customers in apartments tend to be relatively short-term Cox customers. "So we now offer them product packages where we try to recover our investment quicker," Douglas says. "Without KXEN, that would not have been obvious at all."

But the labor-saving benefits of KXEN come at a stiff price, he says. "For a five-seat license, you'll pay about $360,000, plus an annual fee of about $60,000."

Get Organized

Robert Berry, president and CEO of Central Michigan University Research Corp. in Mount Pleasant, says many companies have made huge investments in data warehouses but tend to use them more for analysis of past performance than for "predictive intelligence." One reason is they aren't organized for it, he says.

Berry says predictive modeling should involve collaboration among people who have IT, analytical and business expertise.

"You have to build a business-intelligence team," he says. "But companies are struggling with common issues like who owns it, who manages it and so on. How do you pull the business skills, the IT skills and the analytical skills across corporate silos and create this team? It's not easy."

Berry advises having the business-intelligence team report directly to a business unit. "It needs to have a definite link to corporate profits," he says.

Giving Ratings to Leads

Hewlett-Packard Co.'s Enterprise Systems Group pulls together people with diverse backgrounds and strong analytical skills -- including some people who also have IT skills -- for its group that does predictive modeling of customer behavior. The group is part of "CRM operations" under a vice president for sales, says Randy Collica, a senior business/data mining analyst. On a project-by-project basis, people from sales, marketing and other departments may participate, he says.

Collica says it's not necessary to have a professional mathematician on staff in order to do statistical modeling. "But you need some basic statistics," he says. "If someone says, 'This is a normal distribution,' you at least need to know what that means."

HP uses software from SAS Institute Inc. in Cary, N.C., to mine its database of customers and prospects, using regression and other techniques to predict churn, loyalty and where to target promotions. HP also mines its huge stores of unformatted text information, conducting a kind of predictive analytics that's much less common.

HP has some 750GB of customer information, including data from premerger Compaq Computer Corp. that dates to 1984. It has customer data from its call centers, including e-mails from customers and prospects and text typed in during voice calls. Included in these call records are "lead ratings," call center personnel's assessments of a caller's readiness to buy -- coded as "hot," "warm" or "suspect."

But some records lack lead ratings, so HP has used SAS's Text Miner to predict the rating these customers should get. Text Miner does that by comparing text from unrated customers to "clusters" of text from rated customers that contain similar terms and concepts.

Text Miner first preprocesses the raw text and then transforms it into a grid, or matrix, that relates terms to documents. The matrix records how often each term appears in each document in the collection. Specific bits of important information, such as customer names, are extracted and summarized.
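For readers who want to see what a term-by-document matrix looks like, here is a minimal sketch in Python. It is not SAS Text Miner's code; the function name `build_term_document_matrix` and the sample call notes are invented for illustration, and the "preprocessing" is assumed to be nothing more than lowercasing and splitting on spaces.

```python
from collections import Counter


def build_term_document_matrix(documents):
    """Return (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[i] appears in documents[j]."""
    # Assumed preprocessing: lowercase and split on whitespace only.
    counts = [Counter(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    # One row per term, one column per document.
    matrix = [[doc_counts[term] for doc_counts in counts] for term in vocabulary]
    return vocabulary, matrix


# Hypothetical call center notes, made up for this example.
notes = [
    "customer asked about disk drives",
    "customer wants a quote on disk storage",
    "printer problem reported",
]
vocabulary, matrix = build_term_document_matrix(notes)

for term, row in zip(vocabulary, matrix):
    print(f"{term:10s} {row}")
```

Each row is a term and each column is a document, so the row for "disk" shows that the word appears in the first two notes but not the third.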

Next, a mathematical technique called singular-value decomposition replaces the original matrix with a much smaller matrix by purging unimportant words and highlighting more relevant ones.
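The reduction step can be sketched with NumPy's `numpy.linalg.svd`. This illustrates the general technique (a truncated singular-value decomposition), not how Text Miner implements it. The tiny matrix, the term labels in the comments and the helper name `reduce_with_svd` are assumptions made for the example. Strictly speaking, the decomposition does not delete individual words; it combines terms that occur together into a small number of "concept" dimensions and discards the weakest dimensions.

```python
import numpy as np


def reduce_with_svd(term_doc, k):
    """Project each document (column of term_doc) into a k-dimensional space.

    Returns an array with one row per document and k columns.
    """
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Keep only the k largest singular values and their document vectors.
    return (np.diag(s[:k]) @ vt[:k, :]).T


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical counts: rows are terms, columns are four call records.
term_doc = np.array([
    [1, 1, 0, 0],  # disk
    [1, 0, 0, 0],  # drive
    [0, 1, 0, 0],  # storage
    [0, 0, 1, 1],  # printer
    [0, 0, 1, 0],  # toner
    [0, 0, 0, 1],  # paper
], dtype=float)

doc_vectors = reduce_with_svd(term_doc, k=2)
print(doc_vectors.shape)                        # six terms reduced to two dimensions
print(cosine(doc_vectors[0], doc_vectors[1]))   # two storage-related records
print(cosine(doc_vectors[0], doc_vectors[2]))   # storage record vs. printer record
```

In the reduced space, the two storage-related records end up pointing in the same direction even though one mentions "drive" and the other "storage," while the printer records point elsewhere.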

The new matrix can be used to place associated terms and documents into categories. HP helps standardize the matrix with synonym lists that say, for example, that customers calling about "disk drives" or "hard disks" are really all interested in storage.
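A synonym list of the kind described here can be approximated with a simple lookup applied before terms are counted. The sketch below is an assumption about how such a list might work, not HP's or SAS's actual mechanism; the `SYNONYMS` table and the `normalize_terms` function are hypothetical.

```python
import re

# Hypothetical synonym list: each phrase maps to one standard term.
SYNONYMS = {
    "disk drive": "storage",
    "disk drives": "storage",
    "hard disk": "storage",
    "hard disks": "storage",
}


def normalize_terms(text, synonyms=SYNONYMS):
    """Lowercase the text and replace every listed phrase with its standard term."""
    lowered = text.lower()
    if not synonyms:
        return lowered
    # Try longer phrases first so "disk drives" wins over "disk drive".
    phrases = sorted(synonyms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(p) for p in phrases) + r")\b")
    return pattern.sub(lambda match: synonyms[match.group(1)], lowered)


print(normalize_terms("Caller asked about Hard Disks"))
print(normalize_terms("Needs two disk drives for a server"))
```

After normalization, both notes mention "storage," so they contribute to the same row of the term-by-document matrix.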

Finally, clustering, classification and predictive methods are applied to the reduced data using traditional data mining techniques. HP uses "memory-based reasoning," a technique that makes a prediction about a record by comparing it with past records with similar characteristics.
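Memory-based reasoning is closely related to what statisticians call nearest-neighbor classification: find the past records most similar to the new one and let them vote. The following sketch assumes each call record has already been reduced to a short numeric vector. The vectors, the choice of cosine similarity, the value `k=3` and the function names are all illustrative assumptions, not details of HP's model.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def predict_rating(unrated, rated_records, k=3):
    """Predict a lead rating by majority vote among the k most similar rated records.

    rated_records is a list of (vector, rating) pairs.
    """
    nearest = sorted(
        rated_records,
        key=lambda record: cosine_similarity(unrated, record[0]),
        reverse=True,
    )[:k]
    votes = Counter(rating for _, rating in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical rated call records in a two-dimensional reduced space.
rated = [
    ([0.9, 0.1], "hot"),
    ([0.8, 0.2], "hot"),
    ([0.7, 0.4], "warm"),
    ([0.1, 0.9], "suspect"),
    ([0.2, 0.8], "suspect"),
]

print(predict_rating([0.85, 0.15], rated))
print(predict_rating([0.15, 0.90], rated))
```

The appeal of the approach is that no explicit model has to be trained: the rated records themselves act as the "memory," and adding newly rated calls improves future predictions without further work.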

These techniques can predict the customer-lead rating with 85% accuracy, Collica says. "Without this technique, I'd have had to go back to the original records and actually read them," he explains. "And when you have that much volume, you can't read them all."

HP also intends to use the text mining and clustering techniques to find out what loyal customers tend to talk about when they contact an HP call center, as well as what's on the minds of those customers deemed least loyal. The goal, of course, is to win over the less loyal ones.

Collica says HP has yet to exploit a number of promising text data sources. For example, it will analyze the text in warranty claims to glean insights about problems customers are having with products and the text in warranty cards to better understand its customers.

HP will also try to mine information from customers' and prospects' own Web sites. "Web sites are a great source of wonderful information about your customers," Collica says.

Special Report

Mining for Gems

Copyright © 2003 IDG Communications, Inc.
