Definition: Predictive analytics is the branch of data mining concerned with forecasting probabilities. The technique uses variables that can be measured to predict the future behavior of a person or other entity. Multiple predictors are combined into a predictive model. In predictive modeling, data is collected to create a statistical model, which is refined as additional data becomes available.
Predictive analytics is a set of mathematical techniques applied to a data set to estimate the probability that some scenario will happen or is true. These techniques are applied in many research areas, including meteorology, genetics and marketing, where there’s an abundance of data and a need to forecast the future.
In business, predictive analytics is often used to answer questions about customer behavior. For example, companies often want to know whether a particular customer is likely to be interested in a direct-mail offer. Or a business might want to know whether, given a certain set of premiums and benefits, a new customer will become a long-term customer. Ultimately, businesses want predictive analytics to suggest how to best target resources for maximum return.
Cross-selling, upselling, determining customer profitability and promoting customer loyalty are the best-known uses of this technology, according to a report by Forrester Research Inc. analyst Lou Agosta. But there are many other applications, he notes, including credit scoring, predicting machine failures and making the supply chain more efficient.
Plenty of high-level mathematics is involved, but stated simply, predictive analytics is used to ask which characteristics, called predictors, in a data set are clustered together. The technique is also used to determine whether, given a set of predictors, the value for some other characteristic is likely to fall within a desired range.
Though these two questions sound very similar, in practice, they’re quite different. The first one, the search for clustered characteristics, is like saying, “Look through my database of information and find something about my business that I overlooked or might not already know.” You might look through the history of people who have declared bankruptcy to find which characteristics are most tightly linked together: late payments, number of addresses within the past two years, recent divorce or health problems, for example.
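One rough way to picture the search for linked characteristics is to measure how strongly each pair of yes/no characteristics moves together. The sketch below uses the phi coefficient (the ordinary correlation applied to two 0/1 columns) as the measure; that choice, the field names and the eight toy records are all assumptions made for illustration, not real bankruptcy data or a method named in the report.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy records, invented for illustration only.
# Each row is one person; 1 means the characteristic is present.
RECORDS = [
    {"late_payments": 1, "recent_moves": 1, "recent_divorce": 0},
    {"late_payments": 1, "recent_moves": 1, "recent_divorce": 1},
    {"late_payments": 1, "recent_moves": 1, "recent_divorce": 0},
    {"late_payments": 1, "recent_moves": 0, "recent_divorce": 1},
    {"late_payments": 0, "recent_moves": 0, "recent_divorce": 0},
    {"late_payments": 0, "recent_moves": 0, "recent_divorce": 1},
    {"late_payments": 0, "recent_moves": 0, "recent_divorce": 0},
    {"late_payments": 0, "recent_moves": 1, "recent_divorce": 1},
]


def phi(records, a, b):
    """Phi coefficient between two 0/1 characteristics a and b."""
    n = len(records)
    both = sum(1 for r in records if r[a] and r[b])
    count_a = sum(1 for r in records if r[a])
    count_b = sum(1 for r in records if r[b])
    denom = sqrt(count_a * (n - count_a) * count_b * (n - count_b))
    if denom == 0:
        # A characteristic that never varies cannot be linked to anything.
        return 0.0
    return (n * both - count_a * count_b) / denom


def rank_pairs(records):
    """Return every pair of characteristics, most tightly linked first."""
    names = sorted(records[0])
    scored = [((a, b), phi(records, a, b)) for a, b in combinations(names, 2)]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    for pair, strength in rank_pairs(RECORDS):
        print(pair, round(strength, 2))
```

In this toy data, late payments and recent moves come out as the most tightly linked pair. A real analysis would use far more records and would likely look at groups of characteristics rather than pairs alone.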
The second question, determining whether a particular characteristic falls within a desired range, is like saying, “Given what I know about a customer, find out how likely it is that something else is true.” For example, you might want to analyze the characteristics of a person filing an insurance claim to determine the likelihood that the claim is fraudulent. The predictors could be how recently the customer filed a previous claim, the dollar amount of that claim or how long the customer has had the policy.
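The simplest version of this second question can be sketched as a lookup over past cases: find the historical records that match what is known about the customer, then see what share of them had the outcome of interest. The function name `estimate_probability`, the field names and the toy claim history below are hypothetical and exist only to show the idea.

```python
# Hypothetical claim history, invented for illustration only.
HISTORY = [
    {"recent_prior_claim": True, "new_policy": True, "fraud": True},
    {"recent_prior_claim": True, "new_policy": True, "fraud": True},
    {"recent_prior_claim": True, "new_policy": True, "fraud": True},
    {"recent_prior_claim": True, "new_policy": True, "fraud": False},
    {"recent_prior_claim": True, "new_policy": False, "fraud": False},
    {"recent_prior_claim": False, "new_policy": True, "fraud": False},
    {"recent_prior_claim": False, "new_policy": False, "fraud": False},
    {"recent_prior_claim": False, "new_policy": False, "fraud": False},
]


def estimate_probability(history, known, outcome):
    """Share of matching past records in which `outcome` was true.

    `known` maps predictor names to the values observed for this customer.
    Returns None when no past record matches, since there is nothing to
    base an estimate on.
    """
    matches = [
        row for row in history
        if all(row[name] == value for name, value in known.items())
    ]
    if not matches:
        return None
    return sum(1 for row in matches if row[outcome]) / len(matches)


if __name__ == "__main__":
    known = {"recent_prior_claim": True, "new_policy": True}
    print(estimate_probability(HISTORY, known, "fraud"))
```

Counting matches like this breaks down quickly as predictors are added, because fewer and fewer past records match exactly. That is one reason practitioners usually fit a statistical model instead.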
The two approaches work together. Once linked characteristics have been identified, then the second question can be asked. After an insurance company has found which characteristics are most tightly linked to fraud, for example, it can create an equation that produces a number indicating how likely it is that a particular claim is fraudulent.
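As an illustration of what such an equation might look like, the sketch below uses a logistic scoring function, one common form for turning several predictors into a number between 0 and 1. The article does not say which equation an insurer would use, so the logistic form is an assumption, and the weights and intercept here are made up. In practice they would be estimated from historical claims.

```python
from math import exp

# Hypothetical weights, invented for illustration. A real model would
# estimate these from historical claim data.
WEIGHTS = {
    "days_since_last_claim": -0.01,   # a longer gap lowers the score
    "claim_amount_thousands": 0.2,    # a larger claim raises the score
    "policy_age_years": -0.3,         # a longer-held policy lowers the score
}
INTERCEPT = -1.0


def fraud_score(claim):
    """Combine the predictors into a single score between 0 and 1."""
    z = INTERCEPT + sum(WEIGHTS[name] * claim[name] for name in WEIGHTS)
    return 1.0 / (1.0 + exp(-z))


if __name__ == "__main__":
    claim = {
        "days_since_last_claim": 30,
        "claim_amount_thousands": 12,
        "policy_age_years": 1,
    }
    print(round(fraud_score(claim), 3))
```

A higher score flags a claim for closer review; where to set the cutoff is a business decision rather than something the equation decides.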