There is an interesting quandary around the use of predictive analytics in business: the more predictive models are used in the operation, the harder it is to predict the direction of the business. The reason is that when more processes become probabilistic, the outcomes, even if measurably better, become harder to predict. And when the underlying predictive models automatically adapt, delivering even better results, the challenge gets bigger. This may seem counterintuitive, but the reality is that predictive models are pretty much black boxes, and more so if the model describes a complex pattern. That, of course, is when predictive analytics add the most value: if the pattern were simple to understand, you probably wouldn't need predictive analytics in the first place.
Reviewing the line-up
Let's look at this transparency versus performance issue using a trivial example. Take a simple loan origination process in a bank. Let's assume the bank's risk rating model is trivial: reject all loans from male applicants, and only those. This is not the place to start a debate about whether or not this is a sensible lending practice, but it would give the bank (at least initially, before the policy becomes public knowledge) a pretty predictable outcome. Given the usual percentage of female loan applicants, the bank will know pretty well how many loans it will reject and approve. But that volume becomes much harder to predict once the bank gets more sophisticated and looks beyond gender. Suddenly, its rating model might look at non-linear combinations of hundreds of customer attributes. The percentage of applicants accepted will no longer depend just on the gender mix, but on many more subtle patterns. The only way to get a gauge on the expected loan influx is to run the (new) risk model over a representative 'through-the-door' population (i.e. everyone who applied for a loan, not just those who were accepted) and count the applications that would have been approved by the model.
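The through-the-door replay can be sketched in a few lines. This is a minimal illustration, not a real risk model: the debt-to-income rule and the 0.4 threshold are placeholder assumptions standing in for a trained model.

```python
# Hypothetical sketch: estimating the expected approval rate by replaying a
# new risk model over a representative 'through-the-door' population
# (every applicant, not just those previously accepted).

def new_risk_model(applicant):
    # Placeholder scoring rule standing in for a real, trained model:
    # approve when the debt-to-income ratio is below a threshold.
    return applicant["debt"] / applicant["income"] < 0.4

def expected_approval_rate(population, model):
    approvals = sum(1 for applicant in population if model(applicant))
    return approvals / len(population)

population = [
    {"income": 50_000, "debt": 10_000},
    {"income": 40_000, "debt": 30_000},
    {"income": 80_000, "debt": 20_000},
]
print(expected_approval_rate(population, new_risk_model))  # 2 of 3 approved
```

The same loop works for any candidate model, which is exactly why a representative applicant population is worth keeping around.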
Now imagine a bank that has deployed hundreds of non-trivial predictive models, not just to predict various risks, but to calculate the propensity to buy each of its products and services, the likelihood of a customer moving to another bank or going dormant, the best way to collect on monies owed, and so on. This bank is clearly very evidence-based in its interactions with customers, and there's little doubt it sees that reflected in the quality of its business. But how can it get reliable insight into the combined effect of all those probabilistic customer strategies? It appears the downside to all this sophistication is diminished predictability.
The answer, of course, is simulation. To anticipate business outcomes the bank needs to execute all its strategies against real data in a way that takes into account how these strategies interplay with each other (take, for instance, the risk management and sales strategies). In fact, if the bank can do that, it might as well occasionally tweak the data (or 'shock' it, in the case of stress testing for risk) and see how well its strategies cope with hypothetical circumstances (for instance, a merger, changed market dynamics, or economic or regulatory changes). So let's examine what kind of infrastructure is necessary to pull this off.
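As a rough illustration of the 'shock' idea, the sketch below replays the same toy approval rule over a baseline population and over a stressed copy of it. The 30% income shock and the approval rule are illustrative assumptions, not a real stress-testing scenario.

```python
# Hypothetical sketch: stress-testing a strategy by 'shocking' the input
# data and replaying the same decision logic over both versions.

def approve(applicant):
    # Toy stand-in for the bank's risk strategy.
    return applicant["debt"] / applicant["income"] < 0.4

def shock(population, income_factor):
    # Simulate an economic downturn by scaling every applicant's income.
    return [{**a, "income": a["income"] * income_factor} for a in population]

population = [
    {"income": 50_000, "debt": 10_000},
    {"income": 60_000, "debt": 21_000},
    {"income": 80_000, "debt": 20_000},
]

baseline = sum(map(approve, population)) / len(population)
stressed = sum(map(approve, shock(population, 0.7))) / len(population)
print(baseline, stressed)  # the stressed scenario approves fewer loans
```

The point is not the toy rule but the shape of the exercise: same strategy, altered data, compared outcomes.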
Assessing the field
The first key requirement for simulation is that the predictions that are part of a strategy (e.g. to approve or reject a loan based on the probability of default) are not pre-calculated scores but are calculated when the strategy/process executes. Otherwise the simulations will simply reuse the same scores and the predictive analytics component will be entirely static.
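The difference is easy to demonstrate. In this sketch (all names and the scoring rule are illustrative), a strategy reading a stored score is blind to a simulated change in the data, while a strategy that scores at execution time picks it up.

```python
# Sketch: pre-calculated versus execution-time scoring. With a stored
# score, a simulated tweak to the input data has no effect; scoring
# inside the strategy recomputes from the current data.

def score(customer):
    # Toy risk score: customers with no arrears score high.
    return 1.0 if customer["arrears"] == 0 else 0.3

customer = {"arrears": 0, "score": score({"arrears": 0})}  # score stored up front

def strategy_static(c):
    return "approve" if c["score"] > 0.5 else "reject"   # reads the stale score

def strategy_dynamic(c):
    return "approve" if score(c) > 0.5 else "reject"     # recomputes on execution

customer["arrears"] = 2  # the simulated data tweak
print(strategy_static(customer))   # 'approve' -- the stored score never moved
print(strategy_dynamic(customer))  # 'reject' -- the score was recalculated
```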
Assembling the players
Second, the strategy needs to be holistic (an 'ensemble' strategy), because one of the reasons to do a simulation is to get a handle on the effect of one strategy on another. Simulating the ultimate business outcomes of a federated decisioning system, with multiple rules engines and predictive analytics servers, is not impossible, but it is very hard and leaves significant scope for error.
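A minimal sketch of what 'holistic' means in practice: the risk and sales logic run in a single pass, so one simulation captures their interplay. The models, the segment multiplier, and the 0.8 risk threshold are all illustrative placeholders.

```python
# Sketch of an 'ensemble' strategy: risk gates the interaction, then
# sales picks the best offer, all within one replayable strategy.

def risk_of_default(customer):
    return 0.9 if customer["arrears"] > 0 else 0.1

def propensity_to_buy(customer, product):
    # Toy propensity model, keyed on product and customer segment.
    scores = {"gold_card": 0.7, "mortgage": 0.4, "savings": 0.6}
    return scores[product] * (1.2 if customer["segment"] == "affluent" else 1.0)

def ensemble_strategy(customer, products):
    # The risk decision gates the sales decision in the same pass...
    if risk_of_default(customer) > 0.8:
        return None  # no offer to a high-risk customer
    # ...and sales then selects the offer with the highest propensity.
    return max(products, key=lambda p: propensity_to_buy(customer, p))

products = ["gold_card", "mortgage", "savings"]
print(ensemble_strategy({"arrears": 0, "segment": "affluent"}, products))  # gold_card
print(ensemble_strategy({"arrears": 3, "segment": "mass"}, products))      # None
```

Because both decisions live in one strategy, changing the risk threshold in a simulation immediately shows its knock-on effect on offer volumes; in a federated setup that interplay is spread across systems.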
Playing the game
Third, the data collection for the simulations is not a trivial affair (and even harder in a federated architecture). To replay a holistic (CRM, say) strategy, all inputs to every predictive model (e.g. propensity to buy a product) and every rule (e.g. offer the product with the highest propensity unless there is a high risk of attrition) within that strategy need to be recorded, as well as the rule or model applied to them, the ultimate output (e.g. a product recommendation), and, when available, the response to that output (e.g. a customer accepting the offer). And that's just one decision. Typically, a number of decisions need to be made during every customer interaction. And, obviously, many such interactions take place simultaneously in the various channels. So it adds up. Maybe not to truly big data, but a large, multi-channel B2C enterprise will certainly be looking at hundreds of millions of decisions stored in the course of a few months.
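The record described above can be sketched as a simple data structure. The field and component names here are illustrative, not a prescribed schema; the point is which elements must be captured for a later replay.

```python
# Sketch of the decision record needed to replay a strategy later:
# the inputs, the component (rule or model) applied, the output,
# and, when it arrives, the customer's response.

from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class DecisionRecord:
    inputs: dict                    # snapshot of all model/rule inputs
    component: str                  # the rule or model that was applied
    output: str                     # e.g. the product recommendation
    response: Optional[str] = None  # e.g. 'accepted'; filled in later
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

log = []
log.append(DecisionRecord(
    inputs={"propensity_gold_card": 0.72, "attrition_risk": 0.1},
    component="next-best-offer",
    output="gold_card",
))
log[0].response = "accepted"  # recorded when the customer responds
```

Multiply one such record by every decision in every interaction in every channel, and the hundreds of millions of stored decisions mentioned above follow quickly.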
Now we are ready to simulate a change in the strategy. In the next installment, we will look at some practical examples of business simulation and discuss how we can better predict the future by using our data from the past.