Retail forecasting. It's time for a rethink

Traditional Retail Forecasting: That Was Then. This is Now.

Retailers have made do with time series forecasting tools that came of age when technology was expensive and difficult to use, data were scarce and latent, and growth and profitability were not as dependent as now on fast and accurate "granular" forecasts. Under these circumstances, advances in statistical science and tools were slow. Precision and flexibility were compromised to suit technology limitations. That's all changed with:

  • Cheap computing resources and high-performance analytical software, the product of several cycles of Moore's law, accelerated by elastic cloud provisioning options (IaaS, PaaS, SaaS, and DaaS) and open-source software
  • New and varied structured and unstructured data sets, maintainable and analyzable at disaggregated levels, e.g., social media and search behaviors, location-aware interactions, and digitally evident market characteristics
  • Complexities of omni-channel retailing and intensified competitive pressures in all segments
  • Accelerating innovation in predictive and optimisation analytics and the advent of cognitive systems and machine learning

Time Series Methods Fit for Data Deserts

Retail sales forecasting has primarily relied on time series techniques - detection of patterns in historical sales sometimes adjusted by just a few causal factors - projecting these patterns and causal factor impacts into the future. While diverse in terms of the statistical algorithms employed, time series techniques share two presumptions, born of necessity, that:

  • Most of the information required to reliably predict future sales from historical sales is carried within the time series data itself.
  • Interrogation of a few causal factors can make up for deficits in the information carried by historic sales.
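To make the two presumptions concrete, here is a minimal sketch of a traditional approach: a seasonal-naive forecast that repeats the last observed season, optionally scaled by a handful of causal multipliers. The function name, the sales figures, and the promotional uplift values are all hypothetical and chosen only for illustration; production time series tools use far richer algorithms.

```python
def seasonal_naive_forecast(history, season_length, horizon, uplift=None):
    """Repeat the last full season of sales, optionally scaled by causal uplifts.

    history       -- past sales, oldest first
    season_length -- number of periods in one seasonal cycle
    horizon       -- number of future periods to forecast
    uplift        -- optional per-period multipliers (hypothetical causal factor)
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    if uplift is not None and len(uplift) != horizon:
        raise ValueError("uplift must supply one multiplier per forecast period")

    # Presumption 1: the pattern in past sales carries most of the information.
    last_season = history[-season_length:]
    forecast = [last_season[h % season_length] for h in range(horizon)]

    # Presumption 2: a few causal factors make up for what history lacks.
    if uplift is not None:
        forecast = [f * u for f, u in zip(forecast, uplift)]
    return forecast


# Invented weekly sales with a four-week cycle.
weekly_sales = [100, 120, 90, 110, 105, 125, 95, 115]
baseline = seasonal_naive_forecast(weekly_sales, season_length=4, horizon=4)
# Assume a promotion in week 2 is expected to lift sales by 20 percent.
with_promo = seasonal_naive_forecast(weekly_sales, 4, 4, uplift=[1.0, 1.2, 1.0, 1.0])
```

Everything the forecast knows comes from the series itself plus one hand-supplied multiplier, which is exactly the limitation described above.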

Promotional forecasting tools are an evolving exception to these constraints. They are designed to handle an increasing number of causal factors, with varying degrees of statistical competence. While it improves on time series techniques, causal factor analysis itself suffers from two other limitations:

  • The inability to decompose a promotion into the contribution to sales lift made by each constituent attribute
  • Reliance on ad hoc manual adjustments to algorithmic forecasts - often undocumented, unmeasured, and subject to cognitive, emotional, or statistical biases of one type or another

Degrees of Freedom, Signals, and Noise in Omni-Channel Retail

"Degrees of freedom" measures the number of parameters that constrain or enable movements of the parts of a mechanical system. It's a useful construct for understanding the implications of relying on traditional time series forecasting in a complex environment. Omni-channel retail vastly increases the degrees of freedom enjoyed by its participants - merchants, marketers, planners, supply chain managers, store network real estate planners, ecommerce managers, brands, publishers, and no less important, consumers.

The title of Nate Silver's popular book on forecasting, The Signal and the Noise: Why So Many Predictions Fail - but Some Don't, references a measurement used to evaluate communications systems, the signal-to-noise ratio. It measures the strength of a signal relative to the amount of disturbance mixed in with it.
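For readers who want the measurement itself, the sketch below uses the common engineering definition: the ratio of mean signal power to mean noise power, expressed in decibels. The function name and the sample values are invented for illustration. In a forecasting setting one might, as an assumption, treat a model's fitted values as the signal and its residuals as the noise.

```python
import math


def signal_to_noise_db(signal, noise):
    """Return the signal-to-noise ratio in decibels.

    Power is taken as the mean of the squared sample values.
    """
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    return 10 * math.log10(signal_power / noise_power)


# Invented samples: the signal has twice the amplitude of the noise,
# so it carries four times the power.
snr = signal_to_noise_db([2, -2, 2, -2], [1, -1, 1, -1])
```

A higher value means more of the observed variation is explained pattern and less is unexplained disturbance.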

Degrees of freedom and the signal-to-noise ratio help us rethink forecasting. In a business context, "too much noise" is often used colloquially to mean that there are too many factors in play to reliably decide which set of levers to pull to achieve a business goal, e.g., improve margin to x and maintain sales of y. Two points emerge from this discussion:

  • Complexity does not cause "too much noise." Noise is the effect of signals not yet processed.
  • Effective differentiation depends on understanding the effects of more of the factors in play than a competitor can understand, and on putting those effects to advantage.

Machine Learning Extends Analytics

Computer scientist Tom Mitchell described machine learning this way:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

Machine learning goes beyond descriptive analytics (what has happened) and extends predictive analytics (what is likely to happen) with a third kind, discovery analytics (why it happened). It also brings new capabilities to prescriptive analytics (what action should be taken).

Machine learning feeds on petabytes, and soon zettabytes, of data - a total digital universe of 4.4ZB, much of it pooling in industry data lakes flowing with tides of proprietary, privately shared, commercially syndicated, and publicly provisioned data.

Machine learning falls into three broad categories: supervised, unsupervised, and reinforcement learning. In all cases it starts with a feedback loop that creates and hones algorithms by running them through sets of training data and improving their performance in each successive run, initially against training data and later against production data.
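The feedback loop can be illustrated with a deliberately tiny supervised example. This is a sketch under stated assumptions, not a recommended forecasting model: it fits a straight line by gradient descent, where the task is predicting units sold, the performance measure is mean squared error, and experience is the number of passes over the training data. The data, the learning rate, and all names are hypothetical.

```python
def mean_squared_error(weight, bias, data):
    """Performance measure: average squared gap between prediction and actual."""
    return sum((weight * x + bias - y) ** 2 for x, y in data) / len(data)


def train_linear_model(data, passes=500, learning_rate=0.05):
    """Hone a one-feature linear model by repeated passes over training data."""
    weight, bias = 0.0, 0.0
    errors = []
    n = len(data)
    for _ in range(passes):
        # Measure how wrong the current model is and in which direction.
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in data) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in data) / n
        # Feed the error back into the model.
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
        errors.append(mean_squared_error(weight, bias, data))
    return weight, bias, errors


# Invented training data: (discount depth, units sold), generated from y = 2x + 10.
training_data = [(0, 10), (1, 12), (2, 14), (3, 16), (4, 18)]
weight, bias, errors = train_linear_model(training_data)
```

The list `errors` records the error after each pass; it shrinks as the number of passes grows, which is the sense in which performance improves with experience.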

Supervised Machine Learning Approaches to Prediction and Forecasting

Three approaches to supervised machine learning are emerging as promising candidates for predicting and forecasting: ensemble methods, branching algorithms, and factorisation algorithms. All three are well suited to forecasting retail sales determined by interactions among a large number of demand-influencing factors, some not known to merchants, planners, and pricers. Factorisation, in particular, illustrates this:

  • Use of inexpensive scalable computing, storing, and networking capacities
  • Consumption of varied, voluminous streaming and static causal factor data
  • Searching for significant causal factors rather than being given a fixed set of them
  • Modeling of complex causal data to convert noise into signals, rather than to find the signal in the noise

Factorisation iterates through the following steps:

  • Identifies attributes (aka factors) influencing the object being forecast
  • Scans these factors across other similar events and learns the contribution of each factor
  • Discovers and scores the contribution of external factors to the observed outcome
  • Iterates through successive generations of better models

Importantly, factorisation machines do not need to be pointed at a set of predefined attributes but discover, rank, and model the impact of factors from among those they can observe.
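As an illustration of how such a model scores interactions among factors, the sketch below assumes the article has in mind the standard second-order factorisation machine described by Steffen Rendle: a global bias, a weight per factor, and a small latent vector per factor whose dot products estimate pairwise interaction strength. The feature meanings and all parameter values are invented, and training (learning the weights and latent vectors from data) is omitted.

```python
def fm_predict(x, w0, w, V):
    """Score one feature vector with a second-order factorisation machine.

    x  -- feature values, one per factor
    w0 -- global bias
    w  -- one linear weight per factor
    V  -- one latent vector (length k) per factor
    """
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    # Linear-time identity for the sum over all factor pairs i < j
    # of dot(V[i], V[j]) * x[i] * x[j].
    for f in range(k):
        total = sum(V[i][f] * x[i] for i in range(len(x)))
        total_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (total * total - total_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score computed pair by pair, for checking the fast version."""
    n = len(x)
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    return w0 + linear + pairwise


# Hypothetical factors: [sold in store A, promotion running, holiday week].
x = [1, 1, 0]
w0 = 100.0
w = [5.0, 20.0, 10.0]
V = [[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]]
forecast = fm_predict(x, w0, w, V)  # 130.0: bias 100 + linear 25 + interaction 5
```

Because interaction strength comes from latent vectors rather than from one parameter per pair, the model can estimate the effect of factor combinations it has rarely or never seen together, which is one reason the technique suits sparse retail data.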

Factorisation Powers Recommendation Engines

Factorisation techniques have been applied in recent years to make product recommendations, e.g., Netflix movie and show selections. Netflix uses factorisation to create a probabilistic forecast of the number of times a show will be watched - akin to retail sales. Netflix also applies these techniques to the development of new shows - akin to new product development and category management.

A Caveat or Three


  • Investigate and experiment. With cheap compute, storage, and network costs and open source tools, you shouldn't be risking a lot. But make sure you have computer science and data science talent.
  • Look at your forecasting practices and clean them up first. Check out forecasting value added (FVA) analysis, an approach favored by SAS.
  • Don't necessarily be put off by academic naysayers. Machine learning applied to forecasting is still in its infancy.

The last point invites a historical analogy. The scientific jury was still out when the Wright brothers took flight at Kitty Hawk, N.C. Quickly thereafter engineering disciplines, military applications, and commercial interests turned their primitive application of Bernoulli's principle into modern aviation. Air superiority significantly contributed to the outcome of World War I.

Posted by Greg Girard, Program Director, Merchandise Strategies, IDC Retail Insights

Copyright © 2015 IDG Communications, Inc.
