Monday, January 13, 2020

Machine Learning Forecasts Customer Sales

Business Case:

Our client runs a worldwide chain of retail stores and generates billions of dollars of revenue annually.

Our client needed to understand the impact of weather, promotions, discounts, product launches, holidays, and other events on sales. The client’s existing predictive sales model routinely underestimated sales volume at both the aggregated and daily levels. Our client also needed to better understand the causes of seasonal and daily spikes in sales. For example, on December 25, 2018, actual sales in the US were 25% greater than predicted sales.

Existing Process:

Prior to working with MAQ Software, our client’s marketing and finance team used a statistical model that leveraged past sales and promotion data to forecast sales. The statistical model’s results were unreliable and inaccurate. With the previous system, sharing timely and accurate information with the leadership team was difficult.

Key Business Challenges:

   Accuracy – Build a machine learning model to improve the accuracy of future sales predictions.
   Reliability – Build a reliable forecast model to facilitate marketing, supply chain, and inventory decisions. 
   Simplicity – Build an automated framework that predicts future sales and retrains itself once actual sales are injected into the model.

Key Technical Challenges:

   Build a machine learning model that regularizes variables using the smoothness of predictor functions. 
   Build a machine learning model that analyzes and uncovers patterns in data for nonlinear fitting and predicts future sales using historical data points as inputs. 
   Examine the correlation between weather data (precipitation, temperature, pressure, wind speed, cloudiness, and so on) and sales at a specific longitude and latitude.
   Analyze the impact of factors such as product launches, promotions, discounts, and holidays on predicted sales.
   Include seasonality variables to explain seasonal fluctuations in the sales time series.
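
As an illustration of the seasonality variables in the last item, here is a minimal sketch of one common encoding: a one-hot day-of-week indicator plus sine and cosine terms for the annual cycle. The function name `seasonal_features`, the `harmonics` parameter, and the feature names are hypothetical. The source does not say which encoding the team actually used.

```python
import math
from datetime import date


def seasonal_features(day, harmonics=2):
    """Return calendar features for one date (hypothetical encoding)."""
    features = {}
    # One-hot day of week, Monday = 0. In a linear model with an
    # intercept, one of these columns would normally be dropped.
    for d in range(7):
        features[f"dow_{d}"] = 1.0 if day.weekday() == d else 0.0
    # Sine/cosine pairs capture a smooth annual cycle; higher harmonics
    # allow sharper seasonal peaks.
    day_of_year = day.timetuple().tm_yday
    for k in range(1, harmonics + 1):
        angle = 2.0 * math.pi * k * day_of_year / 365.25
        features[f"sin_{k}"] = math.sin(angle)
        features[f"cos_{k}"] = math.cos(angle)
    return features


# The date from the example in the business case (a Tuesday).
row = seasonal_features(date(2018, 12, 25))
print(row["dow_1"], len(row))
```

Features like these can sit alongside weather, promotion, and holiday columns so that a regression model can attribute part of the variation in sales to the calendar itself.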


We worked with the client’s marketing operations and finance team to collect and analyze their sales data, promotion and discount data, and store events data. We also used NOAA historical weather data from the US government to develop the historical weather model. We extrapolated the historical data and used APIs to connect the data to our machine learning model to predict future weather.

Our team created a hybrid ML model that predicts future sales more accurately than the client’s previous statistical model. The model generates predictions at both the aggregated and daily levels, and it retrains itself once actual sales figures are fed into it.

Our model’s MAPE (mean absolute percentage error) was 0.09, compared with the previous model’s MAPE of 0.13. (A lower value indicates greater accuracy.)
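
For readers unfamiliar with the metric, the sketch below shows how MAPE is typically computed. It assumes the values above are fractions rather than percentages, and it skips days with zero actual sales, where the percentage error is undefined. The function name and the sample figures are made up for illustration and are not client data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.09 means 9%)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Skip zero actuals, where the percentage error is undefined.
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("need at least one nonzero actual value")
    return sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


# Hypothetical daily sales figures, for illustration only.
actual_sales = [100.0, 200.0, 400.0]
predicted_sales = [90.0, 220.0, 400.0]
print(round(mape(actual_sales, predicted_sales), 4))  # 0.0667
```

Because each error is divided by the actual value, a forecast that falls short of actual sales by a fixed amount contributes less to MAPE than one that overshoots by the same amount.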

Key Highlights:

   Used R libraries and custom functions to cleanse and preprocess the data. 
   Used descriptive statistical analysis to identify and correct skewness and kurtosis in the features.
   Incorporated weather data to train the model and analyze the impact of weather on sales.
   Performed Fourier transforms to decompose sales, analyze trends, and remove noise from the sales time series.
   Forecasted sales depending on weather variations for the client’s store at a specific longitude and latitude.
   Applied logarithmic, exponential, and S-curve transformations to features to introduce nonlinearity that reflects real-life scenarios.
   Developed hybrid regression models to predict future sales using nonlinear, multiplicative, probabilistic, regularized, and deep learning approaches.
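
To make the Fourier step in the list above concrete, here is a minimal sketch of low-pass filtering a sales series: transform it to the frequency domain, zero the high-frequency coefficients, and transform it back. It uses a naive discrete Fourier transform from the standard library so that it stays self-contained. In practice an FFT routine such as `numpy.fft.rfft` would be the usual choice. The cutoff `keep` and the synthetic series are assumptions for illustration, not the team's actual settings.

```python
import cmath
import math


def dft(values):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse transform; returns the real part of each sample."""
    n = len(coeffs)
    return [
        sum(coeffs[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def low_pass(values, keep):
    """Keep the mean and the `keep` lowest frequencies; zero the rest."""
    n = len(values)
    coeffs = dft(values)
    # Index k and index n - k hold the same frequency, so both are kept
    # or dropped together. This keeps the reconstructed series real.
    filtered = [c if min(k, n - k) <= keep else 0j for k, c in enumerate(coeffs)]
    return inverse_dft(filtered)


# Synthetic example: a weekly cycle plus alternating day-to-day noise.
days = range(28)
clean = [100.0 + 20.0 * math.sin(2.0 * math.pi * t / 7.0) for t in days]
noisy = [c + (5.0 if t % 2 == 0 else -5.0) for t, c in zip(days, clean)]
smoothed = low_pass(noisy, keep=4)
print(max(abs(s - c) for s, c in zip(smoothed, clean)) < 1e-6)  # True
```

In this example the weekly cycle sits at frequency index 4 (28 days / 7), so `keep=4` preserves it while removing the faster alternating component. Real sales noise is spread across many frequencies, so the cutoff is a judgment call.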

Our Sales Forecasting Engine, built on Microsoft Azure Databricks, allowed our client to align their business objectives with predicted sales. Figure 1 shows the architecture of our Forecasting Engine.
Figure 1: Architecture of Forecasting Engine

Business Outcomes:

Our supervised machine learning predictive model empowered our client to analyze the impact of weather, promotions, discounts, product launches, holidays, and daily events on sales and execute business decisions accordingly. The model also identified the delay between an event and the seasonal spike, which empowered our client to maximize sales following an event. 

Outcome Highlights:

   Forecasted sales depending on weather variations for the client’s store at a specific longitude and latitude.
   Analyzed the positive and negative impacts of daily events such as discounts, promotions, launch events, and holidays on predicted and actual sales.  
   Identified and explained seasonal spikes in sales time series statistically.
   Identified the lag period between daily events and the resulting changes in sales to explain the behavior of the time series.
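
One simple way to estimate such a lag is to correlate an event indicator with sales shifted by each candidate number of days and pick the shift with the highest correlation. The sketch below illustrates that idea under stated assumptions. The function names, the `max_lag` parameter, and the data are hypothetical, and the source does not describe the method the team actually used.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / (var_x * var_y) ** 0.5


def best_lag(events, sales, max_lag):
    """Return (lag, correlation) for the lag in days that best aligns
    events[t] with sales[t + lag]."""
    scores = {}
    for lag in range(max_lag + 1):
        x = events[: len(events) - lag]  # drop the last `lag` event days
        y = sales[lag:]                  # drop the first `lag` sales days
        scores[lag] = pearson(x, y)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Hypothetical data: promotions on days 3, 10, and 17, each followed by
# a sales spike two days later.
events = [1.0 if t in (3, 10, 17) else 0.0 for t in range(21)]
sales = [150.0 if t in (5, 12, 19) else 100.0 for t in range(21)]
lag, corr = best_lag(events, sales, max_lag=5)
print(lag, round(corr, 3))  # 2 1.0
```

A high correlation at a given lag suggests, but does not prove, that the event drives the spike. Overlapping promotions and holidays would need to be separated before drawing that conclusion.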