December 31, 2021

Accurately Forecast Customer Sales with Machine Learning (ML)

Business Case:

Our client, a multinational food and beverage chain, operates thousands of retail stores and generates billions of dollars in annual revenue. The client needed to understand the impact of weather, promotions, discounts, product launches, holidays, and other events on sales. Its existing predictive sales model routinely underestimated sales volume at both the aggregated and daily levels, and the client also needed to better understand the causes of seasonal and daily spikes in sales.

Key Challenges:

  Improve the accuracy of future sales predictions. 
  Identify and analyze patterns in data for nonlinear fitting and predict future sales using historical data. 
  Examine the correlation between weather data (precipitation, temperature, pressure, wind speed, cloudiness, and so on) and sales at a specific longitude and latitude. 
  Analyze the impact of factors such as product launches, promotions, discounts, and holidays on predicted sales. 
  Include seasonality variables to explain seasonal fluctuations in the sales time series. 

Our Solution:

We built a Sales Forecasting Engine on Microsoft Azure Databricks that allowed our client to quickly and accurately predict sales.

Solution Design:

We worked with the client’s marketing operations and finance teams to collect and analyze their sales data, promotion and discount data, and store events data. We also used historical weather data from the US National Oceanic and Atmospheric Administration (NOAA) to develop the weather model. We extrapolated the historical data and used application programming interfaces (APIs) to feed it into our machine learning (ML) model for weather prediction.


  Used R libraries and custom functions to cleanse and preprocess the data. 
  Used descriptive statistical analysis to detect and correct skewness and kurtosis in the features.
  Performed Fourier transforms to decompose sales, analyze trends, and remove noise from the sales time series. 
  Applied logarithmic, exponential, and S-curve transformations to features to introduce the nonlinearity seen in real-world scenarios.
  Developed hybrid regression models to predict future sales using nonlinear, multiplicative, probabilistic, regularized, and deep learning approaches. 

Figure 1: Architecture of Forecasting Engine
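
To make the transformation and denoising steps above more concrete, here is a minimal sketch in Python. It is an illustration only, not the engine's actual code: the project used R libraries and custom functions, and every function name, parameter, and data value below is hypothetical. The sketch applies a logarithmic and an S-curve transformation to a feature, then removes high-frequency noise from a synthetic daily sales series with a simple Fourier low-pass filter. The hand-written transform keeps the example dependency-free; in practice a library FFT such as `numpy.fft` would be the usual choice.

```python
import cmath
import math


def log_transform(x):
    """Compress a right-skewed, non-negative feature with log(1 + x)."""
    return math.log1p(x)


def s_curve(x, midpoint, steepness):
    """Logistic S-curve: a saturating response, e.g. to discount depth."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def dft(values):
    """Naive O(n^2) discrete Fourier transform (fine for short series)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def inverse_dft(coeffs):
    """Inverse transform; returns the real part of each reconstructed point."""
    n = len(coeffs)
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n) for k, c in enumerate(coeffs)) / n).real
        for t in range(n)
    ]


def low_pass(values, keep):
    """Keep frequency indices 0..keep (and their mirrors); zero out the rest."""
    n = len(values)
    coeffs = dft(values)
    filtered = [c if min(k, n - k) <= keep else 0.0 for k, c in enumerate(coeffs)]
    return inverse_dft(filtered)


# Synthetic data (not client data): 28 days of sales with a weekly cycle,
# plus day-to-day noise that alternates between +3 and -3.
n_days = 28
clean = [100 + 10 * math.sin(2 * math.pi * t / 7) for t in range(n_days)]
noisy = [c + 3 * (-1) ** t for t, c in enumerate(clean)]

# The weekly cycle sits at frequency index 4 (28 days / 7 days), so keeping
# indices 0..4 preserves the level and the weekly pattern and drops the noise.
smoothed = low_pass(noisy, keep=4)

# Hypothetical feature transformations.
log_units = [log_transform(u) for u in (0, 9, 99, 999)]
discount_response = [s_curve(d, midpoint=0.2, steepness=10) for d in (0.0, 0.2, 0.4)]
```

The cutoff `keep` is a modeling choice: set too low, it erases genuine weekly seasonality; set too high, it lets noise through.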

Business Outcomes:

Our supervised ML predictive model empowered our client to analyze the impact of weather, promotions, discounts, product launches, holidays, and daily events on sales and execute business decisions accordingly. The model also identified the delay between an event and the seasonal spike, which enabled our client to maximize sales following an event. 

Our hybrid ML model is more accurate than the client’s previous ML model. It produces predictions at both the aggregated and daily levels, and it retrains itself once actual sales figures are fed back into it.

Our model’s mean absolute percentage error (MAPE) was 0.09, compared with 0.13 for the previous model (a lower value indicates greater accuracy).
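
For readers unfamiliar with the metric, MAPE is the average of |actual − forecast| / |actual| across all periods, expressed here as a fraction (so 0.09 corresponds to an average error of 9%). The Python sketch below shows the calculation on made-up numbers; the function name and the sample values are illustrative assumptions, not the client's data or the engine's code.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (0.09 = 9%)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Illustrative daily sales (hypothetical values).
actual_sales = [100.0, 200.0, 400.0]
forecast_sales = [90.0, 210.0, 380.0]

# Per-day errors are 10%, 5%, and 5%, so the mean is about 0.067.
score = mape(actual_sales, forecast_sales)
```

Because each error is divided by the actual value, MAPE cannot be computed for days with zero sales; the sketch raises an error in that case rather than silently skipping the day.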


  Forecast sales based on weather variations for the client’s store at a specific longitude and latitude.
  Analyzed the positive and negative impacts of daily events such as discounts, promotions, launch events, and holidays on predicted and actual sales.
  Statistically identified and explained seasonal spikes in the sales time series.
  Identified the lag period for daily events to explain the behavior of the time series.
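
One common way to estimate such a lag period is to correlate an event indicator with sales shifted by different numbers of days and pick the shift with the strongest correlation. The Python sketch below illustrates that idea on synthetic data. It is an assumption about how a lag could be estimated, not a description of the method the engine actually uses, and all names and values are hypothetical.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def best_lag(events, sales, max_lag):
    """Return (lag, correlation) for the lag at which sales[t + lag] tracks events[t] best."""
    scores = {}
    for lag in range(max_lag + 1):
        shifted_events = events[: len(events) - lag]
        shifted_sales = sales[lag:]
        scores[lag] = pearson(shifted_events, shifted_sales)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


# Synthetic data (not client data): promotions on days 3, 10, and 17,
# each followed by a sales uplift two days later.
n_days = 21
events = [1 if day in (3, 10, 17) else 0 for day in range(n_days)]
sales = [100.0] * n_days
for day, flag in enumerate(events):
    if flag and day + 2 < n_days:
        sales[day + 2] += 30.0

lag, correlation = best_lag(events, sales, max_lag=5)
```

On this synthetic series the strongest correlation occurs at a lag of two days, matching how the data was constructed. Real sales data would also need trend and seasonality removed first, so that the correlation reflects the event rather than the calendar.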