Monday, January 13, 2020

Machine Learning Forecasts Customer Sales



Business Case:

Our client runs a worldwide chain of retail stores and generates billions of dollars of revenue annually.

Our client needed to understand the impact of weather, promotions, discounts, product launches, holidays, and other events on sales. The client’s existing predictive sales model routinely underestimated sales volume at both the aggregated and daily level. Our client also needed to better understand the causes of seasonal and daily spikes in sales. For example, on December 25, 2018, actual sales in the US were 25% greater than predicted sales.

Existing Process:

Prior to working with MAQ Software, our client’s marketing and finance team used a statistical model that leveraged past sales and promotion data to forecast sales. The statistical model’s results were unreliable and inaccurate. With the previous system, sharing timely and accurate information with the leadership team was difficult.

Key Business Challenges:

   Accuracy – Build a machine learning model to improve the accuracy of future sales predictions.
   Reliability – Build a reliable forecast model to facilitate marketing, supply chain, and inventory decisions.
   Simplicity – Build an automated framework that predicts future sales and retrains itself once actual sales are injected into the model.

Key Technical Challenges:

   Build a machine learning model that regularizes variables using the smoothness of predictor functions. 
   Build a machine learning model that analyzes and uncovers patterns in data for nonlinear fitting and predicts future sales using historical data points as inputs. 
   Examine the correlation between weather data (precipitation, temperature, pressure, wind speed, cloudiness, and so on) and sales at a specific longitude and latitude.
   Analyze the impact of factors such as product launches, promotions, discounts, and holidays on predicted sales.
   Include seasonality variables to explain seasonal fluctuations in the sales time series.
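One of the challenges above, examining the correlation between weather variables and sales, can be sketched with a plain Pearson coefficient per variable. A minimal sketch with hypothetical data (the figures below are illustrative, not the client's):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical daily observations for one store location
temperature = [18.0, 21.5, 25.0, 30.0, 28.5, 16.0]
sales       = [1200, 1350, 1500, 1700, 1650, 1100]

print(round(pearson(temperature, sales), 3))
```

In practice, each weather variable (precipitation, pressure, wind speed, and so on) would be checked against sales for the store at that longitude and latitude before being admitted as a model feature.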

Solution

We worked with the client’s marketing operations and finance team to collect and analyze their sales data, promotion and discount data, and store events data. We also used NOAA historical weather data from the US government to develop the historical weather model. We extrapolated the historical data and used APIs to connect the data to our machine learning model to predict future weather.

Our team created a hybrid ML model that predicts future sales far more accurately than the previous statistical model. The predictions ran on an aggregated and daily basis, and the model retrains itself once actual sales figures are injected into the model.

Our model’s MAPE (Mean Absolute Percentage Error) value was 0.09, compared to the previous model’s MAPE value of 0.13 (a lower value indicates greater accuracy).
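As a rough sketch of how the MAPE figures above are computed (the sales numbers below are illustrative, not the client's):

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, expressed as a fraction (0.09 = 9%)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical daily actual sales vs. forecast
actual    = [100.0, 120.0, 130.0]
predicted = [ 90.0, 126.0, 130.0]
print(round(mape(actual, predicted), 3))  # smaller is better
```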

Key Highlights:

   Used R libraries and custom functions to cleanse and preprocess the data. 
   Used descriptive statistical analysis to tackle skewness and kurtosis for the features. 
   Incorporated weather data to train the model and analyze the impact of weather on sales.
   Performed Fourier transforms to decompose sales, analyze trends, and remove noise from the sales time series.
   Forecasted sales depending on weather variations for the client’s store at a specific longitude and latitude.
   Applied logarithmic, exponential, and S-curve transformations to features to introduce nonlinearity that reflects real-life scenarios.
   Developed hybrid regression models to predict future sales using nonlinear, multiplicative, probabilistic, regularized, and deep learning approaches.
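The logarithmic and S-curve feature transformations in the highlights above can be sketched as follows; the midpoint and steepness parameters are hypothetical choices, not the values used in the client's model:

```python
import math

def log_feature(x):
    """Logarithmic transform; log1p keeps x = 0 valid."""
    return math.log1p(x)

def s_curve(x, midpoint=0.0, steepness=1.0):
    """Logistic (S-curve) transform squashing x into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))

# Hypothetical discount percentages used as a model feature
discounts = [0, 5, 10, 20, 40]
print([round(log_feature(d), 2) for d in discounts])
print([round(s_curve(d, midpoint=10, steepness=0.3), 2) for d in discounts])
```

An S-curve captures saturation effects, for example when deeper discounts stop producing proportionally more sales, while the log transform dampens the influence of extreme feature values.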

Our Sales Forecasting Engine, built on Microsoft Azure Databricks, allowed our client to align their business objectives with predicted sales. Figure 1 shows the architecture of our Forecasting Engine.
Figure 1: Architecture of Forecasting Engine

Business Outcomes:

Our supervised machine learning predictive model empowered our client to analyze the impact of weather, promotions, discounts, product launches, holidays, and daily events on sales and execute business decisions accordingly. The model also identified the delay between an event and the seasonal spike, which empowered our client to maximize sales following an event. 

Outcome Highlights:

   Forecasted sales depending on weather variations for the client’s store at a specific longitude and latitude.
   Analyzed the positive and negative impacts of daily events such as discounts, promotions, launch events, and holidays on predicted and actual sales.  
   Identified and explained seasonal spikes in sales time series statistically.
   Identified the lag period for daily events to explain the behavior in time series.
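The lag identification mentioned above can be sketched as a simple cross-correlation scan; the event and sales series below are toy data, and a production model would use a statistically robust estimator:

```python
def best_lag(event, sales, max_lag=14):
    """Return the lag (in days) at which the event series and sales series
    align most strongly, using a simple dot-product cross-correlation."""
    best, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        pairs = list(zip(event, sales[lag:]))
        if not pairs:
            break
        score = sum(e * s for e, s in pairs) / len(pairs)
        if score > best_score:
            best, best_score = lag, score
    return best

# Hypothetical: a promotion on day 0 lifts sales three days later
event = [1, 0, 0, 0, 0, 0, 0, 0]
sales = [0, 0, 0, 5, 0, 0, 0, 0]
print(best_lag(event, sales))
```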

Thursday, January 2, 2020

Supervised Machine Learning Model Forecasts Impact of Co-Sell Deals on Sales Revenue Using Azure Databricks


Supervised machine learning model forecasts the impact of strategic deals on sales:
Business executives require visibility into the impact of decisions at all levels of their companies. Executives appreciate having this visibility at each stage of a project, from initiation to closing. Reports showcasing the results of business decisions across the entire organization help with such visibility. This case study reviews one level of a management decision chain, where revenues are based on sales managers’ offers to customers.

Business Scenario:
Our client, a team of sales managers at a multinational software organization, sells software product solutions to retail customers via dealers (co-sellers). Our client needed to quantify the value of successful deals on the revenue driven by end retail customers. Also, our client wanted to know how much their collaborative business model with dealers drove revenue. With this information, the client could focus on the factors with a higher impact on sales.

To provide our client with insights on co-sell deals, we created a Power BI dashboard powered by a machine learning model.

How We Did It: The Technical Implementation

Architecture:

Data Gathering: We realized that we needed additional data points to generate a successful and accurate model. We integrated different data sources to ensure access to important key features, such as customer geography and the status of deals between clients and dealers.

Data Consolidation: We performed a deep dive into the customer data set to find basic insights and possible key features that might impact revenue. We collected data from various systems, such as CRM and sales systems, which we used in subsequent steps for data engineering.

Feature Engineering: After gathering the required inputs, we engineered features from the input dataset and used Databricks to create a forecasting model based on linear regression. We used existing and custom regression techniques to improve the forecasting model’s accuracy.
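As a minimal sketch of the linear-regression forecasting step (the actual model was built in Databricks with custom regression techniques; the closed-form single-feature fit and the data below are illustrative only):

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (closed form, one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return a, b

# Hypothetical: revenue (in $M) vs. number of closed co-sell deals per quarter
deals   = [2, 4, 6, 8]
revenue = [10.0, 18.0, 26.0, 34.0]
a, b = fit_line(deals, revenue)
print(round(a, 2), round(b, 2))  # intercept, slope
print(round(a + b * 10, 2))      # forecast for a quarter with 10 deals
```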

Optimization: We cut the model run time from 3 hours to 15 minutes by parallelizing the forecasting algorithm for each customer. This optimization provided the customer with near real-time analytics information.
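The per-customer parallelization described above was implemented with SparkR on Databricks; the sketch below illustrates the same idea in plain Python with a thread pool and a placeholder forecast function (all names and figures are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor

def forecast_customer(history):
    """Placeholder per-customer forecast: naive mean of past revenue."""
    return sum(history) / len(history)

# Hypothetical per-customer revenue histories
histories = {
    "customer_a": [10.0, 12.0, 14.0],
    "customer_b": [5.0, 5.0, 8.0],
    "customer_c": [20.0, 18.0, 22.0],
}

# Run each customer's forecast concurrently instead of sequentially
with ThreadPoolExecutor(max_workers=4) as pool:
    forecasts = dict(zip(histories, pool.map(forecast_customer, histories.values())))

print(forecasts)
```

Because each customer's forecast is independent, the work partitions cleanly, which is what made the Spark-based parallelization effective at scale.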

Key Challenges:
The major challenge we faced was forecasting revenue using existing data sources.

Another key challenge involved the model’s run time. The model was based on existing forecasting libraries in the R environment, which were not as fast as we needed. We migrated the R libraries to SparkR, enabling the model to run and provide forecasts quickly. We also had to improve scalability, considering future scenarios in which our model would run not just on thousands, but millions of records. We achieved scalability using Azure Databricks, thereby providing a complete cloud-based, near real-time analytics solution.

Business Outcomes:
The end result was a Power BI report that helped our client understand the lifetime value and growth trend of co-sell wins. The report also provided the functionality to slice and dice the dataset with built-in filters for further insights.

Our dashboard allowed our customer to visualize impact by fiscal year, customer details (such as area, region, subregion, subsidiary, segment, and subsegment), and performance.

Our supervised machine learning model resulted in five benefits for our client:
1.    Optimal cost through an end-to-end model on the cloud with smart infrastructure
a.    Automated SKU scalability with an accelerated ML model run.
2.    Quicker insights from an accelerated model run
a.    Model run time reduced from 3 hours to 15 minutes with parallel data processing.
b.    Option for scalability of input data, leading to future support for growing data volumes.
3.    Near real-time impact analysis for input data
a.    We were able to achieve near real-time modeling and forecasting.
4.    Ability to develop marketing strategy based on:
a.    Geography of the end customer.
b.    Strategic relevance of the partner for business.
c.    Lifetime and time-based impact of deals on sales revenue.
5.    Enhanced security through role-based access control, so that current and forecasted revenue data is safe from unauthorized access.

Wednesday, November 13, 2019

Engineering Insights: WPF Application Optimizes and Eases Delivery



Business Scenario

As a part of support services, our client provides customers with troubleshooting services and the ability to proactively check the health of software installed on their system.

To use the platform, users traditionally had to perform the following steps:

1. Download and install software drivers on one or more data collection machines.
2. Run PowerShell commands to set up the software.
3. Run PowerShell commands with over ten parameters to configure the health check.
4. Run PowerShell commands to start the health check.
5. Run PowerShell commands to stop the health check.

Even after completing the steps, users did not know whether the health check successfully completed. The actual execution of the health check required several hours.

Key Challenges:

Our client faced three key challenges:

1. Downloading and installing the drivers and scripts and configuring and running the health check required significant time. As a result, many users abandoned the health check and did not complete the process.
2. The client's engineering team often needed to support their customers through the setup, configuration, and execution steps. As a result, the engineering team could not always focus on building new features and improving the platform. 
3. Users frequently created support tickets after following incorrect steps for the setup. The high number of support tickets led to significant support costs. 

Solution

Our client wanted to help customers by providing an automated method to configure the health check software. They needed an application to guide customers through the prerequisites, setup, configuration, and execution steps.

To address our client’s challenges, we needed to understand our client’s customers’ obstacles. We shadowed the engineering team and identified five common challenges.

1. Customer machines often did not possess the prerequisites necessary for running certain health checks. 
2. Customer machines sometimes contained policies that interfered with health check execution.
3. Customers often did not provide the parameters required for health checks for specific products.
4. Customers often entered credentials in PowerShell commands incorrectly.
5. Customers struggled to know whether they successfully completed the health check.

Armed with this knowledge, we developed an easy-to-use Windows Presentation Foundation (WPF) application that considerably simplified the customer experience, reducing the effort required of customers and delivering a seamless end-user experience.

Key Highlights:

1. Our application provides a one-click installation solution, removing the need for multiple PowerShell steps.
2. Our application provides clearly named menu options and buttons for users to evaluate and install all prerequisites and necessary drivers. Our application also allows users to enable and disable relevant policies and other settings. 
3. Our application allows users to specify all parameters, providing a customized install experience.
4. To simplify the health check, we separated the setup process into stages and visualized the stages with various colors to signify success, failure, or progress.
5. Our application validates all user inputs, identifies mistakes and invalid values, and assists users with troubleshooting using tooltips.
6. After users complete the health check, our application performs several tests and provides a confirmation of successful setup. If the setup is unsuccessful, the application provides a list of issues encountered, along with documentation for correction.
7. Our application also provides users with the ability to set up multiple health checks.


Business Outcomes:

The application we developed is available for download on our client’s website. Our client received many accolades for the application from their engineering teams, support teams, and customer-facing community. The application resulted in increased goodwill and efficiency for both our client and their customers.

1. Over ten percent of customers downloaded and used the application within a week of launch.
2. Application downloads and usage continue to increase.
3. Platform usage increased, and the completion percentage of health checks improved.
4. The number of support requests decreased.

Thursday, October 3, 2019

Data Dictionary for Dynamics 365



Business Scenario

Dynamics 365 is a cloud-based business applications platform. It combines enterprise resource planning (ERP) and customer relationship management (CRM) components with productivity applications and artificial intelligence (AI) features. Out of the box, Dynamics 365 offers diverse tools, but complex business environments often require custom components. These components, called solutions, enable businesses to customize and expand Dynamics 365 features to meet specific business needs.

Each solution is deployed as a single zip file. The zip file contains all the objects (entities, workflows, processes, roles, etc.) present in the environment. The file includes both pre-existing objects and objects that are newly added as part of a current development cycle.

We recently launched a Dynamics 365 implementation for a large software company. The project contains more than 200 entities and several processes and workflows associated with each entity. In Dynamics 365, entities are used to model and manage business data for Customer Engagement apps.

The software company’s Dynamics 365 implementation is still evolving. Sprint-over-sprint we are adding new features and incorporating change requests. These changes include adding new entities or updating or deleting existing entities and their attributes (e.g., field name changes, character length changes, or adding new workflows).

When solutions in large Dynamics 365 projects (such as the project detailed above) are updated, team members must track the changes. Tracking changes allows:

1. The internal team to review and monitor changes before deploying to production.
2. Administrators of downstream systems to anticipate changes that may affect their ETL (extract, transform, load) jobs.
3. Developers to collaborate on solution updating efforts.

Dynamics 365 does not maintain sub-versions of a solution, which would help identify what changed in an updated release. Due to the lack of sub-versions, it is difficult to highlight release-over-release changes and share them with a larger audience.

We needed to develop a way to keep track of solution changes in a Dynamics 365 environment across multiple releases.

How We Did It: The Technical Implementation

We created a utility called Data Dictionary that accepts two Dynamics 365 solution files as inputs. Data Dictionary compares two versions of a solution (previous and current version). The utility processes the solution files and generates an output in the form of three Excel files. Figure 1 shows Data Dictionary’s comparison flow.

Figure 1: Utility comparison flow diagram

Excel Output Files:

1. Excel file (previous version) - Provides schema of the previously deployed solution.
2. Excel file (current version) - Provides schema of the current solution. This includes objects present in the previous version as well as those introduced in the current version.
3. Excel file (delta) – This file is similar to the Excel file (current version) but with all the changes highlighted to clearly identify what has been added, modified, or removed from the system in the current version.
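The delta computation can be sketched as a comparison of two entity-to-attributes mappings; the schema structure and field definitions below are simplified, hypothetical stand-ins for what the utility extracts from the solution zip files:

```python
def schema_delta(previous, current):
    """Compare two {entity: {attribute: definition}} schemas and report
    added, removed, and modified attributes per entity."""
    delta = {"added": [], "removed": [], "modified": []}
    for entity in set(previous) | set(current):
        prev_attrs = previous.get(entity, {})
        curr_attrs = current.get(entity, {})
        for attr in curr_attrs.keys() - prev_attrs.keys():
            delta["added"].append((entity, attr))
        for attr in prev_attrs.keys() - curr_attrs.keys():
            delta["removed"].append((entity, attr))
        for attr in curr_attrs.keys() & prev_attrs.keys():
            if curr_attrs[attr] != prev_attrs[attr]:
                delta["modified"].append((entity, attr))
    return delta

# Hypothetical simplified schemas extracted from two solution versions
previous = {"Account": {"name": "nvarchar(100)", "phone": "nvarchar(20)"}}
current  = {"Account": {"name": "nvarchar(160)", "email": "nvarchar(100)"}}
print(schema_delta(previous, current))
```

The delta Excel file highlights exactly these three categories, so reviewers can scan one sheet instead of diffing two full schemas by hand.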

Key Highlights:

1. Helps the internal team review (at a glance) what has changed in a solution before deploying the solution to production.
2. Communicates changes between releases to downstream systems. Knowing the changes allows downstream users to update their ETL (extract, transform, load) jobs, ensuring that they report on the latest data points.
3. Improves dev team collaboration by allowing them to quickly spot differences between two solution versions.


Business Outcomes:

Data Dictionary saved significant manual effort in identifying and communicating changes in solution releases. We run Data Dictionary after each major Dynamics 365 deployment and share the output files with key stakeholders.

Monday, September 23, 2019

Engineering Insights: Azure Search-based Web Solution



Business Scenario

A multinational technology corporation offers numerous online learning courses for its premium customers. The customers’ employees rely on the courses to improve their skills, helping them remain competitive in today’s business environment.

The corporation’s previous course management system was cumbersome. The system did not allow employees to quickly search among the thousands of learning courses. Users could only find courses if they knew exactly what they were looking for. Employees needed to be able to browse and filter the catalog and select an offering for their needs. They also expected fast answers. Every millisecond matters!

The corporation needed a scalable solution for future growth.

What customers needed with better search:

1. Better browsing capability – Our client’s employees had difficulty narrowing down courses that fit their expertise level.
2. Enhanced search – Our client’s employees could not search within the course’s content, such as the description or video subtitles.
3. Improved relevance – Our client’s search results did not show the results most relevant to the search term.
4. Rapid free-form text search – Our client needed the ability to support fuzzy search based on terms that have similar construction.

How We Did It: The Technical Implementation

To generate improved search results, we built an entire architecture powered by Azure Search. The corporation’s course details are stored in an Azure Cosmos DB database. Using Change Data Capture, we achieved incremental indexing.

With this architecture, we implemented native search capabilities, faceted filtering, sorting, auto-complete, and pagination in a robust and scalable manner.

By leveraging scoring profiles, we developed a custom solution to improve search ranking and relevance.
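As an illustration of the scoring-profile approach, the fragment below shows the general shape of an Azure Search index definition with a weighted-text scoring profile; the index, profile, and field names are hypothetical, not the client's actual configuration:

```python
# Illustrative fragment of an Azure Search index definition. A scoring
# profile boosts matches in the course title over matches in the
# description, improving ranking and relevance. All names are hypothetical.
index_definition = {
    "name": "courses",
    "scoringProfiles": [
        {
            "name": "boost-title",
            "text": {
                "weights": {"title": 5.0, "description": 2.0}
            }
        }
    ],
    "defaultScoringProfile": "boost-title",
}
print(index_definition["scoringProfiles"][0]["name"])
```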

Key Highlights:

1. Reduced page load time by 90%.
2. Improved user experience with auto-complete, hit highlighting, sorting, and paging.
3. Created scalable architecture.
4. Incorporated linguistic analysis.
5. Implemented server-side encryption at rest.


Business Outcomes:

With the new Azure Search-powered architecture, courses are no longer undiscoverable. Users are very satisfied with the improved speed and accuracy of results. Users can refine results by technology, video duration, and difficulty level using the rich filtering system. The filtering experience has improved customer confidence, encouraging more users to actively complete the courses, ultimately boosting our client’s customers’ technical competence.

Monday, August 26, 2019

Unsupervised Machine Learning Model Improves Customer Feedback Analysis



Business Scenario

Our client showcases a variety of technology products in their retail stores to customers. Product demonstrations highlight product capabilities to drive sales. Following product demonstrations, the voice of customer team collects customer feedback. Customer feedback provides a better understanding of how improved product demonstrations improve sales.

To capture customer feedback, the voice of customer team reviewed forms, written comments, reviews, ratings, and surveys using a supervised machine learning model. The team trained the supervised machine learning model using textual data from other teams they worked with. But the supervised machine learning approach struggled whenever the voice of customer team wanted to add a new dataset. Because the model was based on the provided training materials, it could not identify themes that did not originate from those materials. As a result, each time the voice of customer team worked with a new dataset, the learning model required retraining. To minimize training costs and improve scalability, we needed to create an unsupervised machine learning model.

How We Did It: The Technical Implementation


Figure 1: Unsupervised machine learning model
To create the unsupervised learning model, we researched three topics:

   Theme identification using supervised models
   Term frequency-inverse document frequency (TFIDF) methodologies
   Key phrase extraction

At the beginning of the project, we conducted research to determine how to implement an unsupervised machine learning approach for theme identification; at the time, we found no published research addressing this problem. We started our research with supervised models, even though our solution needed to eliminate manual theme identification. Studying supervised models enabled our team to better understand how human heuristics guided machine learning. To better understand how to track the statistical significance of themes, we studied TFIDF methodologies. By studying supervised model results and TFIDF methodologies, we eventually discovered the key insight we used to structure our unsupervised machine learning model: We could extract key phrases in English using a recurring pattern: (Adj/verb)*(Noun/Proper noun). After discovering the pattern, we worked on several proofs of concept using the pke and rake libraries and Microsoft’s Cognitive Services.
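The (Adj/verb)*(Noun/Proper noun) pattern described above can be sketched over pre-tagged tokens; a real pipeline would obtain the part-of-speech tags from a tagger, and the sample sentence below is invented:

```python
import re

def extract_key_phrases(tagged_tokens):
    """Extract phrases matching zero or more adjectives/verbs followed by
    one or more nouns/proper nouns, over (token, POS-tag) pairs."""
    # Map each tag to a single character so the pattern becomes a regex scan
    tags = "".join({"ADJ": "A", "VERB": "A", "NOUN": "N", "PROPN": "N"}.get(t, "x")
                   for _, t in tagged_tokens)
    phrases = []
    for m in re.finditer(r"A*N+", tags):
        phrases.append(" ".join(tok for tok, _ in tagged_tokens[m.start():m.end()]))
    return phrases

# Hypothetical pre-tagged feedback comment
tagged = [("the", "DET"), ("bright", "ADJ"), ("screen", "NOUN"),
          ("looks", "VERB"), ("great", "ADJ"), ("on", "ADP"),
          ("Surface", "PROPN"), ("Laptop", "PROPN")]
print(extract_key_phrases(tagged))
```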
   
The original implementation of the project involved extracting key phrases from customer reviews and feedback comments. First, we divided feedback into different sentences using full stops and spaces. Then, we used several text preprocessing techniques to extract the core message without contractions, punctuation, stop words, conjugations, or spelling mistakes. We then employed text processing techniques to extract the key phrases. Key phrases were identified using nouns and adjectives. (In product feedback, comments typically used the product as the noun and the characteristics of the product as adjectives.) We then used the Word2Vec model to convert the key phrases to vectors. The resulting vectors were then passed through an unsupervised learning engine to cluster the key phrases in order to decipher feedback themes. Each cluster helped identify feedback themes. Finally, we performed a T-test to identify the significant themes. Our approach enabled us to identify newly emerging feedback themes without requiring model retraining.
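The clustering step above can be sketched with a minimal k-means over toy 2-D vectors standing in for the Word2Vec embeddings (the production engine clustered higher-dimensional vectors and added a T-test for theme significance):

```python
import math
import random

def kmeans(vectors, k, iterations=20, seed=0):
    """Minimal k-means over dense vectors (stand-in for Word2Vec embeddings)."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign each vector to its nearest centroid
        clusters = [[] for _ in range(k)]
        for v in vectors:
            i = min(range(k), key=lambda c: math.dist(v, centroids[c]))
            clusters[i].append(v)
        # Recompute centroids as cluster means (keep old centroid if empty)
        centroids = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical 2-D phrase vectors forming two obvious themes
vectors = [[0.1, 0.2], [0.0, 0.1], [0.2, 0.0],   # theme 1
           [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]   # theme 2
centroids, clusters = kmeans(vectors, k=2)
print(sorted(len(c) for c in clusters))
```

Each resulting cluster corresponds to a candidate feedback theme, which is then tested for statistical significance before being surfaced to the voice of customer team.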

We fed data to our machine learning model through Azure Databricks. Azure Databricks offered our client the speed and flexibility they were looking for. Azure Databricks allows users to run robust analytics algorithms and drive real-time business insights. Azure Databricks also offers one-click, autoscaling deployment that ensures enterprise users’ scaling and security requirements are suitably met. Azure Databricks also features optimized connectors, which we used to run Microsoft Cognitive Service APIs. These APIs allowed our team to quickly implement entity recognition and key phrase extraction. Because the Azure Databricks solution was managed from a single notebook, our teams could collaborate easily across office locations.

Business Outcomes:

The completed unsupervised machine learning model enabled the voice of customer team to scale the solution to new datasets immediately. Azure Databricks further enhanced scalability and improved deployment speed. With the finished solution, our client can identify themes not contained within training materials and receive significant themes without delay. The completed solution also reduced the manual effort required to identify and map themes.

Thursday, August 22, 2019

Infographic: 14 Best Practices for Power BI