Monday, August 26, 2019

Unsupervised Machine Learning Model Improves Customer Feedback Analysis



Business Case:

Our client, a multinational technology company, performs regular product demonstrations at their retail stores. Product demonstrations drive sales by showcasing a product’s features and capabilities. After each demonstration, our client collects customer feedback via forms, comments, reviews, ratings, and surveys. To improve future demonstrations, new feedback data was entered into a supervised machine learning (ML) model for analysis. Our client had trained this model on existing data from their teams, so it could recognize only the themes present in that data and could not identify new ones. As a result, the model had to be retrained every time new data was entered. To minimize training costs and improve scalability, we needed to create an unsupervised ML model.

Key Challenges:

   Develop a machine learning (ML) model that can independently analyze new data
   Eliminate ML model retraining costs
   Improve scalability

Our Solution:

We developed an unsupervised machine learning (ML) model to improve customer feedback analysis and drive sales.

Figure 1: Unsupervised machine learning model
To create the unsupervised ML model, we researched three topics:

   Theme identification using supervised ML models
   Term frequency-inverse document frequency (TFIDF) methodologies
   Key phrase extraction

By studying supervised ML model results and TFIDF methodologies, we discovered the key insight behind our unsupervised ML model: English key phrases follow a recurring part-of-speech pattern, (adj/verb)*(noun/proper noun), that is, zero or more adjectives or verbs followed by a noun or proper noun. After discovering the pattern, we built several proofs of concept using the Python keyphrase extraction (PKE) toolkit, rapid automatic keyword extraction (RAKE), and Microsoft Cognitive Services.
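As an illustration, the (adj/verb)*(noun/proper noun) pattern can be matched in a few lines of plain Python. This is a minimal sketch, not the project's code: it assumes the tokens were already tagged with Penn Treebank part-of-speech tags by some upstream tagger, and the function name `extract_key_phrases` is hypothetical. Each phrase is a maximal run of adjectives and verbs closed by a single noun or proper noun.

```python
from typing import List, Tuple

# Assumption: Penn Treebank tags produced by an upstream POS tagger.
MODIFIER_PREFIXES = ("JJ", "VB")          # JJ/JJR/JJS adjectives, VB* verbs
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # common and proper nouns


def extract_key_phrases(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return each maximal run of adjectives/verbs that ends in a noun,
    i.e. matches of (adj/verb)*(noun/proper noun)."""
    phrases: List[str] = []
    modifiers: List[str] = []  # adjectives/verbs seen since the last break
    for word, tag in tagged:
        if tag in NOUN_TAGS:
            # A noun or proper noun completes the pattern.
            phrases.append(" ".join(modifiers + [word]))
            modifiers = []
        elif tag.startswith(MODIFIER_PREFIXES):
            modifiers.append(word)
        else:
            # Determiners, conjunctions, punctuation, etc. break the run.
            modifiers = []
    return phrases


tagged = [("Loved", "VBD"), ("the", "DT"), ("sleek", "JJ"), ("design", "NN"),
          ("and", "CC"), ("fast", "JJ"), ("charging", "VBG"), ("speed", "NN"),
          (".", ".")]
print(extract_key_phrases(tagged))  # ['sleek design', 'fast charging speed']
```

Because the pattern ends at the first noun, consecutive nouns such as "battery life" come out as separate phrases in this sketch; allowing a run of nouns at the end would be a small extension.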
   
The implementation extracted key phrases from customer feedback. First, we split the feedback into sentences at full stops followed by spaces. Next, we applied text preprocessing to extract the core message: expanding contractions, removing punctuation and stop words, reducing conjugated words to their base forms, and correcting spelling mistakes. We then identified key phrases from nouns and adjectives; in product feedback, comments typically refer to the product with a noun and describe its characteristics with adjectives. We used a Word2Vec model to convert the key phrases to vectors and passed these vectors through an unsupervised learning engine that clustered similar key phrases into feedback themes. Finally, we performed a t-test to identify the statistically significant themes. Because themes emerge from clustering rather than from labeled training data, our approach identifies newly emerging feedback themes without model retraining.
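The sentence-splitting and preprocessing steps might look roughly like the following standard-library sketch. The post does not say which tools performed these steps, so everything here is an assumption: the contraction, stop-word, and lemma tables are tiny hypothetical stand-ins for real resources, and spelling correction is left out.

```python
import re
from typing import List

# Hypothetical, deliberately tiny tables; a real pipeline would use full
# contraction lists, stop-word lists, and a lemmatizer.
CONTRACTIONS = {"don't": "do not", "it's": "it is", "can't": "cannot", "i'm": "i am"}
STOP_WORDS = {"a", "an", "and", "am", "but", "do", "i", "is", "it", "of", "the", "to", "was"}
LEMMAS = {"loved": "love", "loves": "love", "screens": "screen"}


def split_sentences(feedback: str) -> List[str]:
    """Split at a full stop followed by whitespace, keeping the full stop."""
    return [s for s in re.split(r"(?<=\.)\s+", feedback.strip()) if s]


def preprocess(sentence: str) -> List[str]:
    """Lowercase, expand contractions, drop punctuation and stop words,
    and map inflected forms to a base form. Spelling correction is omitted."""
    text = sentence.lower().replace("\u2019", "'")  # normalize curly apostrophes
    text = re.sub(r"[^\w\s']", " ", text)           # drop punctuation except '
    tokens: List[str] = []
    for word in text.split():
        # Expand contractions while the apostrophe is still present.
        for part in CONTRACTIONS.get(word, word).split():
            part = part.replace("'", "")
            if part and part not in STOP_WORDS:
                tokens.append(LEMMAS.get(part, part))
    return tokens


feedback = "I loved the screens. It's fast, but I don't like the price."
for sentence in split_sentences(feedback):
    print(preprocess(sentence))
# ['love', 'screen']
# ['fast', 'not', 'like', 'price']
```

The sketch deliberately keeps "not" out of the stop-word list: dropping negations would turn "don't like the price" into a positive-sounding phrase.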

We fed new data to our ML model through Azure Databricks, which enabled our client to run robust analytics, generate real-time insights, and perform one-click autoscaling. Autoscaling saves time and helps meet enterprise users’ scaling requirements, while the platform also addresses their security requirements. Azure Databricks also features optimized connectors, which we used to call the Microsoft Cognitive Services application programming interfaces (APIs). These APIs allowed our team to quickly implement entity recognition and key phrase extraction. Because the Azure Databricks solution was managed from a single notebook, our teams could collaborate easily across office locations.
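For readers unfamiliar with the Cognitive Services side, a key phrase request to the Text Analytics REST API could be built as below. This is a hedged sketch rather than the team's code: it calls the REST endpoint directly with `urllib` instead of going through the Databricks connectors, the v3.1 route in `KEY_PHRASES_PATH` is an assumption about the API version, and `build_key_phrase_request` and `parse_key_phrases` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Dict, List

# Assumption: Text Analytics v3.1 key phrase route; check which API
# version your Cognitive Services resource supports.
KEY_PHRASES_PATH = "/text/analytics/v3.1/keyPhrases"


def build_key_phrase_request(endpoint: str, key: str, texts: List[str],
                             language: str = "en") -> urllib.request.Request:
    """Wrap each feedback text as a document with a string id."""
    body = {"documents": [{"id": str(i), "language": language, "text": t}
                          for i, t in enumerate(texts, start=1)]}
    return urllib.request.Request(
        endpoint.rstrip("/") + KEY_PHRASES_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/json"},
        method="POST",
    )


def parse_key_phrases(response: dict) -> Dict[str, List[str]]:
    """Map each document id to the key phrases returned for it."""
    return {doc["id"]: doc["keyPhrases"] for doc in response.get("documents", [])}
```

Sending the request with `urllib.request.urlopen(req)` and passing `json.load(...)` of the response to `parse_key_phrases` would yield a dict keyed by document id; building and parsing are kept separate so they can be checked without network access.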

Business Outcomes:

The completed unsupervised ML model enabled our client to immediately scale to new datasets and automatically identify new themes. Azure Databricks further increased scalability and improved deployment speed. With the finished solution, our client can identify themes absent from earlier data and surface significant themes as soon as new feedback is processed. As a result, our client can more effectively improve future product demonstrations to increase sales.

Highlights:

   Developed an unsupervised machine learning (ML) model that improved the customer feedback analysis used to increase sales
   Eliminated ML model retraining costs
   Improved scalability and deployment speed with Azure Databricks