Monday, August 26, 2019

Unsupervised Machine Learning Model Improves Customer Feedback Analysis



Business Scenario

Our client showcases a variety of technology products in their retail stores to customers. Product demonstrations highlight product capabilities to drive sales. Following product demonstrations, the voice of customer team collects customer feedback. Customer feedback provides a better understanding of how improved product demonstrations improve sales.

To capture customer feedback, the voice of customer team reviewed forms, written comments, reviews, ratings, and surveys using a supervised machine learning model. The team trained the supervised machine learning model using textual data from other teams they worked with. But the supervised machine learning model approach struggled whenever the voice of customer team wanted to add a new dataset. Because the model was based off the provided training materials, the model could not identify material themes that did not originate from the training materials. As a result, each time the voice of customer team worked with a new dataset, the learning model required retraining. To minimize training costs and improve scalability, we needed to create an unsupervised machine learning model.

How We Did It: The Technical Implementation


Figure 1: Unsupervised machine learning model
To create the unsupervised learning model, we researched three topics:

   Theme identification using supervised models
   Term frequency-inverse document frequency (TFIDF) methodologies
   Key phrase extraction

At the beginning of the project, we conducted research to determine how to implement an unsupervised machine learning approach for theme identification. Prior to the completion of our project, no such research had ever been conducted. We started our research with supervised models, even though our solution needed to eliminate manual theme identification. Studying supervised models enabled our team to better understand how human heuristics guided machine learning. To better understand how to track the statistical significance of themes, we studied TFIDF methodologies. By studying supervised model results and TFIDF methodologies, we eventually discovered the key insight we used to structure our unsupervised machine learning model: We could extract key phrases in English using a recurring pattern: (Adj/verb)*(Noun/Proper noun). After discovering the pattern, we worked on several proofs of concept using the pke and rake libraries and Microsoft’s Cognitive Services.
   
The original implementation of the project involved extracting key phrases from customer reviews and feedback comments. First, we divided feedback into different sentences using full stops and spaces. Then, we used several text preprocessing techniques to extract the core message without contractions, punctuation, stop words, conjugations, or spelling mistakes. We then employed text processing techniques to extract the key phrases. Key phrases were identified using nouns and adjectives. (In product feedback, comments typically used the product as the noun and the characteristics of the product as adjectives.) We then used the Word2Vec model to convert the key phrases to vectors. The resulting vectors were then passed through an unsupervised learning engine to cluster the key phrases in order to decipher feedback themes. Each cluster helped identify feedback themes. Finally, we performed a T-test to identify the significant themes. Our approach enabled us to identify newly emerging feedback themes without requiring model retraining.

We fed data to our machine learning model through Azure Databricks. Azure Databricks offered our client the speed and flexibility they were looking for. Azure Databricks allows users to run robust analytics algorithms and drive real-time business insights. Azure Databricks also offers one-click, autoscaling deployment that ensures enterprise users’ scaling and security requirements are suitably met. Azure Databricks also features optimized connectors, which we used to run Microsoft Cognitive Service APIs. These APIs allowed our team to quickly implement entity recognition and key phrase extraction. Because the Azure Databricks solution was managed from a single notebook, our teams could collaborate easily across office locations.

Business Outcomes:

The completed unsupervised machine learning model enabled the voice of customer team to scale the solution to new datasets immediately. Azure Databricks further enhanced scalability and improved deployment speed. With the finished solution, our client can identify themes not contained within training materials and receive significant themes without delay. The completed solution also reduced the manual effort required to identify and map themes.