Tuesday, April 16, 2019

Case Study: 2.9 Million Arizona Citizens Receive Benefits Efficiently Using AI-powered Chatbot



Key Challenges

   Improve Program Service Evaluator training by providing responses to common questions.
   Enable evaluators to obtain policy information without searching the entire policy manual.
   Respond to policy manual questions in everyday language.

Complex Policies Prompt a Need for More Efficient Training

Our client, the Arizona Department of Economic Security (DES), needed to improve its Program Service Evaluator (PSE) training. PSEs are responsible for administering benefits and guiding applicants through the application process. As a part of this process, PSEs refer to an online policy manual that outlines guidelines and protocols for various state programs. After years of legislative and policy amendments, the manual is dense with legal terminology and technical language. PSEs are tasked with searching the manual using keywords, then translating the results into everyday language to communicate with their clients.

Although the PSEs can search the policy manual using keywords, the search terms often do not match the complex language of the manual. PSEs regularly must reach out to coworkers or senior employees for relevant information, which results in evaluation delays. Although less experienced team members will always need guidance, the senior members of the policy team realized that they could save a lot of time if PSEs had direct access to responses to common questions. From this realization, the mandate to create a chatbot intelligent enough to decipher the complex policy manual was born.

A chatbot offered several advantages over the PSEs’ previous means of gathering information. A chatbot would reduce time commitments for senior employees by responding to common PSE questions using preformulated replies. A chatbot would also allow PSEs to obtain information from the DES policy manual without explicitly using the search function. PSEs would be able to ask the chatbot questions in everyday language, and the chatbot would return information validated from the manual. Lastly, a chatbot would enable PSEs to continue referencing the policy manual as they always had, with the benefit of a supplementary resource. A chatbot implementation that successfully “understood” the contents of the policy manual would dramatically reduce the amount of time spent poring over the policy manual. Ultimately, the chatbot would enable the PSEs to more efficiently evaluate benefits applications.

Incremental Improvements and the Iterative Process

We divided the chatbot development into four user acceptance testing (UAT) stages: Preview, MVP, MVP+, and Pilot (see Figure 1). We pushed out the first Preview build of the chatbot within three weeks of starting the project. The initial build allowed us to gather early feedback, enabling course corrections.

Figure 1: Project UAT Stages

The first build of the chatbot responded to PSE questions by referencing a manually compiled knowledge base of stored questions and replies. From early user testing, however, we knew we needed to refine the approach.

One challenge with the first build was that it didn’t adequately address the size and complexity of the policy manual. Although the initial build’s knowledge base covered 500 of the most common PSE questions and responses, it simply did not contain enough information to address the intricacies of the manual. During this first stage, PSEs frequently asked questions that both our client’s policy team and our chatbot team thought the chatbot would be able to answer. Instead, the chatbot often returned answers unrelated to the PSEs’ queries.

Additionally, the first build often required the PSEs to phrase their questions in a manner that ran counter to how they searched the policy manual. The old policy manual search engine prioritized the frequency of typed keywords. For example, if a PSE searched for “earned income,” the search engine would return the result with the highest number of occurrences of the phrase. The result was that experienced PSEs came to expect results returned in a certain order. Our chatbot needed to be able to return a field of results when presented with a single keyword and a specific result when presented with multiple keywords, all while ordering the results in a manner the PSEs expected.
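The keyword-frequency ranking the legacy search engine used can be sketched roughly as follows. This is an illustrative Python sketch, not the production implementation; the page titles and contents are hypothetical.

```python
# Sketch of keyword-frequency ranking, mimicking the legacy manual search:
# pages with more occurrences of the query phrase rank higher.
# Page data below is hypothetical.

def rank_pages(pages, query):
    """Return page titles ordered by occurrences of the query phrase."""
    query = query.lower()
    scored = [(title, text.lower().count(query)) for title, text in pages.items()]
    # Keep only pages that mention the phrase, most frequent first.
    scored = [(title, n) for title, n in scored if n > 0]
    return [title for title, n in sorted(scored, key=lambda x: -x[1])]

pages = {
    "Earned Income": "Earned income includes wages... earned income is counted...",
    "Deductions": "Some earned income is deductible under...",
    "Residency": "Applicants must reside in Arizona...",
}
results = rank_pages(pages, "earned income")  # "Earned Income" ranks first
```

Experienced PSEs had internalized this ordering, so the chatbot had to reproduce comparable behavior while also accepting natural-language questions.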

With these challenges in mind, it became clear that the chatbot would need to move past a manually compiled question and response bank, and even reach beyond typical chatbot capabilities. Our chatbot needed to mimic the behavior of modern search engines yet provide conversational responses to questions phrased in natural language. The chatbot also needed to understand questions relating to all parts of the manual, which would ultimately result in the chatbot’s knowledge base expanding from roughly 500 question and response pairs to well over 5,000.

Producing Refined Results

To achieve such a significant change in behavior, we developed a method to generate questions and responses automatically when the bot crawled policy content. If DES added new content to the policy manual, the bot would automatically crawl the new pages and update its database. This ability to auto-update the question and response database was unique to this project and crucial to meeting DES’s workflow needs. This ensured that the chatbot could always access all the content from the manual. The new build even allowed users to narrow the search categories to further refine results.
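The pair-generation step can be sketched as below. This is a simplified, hypothetical illustration of turning crawled section headings into question/response templates; the actual system's templates and crawl logic are not shown in this article.

```python
# Hypothetical sketch: generate question/response pairs from a crawled
# policy page. Templates, headings, and page titles are illustrative only.

def generate_pairs(page_title, section_headings):
    """Turn each section heading into template questions pointing at the page."""
    templates = [
        "What is {topic}?",
        "Where can I find information on {topic}?",
    ]
    pairs = []
    for heading in section_headings:
        topic = heading.lower()
        for template in templates:
            pairs.append((template.format(topic=topic),
                          f"See '{heading}' in the {page_title} page."))
    return pairs

pairs = generate_pairs("Earned Income Policy", ["Wage Reporting", "Self-Employment"])
```

Because pairs are derived directly from crawled content, a re-crawl after a policy update regenerates the affected pairs automatically.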

By automatically generating question and response pairs, the chatbot team was better able to incorporate the policy team’s knowledge to improve the bot through user feedback loop training. If users struggled to find the information they needed, we could now directly influence the chatbot’s machine learning process by connecting a user’s question with the exact page they were looking for. This offered a substantial advantage over the previous build, in which the inclusion of question and response pairs was unstructured. Additionally, providing a structured process for question and response pairs significantly improved the speed at which the bot learned.
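The feedback loop described above can be sketched as a simple correction store: when a reviewer links a user's question to the page they actually wanted, that exact question routes there from then on. The data structures here are hypothetical simplifications.

```python
# Sketch of feedback-loop training: store reviewer corrections that map
# a user's question to the exact page they were looking for.
# Class and page identifiers are hypothetical.

class FeedbackStore:
    def __init__(self):
        self.corrections = {}  # normalized question -> page id

    def record(self, question, correct_page):
        """A reviewer confirms the page a question should resolve to."""
        self.corrections[question.strip().lower()] = correct_page

    def lookup(self, question):
        """Return the corrected page for this question, if one was recorded."""
        return self.corrections.get(question.strip().lower())

store = FeedbackStore()
store.record("How do I verify earned income?", "policy/earned-income#verification")
```

In practice such corrections would also feed the underlying ranking model, but even this direct mapping shows why a structured pair format makes the feedback usable.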

We conducted weekly UAT meetings, progressively increasing the audience size (Figure 1). In these meetings, we reviewed specific chatbot queries to identify mismatched keywords. This was crucial to improving the chatbot and gaining acceptance and adoption within the PSE community. As PSEs and supervisors saw their concerns addressed, they felt ownership over the outcome and became champions for the chatbot. Through the testing process, the chatbot learned quickly, eventually returning results with 90+ percent accuracy.

Putting the Users First

User friendliness was of the utmost importance when creating the chatbot. The chatbot is used by over 1,800 PSEs with varying degrees of technical expertise. The PSEs need to access the chatbot both through a web interface and through Skype for Business. Also, administrators must be able to view the question and answer database at a glance, manually edit the questions and answers if needed, and manually trigger the crawl function if the database needs updating outside of the regularly scheduled crawls.

We designed a welcoming web interface that resembles a smartphone text-message window, complete with a friendly avatar. This puts non-tech-savvy users at ease by building on an already familiar user experience. The window also provides options to track case numbers, resize, and export conversations. When users type an ambiguous question, the chatbot offers multiple possible responses (with references), helping users clarify the results without asking a series of follow-up questions.
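The multiple-response behavior for ambiguous questions can be sketched as returning the top few candidate answers with their manual references, rather than a single guess. The candidates, scores, and section numbers below are hypothetical.

```python
# Sketch: for an ambiguous question, return the top few candidate answers
# with their policy-manual references. Scores and entries are hypothetical.

def disambiguate(candidates, top_k=3, min_score=0.4):
    """candidates: list of (answer, reference, score).
    Return up to top_k candidates above the confidence threshold, best first."""
    viable = [c for c in candidates if c[2] >= min_score]
    return sorted(viable, key=lambda c: -c[2])[:top_k]

candidates = [
    ("Earned income limits for SNAP...", "Manual 302.1", 0.72),
    ("Earned income deductions...", "Manual 305.4", 0.65),
    ("Unearned income rules...", "Manual 310.2", 0.30),
]
options = disambiguate(candidates)  # two viable options, best first
```

Presenting two or three referenced options lets the user pick the right interpretation in one step instead of rephrasing repeatedly.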

In addition to the web interface, the PSEs needed to access the chatbot via Skype for Business. Skype interactivity posed significant challenges, as the interface had to be entirely text-based. Our engineers, however, rose to the challenge, creating intuitive menu options that users select via number input. Despite the limitations, the team successfully implemented a Skype interface with all the functionality of the web interface.
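A text-only numbered menu of the kind described can be sketched as follows. The menu entries are hypothetical; the point is that a purely textual channel forces all interaction through rendered text and numeric replies.

```python
# Sketch of a text-only numbered menu, as a purely textual channel
# (such as Skype for Business) requires. Menu entries are hypothetical.

def render_menu(options):
    """Render the options as numbered lines the user can reply to."""
    lines = ["Reply with a number:"]
    lines += [f"{i}. {label}" for i, label in enumerate(options, start=1)]
    return "\n".join(lines)

def handle_reply(options, reply):
    """Map a numeric reply back to the chosen option, or None if invalid."""
    reply = reply.strip()
    if reply.isdigit():
        idx = int(reply)
        if 1 <= idx <= len(options):
            return options[idx - 1]
    return None

opts = ["Search the policy manual", "Track a case number", "Export conversation"]
choice = handle_reply(opts, "2")  # -> "Track a case number"
```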

Finally, we created an admin portal that is simply designed, yet powerful enough to customize chatbot responses, manually trigger policy database crawls, track case numbers, and view response metrics.

The effort the team put into designing the chatbot interface and admin portal paid off: PSEs encountering the chatbot for the first time can interact with it, understand how it works, and use it proficiently within minutes. As the DES project director observed, the chatbot integrated seamlessly into the PSEs’ workflow.

Going Live: Distributing Benefits with AI-driven Technology

The DES chatbot has increased evaluation efficiency for over 1,800 PSEs and improved processing time for over 2.9 million Arizona benefits recipients. The chatbot provides users with speedy responses, successfully answering hundreds of queries per day.

Reflecting on the project, our team lead recalls four factors that differentiated it from others. First, the policy manual was large and complex, with wording that resembles legal statutes; the bot simplifies the chore of referencing it, returning results that are 90+ percent accurate. Second, the project stands apart from the other chatbots we have built because it auto-trains from site content, which we further enhanced through user feedback loop training. Third, the intuitive user interface offers multiple responses to ambiguous questions, drastically reducing the number of interactions required to find the result users are looking for. Finally, the incremental UAT cycle not only allowed us to tailor the chatbot to the end users’ expectations, it also drove user acceptance and adoption.

Feedback from DES has been overwhelmingly positive. DES Chief Information Officer Sanjiv Rastogi is optimistic, anticipating that the chatbot’s role will expand to suit the department’s future needs: “MAQ Software helped us decide on and implement a solution built on Azure with cognitive services, which gives us the grow-as-you-go infrastructure, platform, SaaS, and AI integration that DES needs.”