July 31, 2024

Streamlining information retrieval with AI copilots


Struggling with fragmented data sources

Our client, operating in the legal and compliance sector, required an efficient system to manage and retrieve information from vast and diverse data sources. The goal was to develop an integrated copilot platform that answers queries across these sources, which included outside counsel guidance, request for information (RFI) documents, investigation records, policy data, and more.



The issue at hand

The existing process for accessing information related to compliance, legal policies, and external counsel guidance was inefficient and time-consuming. Users had to navigate through multiple documents and platforms, leading to delays and difficulties in finding relevant information. The fragmented approach also posed challenges in maintaining up-to-date and accurate data access.


Our approach

Figure 1: Solution architecture

The objective was to streamline the process of accessing information and accelerate the department’s digital transformation by using AI and LLMs. By centralizing and automating information retrieval using Generative AI, the aim was to improve user satisfaction, reduce response times, and increase efficiency in accessing relevant documentation.

The proposed solution involved creating an integrated copilot platform using the OpenAI GPT-4 Turbo model along with Azure AI Search to implement Retrieval-Augmented Generation (RAG) as follows:

1.     Data is fetched from various data sources, processed, and indexed in Azure AI Search to be used as context.

2.     The Azure ML prompt flow uses this context to generate responses via OpenAI.

3.     Users query the copilot, which uses natural language processing (NLP) to understand and respond to queries and provides links to source documents for further reference.

Diving deeper into the solution


Figure 2: Process overview

Here are the detailed steps of our implementation:
  1. Data ingestion

a.      Ingested data files from ADLS and SharePoint sites into Parquet files using Databricks, extracting and filtering relevant columns and content.

b.      Extracted data from 22 different file formats including csv, doc, html, jpg, msg, pdf, pptx, txt, xlsx, and zip.
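
The multi-format extraction step can be illustrated with a small dispatcher. This is a minimal sketch, not the project's Databricks pipeline: it covers only four of the formats (txt, csv, html, zip) using the Python standard library, and the names `EXTRACTORS` and `extract` are hypothetical.

```python
import csv
import io
import zipfile
from html.parser import HTMLParser
from pathlib import PurePosixPath


class _TextOnly(HTMLParser):
    """Collects text nodes and ignores the tags around them."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())


def _from_txt(data: bytes) -> str:
    return data.decode("utf-8", errors="replace")


def _from_csv(data: bytes) -> str:
    rows = csv.reader(io.StringIO(data.decode("utf-8", errors="replace")))
    return "\n".join(" | ".join(row) for row in rows)


def _from_html(data: bytes) -> str:
    parser = _TextOnly()
    parser.feed(data.decode("utf-8", errors="replace"))
    parser.close()
    return " ".join(parser.parts)


# Hypothetical registry: one extractor per supported file extension.
EXTRACTORS = {".txt": _from_txt, ".csv": _from_csv, ".html": _from_html}


def extract(name: str, data: bytes) -> list[tuple[str, str]]:
    """Return (source name, text) pairs; zip archives are expanded recursively."""
    suffix = PurePosixPath(name).suffix.lower()
    if suffix == ".zip":
        results = []
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.namelist():
                if member.endswith("/"):  # skip directory entries
                    continue
                results.extend(extract(f"{name}/{member}", archive.read(member)))
        return results
    extractor = EXTRACTORS.get(suffix)
    if extractor is None:
        return []  # format not handled in this sketch
    return [(name, extractor(data))]
```

Formats such as pdf, docx, or msg would need third-party parsers registered in the same way.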

  2. Data preparation

a.      Cleaned data by removing signatures and noise using Python libraries.

b.      Split the file contents into smaller chunks that fit within the token limits of the OpenAI model.
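
A rough sketch of the two preparation steps follows. It makes two assumptions that are not from the project: signatures are detected with a few hard-coded marker phrases, and "tokens" are approximated by whitespace-separated words rather than by the model's real tokenizer. The function names and the overlap parameter are illustrative.

```python
import re

# Hypothetical signature markers; a real pipeline would likely use a richer list.
SIGNATURE_MARKERS = re.compile(
    r"^(--\s*$|best regards|kind regards|regards,|sent from my)", re.IGNORECASE
)


def strip_signature(text: str) -> str:
    """Drop everything from the first line that looks like a signature."""
    kept = []
    for line in text.splitlines():
        if SIGNATURE_MARKERS.match(line.strip()):
            break
        kept.append(line)
    return "\n".join(kept).strip()


def chunk_words(text: str, max_tokens: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens words."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighboring chunks, which tends to help retrieval.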

  3. Feature extraction

a.      Used OpenAI to extract features from files using prompts.

b.      Extracted and redacted PII data from the content.

c.      Detected and translated non-English content.

d.      Extracted key phrases & titles from the content.

e.      Extracted summaries from the content.

f.        Extracted question-and-answer pairs from the content.
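
Prompt-based feature extraction could look roughly like the sketch below. The prompt wording, the JSON shape, and the `complete` callable (a stand-in for whatever function sends a prompt to the model and returns its text reply) are all assumptions for illustration; the project's actual prompts are not published here.

```python
import json
from typing import Callable

# Hypothetical prompt wording, not the prompts used in the project.
PROMPT_TEMPLATE = (
    "Read the document text below and reply with a single JSON object "
    "containing the keys title (string), key_phrases (list of strings), "
    "summary (string), and qa_pairs (list of objects with question and "
    "answer keys).\n\nDocument text:\n{chunk}"
)


def _as_list(value) -> list:
    """Return value if it is a list, otherwise an empty list."""
    return value if isinstance(value, list) else []


def extract_features(chunk: str, complete: Callable[[str], str]) -> dict:
    """Ask the model for features and coerce the reply into a fixed shape."""
    raw = complete(PROMPT_TEMPLATE.format(chunk=chunk))
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}  # malformed reply: fall back to empty features
    if not isinstance(data, dict):
        data = {}
    qa_pairs = [
        {"question": str(pair["question"]), "answer": str(pair["answer"])}
        for pair in _as_list(data.get("qa_pairs"))
        if isinstance(pair, dict) and "question" in pair and "answer" in pair
    ]
    return {
        "title": str(data.get("title", "")),
        "key_phrases": [str(k) for k in _as_list(data.get("key_phrases"))],
        "summary": str(data.get("summary", "")),
        "qa_pairs": qa_pairs,
    }
```

Passing the model call in as a parameter keeps the parsing logic testable without network access.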

  4. Search index ingestion

a.      Ingested extracted features and references into Azure AI Search indexes.

b.      Created separate indexes in Azure AI Search for questions and answers, raw data, key phrases and titles, and summaries.
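
To make the four-index layout concrete, the sketch below shapes one chunk and its extracted features into documents for each index. The index names, field names, and ID scheme are hypothetical, and the upload to Azure AI Search itself is left out.

```python
import hashlib


def build_index_documents(source_url: str, chunk: str, features: dict) -> dict:
    """Map one chunk and its features to documents for four indexes."""
    # Deterministic ID so re-ingesting the same chunk overwrites rather than duplicates.
    digest = hashlib.sha256(f"{source_url}\n{chunk}".encode("utf-8")).hexdigest()[:16]
    base = {"id": digest, "source_url": source_url}
    return {
        "raw-data": [{**base, "content": chunk}],
        "summary": [{**base, "content": features.get("summary", "")}],
        "key-phrases-title": [
            {
                **base,
                "title": features.get("title", ""),
                "key_phrases": list(features.get("key_phrases", [])),
            }
        ],
        "question-answer": [
            {
                "id": f"{digest}-{i}",
                "source_url": source_url,
                "question": qa["question"],
                "answer": qa["answer"],
            }
            for i, qa in enumerate(features.get("qa_pairs", []))
        ],
    }
```

Every document carries `source_url`, which is what lets the copilot link its answers back to the source documents.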

  5. ML prompt flow and RAG

a.      A prompt flow establishes a series of steps executed to generate a response.

b.      A user query is passed through Azure Content Safety to filter out inappropriate questions.

c.      The prompt flow searches multiple indexes to find relevant context based on the query.

d.      The retrieved context is passed to the OpenAI LLM, which generates a response grounded in that context. This is then displayed on the web app.

e.      This method of generating responses from a selected dataset using an AI model is known as RAG.
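
The flow in steps b through d can be sketched end to end as follows. This is an illustration under stated assumptions, not the project's prompt flow: `is_safe` stands in for the Azure Content Safety check, `complete` stands in for the OpenAI call, and `keyword_search` is a naive in-memory substitute for Azure AI Search.

```python
from typing import Callable


def keyword_search(index: list[dict], query: str, top: int = 2) -> list[dict]:
    """Naive stand-in for a search service: rank documents by shared words."""
    terms = set(query.lower().split())
    scored = []
    for doc in index:
        score = len(terms & set(doc["content"].lower().split()))
        if score:
            scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top]]


def answer(
    query: str,
    indexes: dict[str, list[dict]],
    is_safe: Callable[[str], bool],
    complete: Callable[[str], str],
) -> dict:
    """Filter the query, retrieve context from every index, then generate."""
    if not is_safe(query):
        return {"answer": "This question cannot be answered.", "sources": []}
    hits = [doc for index in indexes.values() for doc in keyword_search(index, query)]
    if not hits:
        return {"answer": "No relevant documents were found.", "sources": []}
    context = "\n\n".join(f"[{i + 1}] {doc['content']}" for i, doc in enumerate(hits))
    prompt = (
        "Answer the question using only the numbered context passages.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return {"answer": complete(prompt), "sources": [doc["source_url"] for doc in hits]}
```

Returning the source URLs alongside the answer mirrors the copilot's behavior of linking to source documents for further reference.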

  6. Copilot web application

a.      The Azure web app interacts with the prompt flow through an ML endpoint.

b.      The web app serves as a unified platform for accessing all copilots.

c.      When a user asks a question, the chatbot uses the RAG process to generate a response.

d.      User activity and feedback are stored in Application Insights.

e.      Additional features that enhance the user experience include suggested questions, dark mode, time-based searches, and folder-level searches.
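
Folder-level and time-based searches can be expressed as filters on the search request. The sketch below builds an OData-style filter string of the kind Azure AI Search accepts; the field names `folder_path` and `last_modified` are hypothetical, and how the project actually implemented these features is not described in this article.

```python
from datetime import datetime, timezone
from typing import Optional


def build_filter(
    folder: Optional[str] = None, modified_since: Optional[datetime] = None
) -> str:
    """Combine optional folder and date restrictions into one filter string."""
    clauses = []
    if folder:
        escaped = folder.replace("'", "''")  # OData escapes a quote by doubling it
        clauses.append(f"folder_path eq '{escaped}'")
    if modified_since:
        # Expects a timezone-aware datetime; it is normalized to UTC.
        stamp = modified_since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        clauses.append(f"last_modified ge {stamp}")
    return " and ".join(clauses)
```

An empty string means no restriction, so the same search call can serve both filtered and unfiltered queries.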

The solution highlights

1.     Robust data ingestion and enrichment: Established pipelines to ingest data in 22 different file formats from ADLS and SharePoint sites. OpenAI’s capabilities were used to enrich the data.

2.     Advanced security measures: Implemented entitlement-based access control to safeguard sensitive data, ensuring secure and compliant data handling.

3.     Improved user interface: Introduced features like folder-level search, suggested questions, and question autocomplete to significantly improve user experience and efficiency.

4.     Responsible AI: Azure Content Safety screened user queries to filter out inappropriate questions before they reached the model.

5.     Feedback: An effective mechanism was set up to capture user feedback, allowing for continuous improvement.
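
Entitlement-based access control, mentioned in the highlights above, typically means trimming retrieved documents to those the user is allowed to see before they are used as context. The sketch below assumes each indexed document carries a hypothetical `allowed_groups` field; the project's actual entitlement model is not described here.

```python
def filter_by_entitlement(docs: list[dict], user_groups: list[str]) -> list[dict]:
    """Keep only documents that share at least one group with the user."""
    groups = set(user_groups)
    return [
        doc
        for doc in docs
        # Documents without an allowed_groups field are hidden (deny by default).
        if groups & set(doc.get("allowed_groups", ()))
    ]
```

Applying the filter before the context reaches the model ensures restricted content cannot leak into a generated answer.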

 

More about the platform’s functionality

The primary purpose of the integrated copilot platform is to improve the efficiency and impact of the client's operations by providing accurate, timely, and relevant information to support various tasks and decision-making processes. The copilots assist users with:

·       Internal investigations.

·       Answering general queries about the organization.

·       Providing geographical risk assessments.

·       Offering guidance from external legal firms.

·       Delivering compliance-related information.


The main copilots developed, and their capabilities, are listed below:

·       General Information Copilot: Assists users in finding answers to questions related to general company information and policies.

·       Compliance Policy Assistant Copilot: Guides users through company policies and helps them make informed decisions that align with ethical standards and legal requirements.

·       Geography Risk Assessment Copilot: Provides information about geography risk assessments and scores, helping identify residual corruption risks in different countries to inform controls, policies, and procedures.

·       External Counsel Copilot: Helps users find and access legal guidance received from external law firms, reducing expenses from repeated external legal consultations.

·       Business Regulatory Investigations Copilot: Provides users with information regarding internal investigations conducted to ensure compliance with business and regulatory standards.

·       RFI Copilot: Assists the Competition and Market Regulations team by offering a searchable repository of past responses to requests for information from regulatory agencies.



Benefits of the solution

1.      Efficient information retrieval:

Streamlined access to complex documents related to legal guidance, policies, and investigations enables quicker, more efficient information retrieval. Time spent searching for guidance was reduced by up to 30 minutes per search.

2.      Copilot capabilities:

All the copilots together help business teams ensure compliance with business and regulatory standards and reduce external legal costs. They also assist in responding to regulatory agencies and making informed decisions about geographical business risks.

3.      Swift inquiry handling:

The platform responds immediately even under high volumes of concurrent user inquiries, which sustains bot performance and user satisfaction.

4.      Up-to-date information:

Centralized documentation with incremental pipelines ensures access to current and accurate information.

5.      Elevated user experience:

Improved user experience through UI features such as time intelligence, folder-level search, suggested questions, chat auto-complete, and dark mode.



The integrated copilot platform has significantly improved the efficiency of accessing and retrieving compliance and legal information. With AI and advanced data management techniques, the platform has streamlined processes, reduced response times, and empowered users with quick and accurate information. This project highlights the potential of AI-driven solutions in transforming information management in complex and data-intensive environments.


For any further inquiries, contact Sales@MAQSoftware.com to see how copilots powered by Gen AI can transform your business, improve customer satisfaction, and accelerate your delivery.