Struggling with fragmented data sources
Our client, operating in the legal and compliance sector, required an efficient system to manage and retrieve information from vast and diverse data sources. The goal was to develop an integrated copilot platform that answers queries using these sources, which included outside counsel guidance, request for information (RFI) documents, investigation records, policy data, and more.
The issue at hand
Our approach
1. Data was fetched from various data sources, processed, and indexed in Azure AI Search to be used as context.
2. The Azure ML prompt flow used this context to generate responses via OpenAI.
3. Users can query the copilot, which uses natural language processing (NLP) to understand and respond to queries, providing links to source documents for further reference.
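The three steps above can be illustrated with a minimal sketch. It is not the client implementation: the `Document` class, the keyword-overlap retriever standing in for Azure AI Search, the `generate` callable standing in for the OpenAI call, and the example URLs are all hypothetical, chosen only to show how retrieved context and source links travel together.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    title: str
    url: str   # link returned to the user as a source reference
    text: str


def keyword_retrieve(query: str, index: List[Document], top_k: int = 2) -> List[Document]:
    """Stand-in for the search index: rank documents by words shared with the query."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.text.lower().split())), doc) for doc in index]
    matches = [pair for pair in scored if pair[0] > 0]
    ranked = sorted(matches, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]


def build_prompt(query: str, context: List[Document]) -> str:
    """Combine the retrieved passages and the user question into one prompt."""
    passages = "\n".join(f"[{i + 1}] {doc.text}" for i, doc in enumerate(context))
    return f"Answer using only the context below.\n\nContext:\n{passages}\n\nQuestion: {query}"


def answer(query: str, index: List[Document], generate: Callable[[str], str]) -> dict:
    """Retrieve context, generate a response, and attach the source links."""
    context = keyword_retrieve(query, index)
    response = generate(build_prompt(query, context))
    return {"answer": response, "sources": [doc.url for doc in context]}


# Hypothetical data and a stubbed model call, for illustration only.
index = [
    Document("Gift policy", "https://example.com/gift-policy",
             "Gifts above the approved limit require compliance approval"),
    Document("Travel policy", "https://example.com/travel-policy",
             "Travel bookings must use the approved agency"),
]
result = answer("Do gifts require approval", index, generate=lambda prompt: "stub response")
print(result["sources"])
```

Returning the source URLs alongside the generated text is what lets the copilot point users back to the underlying documents.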
Diving deeper into the solution
Figure 2: Process overview
Here are the detailed steps of our implementation:
- Data ingestion
  a. Ingested data files from ADLS and SharePoint sites into Parquet files using Databricks, extracting and filtering relevant columns and content.
  b. Extracted data from 22 different file formats, including csv, doc, html, jpg, msg, pdf, pptx, txt, xlsx, and zip.
- Data preparation
  a. Cleaned data by removing signatures and noise using Python libraries.
  b. Chunked the file contents to a smaller token size for OpenAI processing.
- Feature extraction
  a. Used OpenAI to extract features from files using prompts.
  b. Extracted and redacted PII data from the content.
  c. Detected and translated non-English content.
  d. Extracted key phrases and titles from the content.
  e. Extracted summary data from the content.
  f. Extracted question-and-answer data from the content.
- Search index ingestion
  a. Ingested extracted features and references into Azure AI Search indexes.
  b. Created separate indexes for questions and answers, raw data, key phrases and titles, and summaries.
- ML prompt flow and RAG
  a. A prompt flow establishes a series of steps executed to generate a response.
  b. A user query is passed through Azure Content Safety to filter out inappropriate questions.
  c. The prompt flow searches multiple indexes to find relevant context based on the query.
  d. The retrieved context is passed to the OpenAI LLM, which generates a response relevant to the context. The response is then displayed on the web app.
  e. This method of generating responses from a selected dataset using an AI model is known as retrieval-augmented generation (RAG).
- Copilot web application
  a. The Azure web app interacts with the prompt flow through an ML endpoint.
  b. It is a unified platform for accessing all copilots.
  c. When a user asks a question, the chatbot uses the RAG process to generate a response.
  d. User activity and feedback are stored in Application Insights.
  e. Additional features to enhance user experience include suggested questions, dark mode, time-based searches, folder-level searches, etc.
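The chunking step in data preparation can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's actual code: it approximates tokens with whitespace-separated words (a real pipeline would count tokens with the model's tokenizer), and the `chunk_text` name, the `max_tokens` limit, and the `overlap` between neighboring chunks are hypothetical choices.

```python
from typing import List


def chunk_text(text: str, max_tokens: int = 200, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most max_tokens words.

    Words stand in for model tokens here (an assumption). Consecutive chunks
    share `overlap` words so a sentence cut at a boundary keeps some context.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")

    tokens = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last chunk already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(sample, max_tokens=4, overlap=1):
    print(chunk)
```

A small overlap trades some index size for a lower chance that an answer is split across two chunks and missed at retrieval time.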
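The prompt flow steps described above (safety check, search across several indexes, then generation) can be sketched roughly as below. Everything here is a stand-in: the `BLOCKED_TERMS` blocklist is a hypothetical placeholder for Azure Content Safety, each index is a plain function returning scored passages instead of an Azure AI Search call, and `generate` is a stub for the OpenAI model. Sorting by raw score across indexes is a simplification, since scores from different indexes are not necessarily comparable.

```python
from typing import Callable, Dict, List, Tuple

Hit = Tuple[float, str]                  # (relevance score, passage text)
SearchFn = Callable[[str], List[Hit]]    # hypothetical per-index search function

# Hypothetical blocklist standing in for a content-safety service.
BLOCKED_TERMS = {"password", "exploit"}


def is_safe(query: str) -> bool:
    """Reject queries that contain a blocked term."""
    return not (set(query.lower().split()) & BLOCKED_TERMS)


def gather_context(query: str, indexes: Dict[str, SearchFn], top_k: int = 3) -> List[str]:
    """Query every index and keep the top_k highest-scoring passages overall."""
    hits: List[Hit] = []
    for name, search in indexes.items():
        for score, passage in search(query):
            hits.append((score, f"[{name}] {passage}"))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [passage for _, passage in hits[:top_k]]


def run_flow(query: str, indexes: Dict[str, SearchFn], generate: Callable[[str], str]) -> str:
    """Safety gate, then retrieval, then generation."""
    if not is_safe(query):
        return "Sorry, I can't help with that request."
    context = gather_context(query, indexes)
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
    return generate(prompt)


# Stub indexes and an echoing model, for illustration only.
indexes = {
    "qa": lambda q: [(0.9, "Q: Who approves gifts? A: Compliance.")],
    "summary": lambda q: [(0.4, "Summary of the gift policy.")],
}
reply = run_flow("Who approves gifts?", indexes, generate=lambda prompt: prompt)
print(reply)
```

Running the safety check before retrieval means a rejected query never reaches the search indexes or the model.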
The solution highlights
1. Robust data ingestion and enrichment: Established pipelines to ingest data in 22 different file formats from ADLS and SharePoint sites. OpenAI’s capabilities were used to enrich the data.
2. Advanced security measures: Implemented entitlement-based access control to safeguard sensitive data, ensuring secure and compliant data handling.
3. Improved user interface: Introduced features like folder-level search, suggested questions, and question autocomplete to significantly improve user experience and efficiency.
4. Responsible AI: Used Azure Content Safety to filter out inappropriate queries.
5. Feedback: Set up an effective mechanism to capture user feedback, allowing for continuous improvement.
More about the platform’s functionality
The primary purpose of the integrated copilot platform is to improve the efficiency and impact of the client's operations. It does this by providing accurate, timely, and relevant information to support various tasks and decision-making processes. The copilot will assist users with:
· Internal investigations.
· Answering general queries about the organization.
· Providing geographical risk assessments.
· Offering guidance from external legal firms.
· Delivering compliance-related information.
The main copilots developed, and their capabilities, are listed below:
· General Information Copilot: Assists users in finding answers to questions related to general company information and policies.
· Compliance Policy Assistant Copilot: Guides users through company policies and helps them make informed decisions that align with ethical standards and legal requirements.
· Geography Risk Assessment Copilot: Provides information about geography risk assessments and scores, helping identify residual corruption risks in different countries to inform controls, policies, and procedures.
· External Counsel Copilot: Helps users find and access legal guidance received from external law firms, reducing expenses from repeated external legal consultations.
· Business Regulatory Investigations Copilot: Provides users with information regarding internal investigations conducted to ensure compliance with business and regulatory standards.
· RFI Copilot: Assists the Competition and Market Regulations team by offering a searchable repository of past responses to requests for information from regulatory agencies.
Benefits of the solution
1. Efficient information retrieval: Streamlined access to complex documents related to legal guidance, policies, investigations, etc. for quicker and more efficient information retrieval. Time spent searching for guidance was reduced by up to 30 minutes per search.
2. Copilot capabilities: Together, the copilots help business teams ensure compliance with business and regulatory standards and reduce external legal costs. They also assist in responding to regulatory agencies and making informed decisions about geographical business risks.
3. Swift inquiry handling: Immediate responses to high volumes of concurrent user inquiries help ensure bot performance and user satisfaction.
4. Up-to-date information: Centralized documentation with incremental pipelines ensures access to current and accurate information.
5. Elevated user experience: Improved user experience through UI features such as time intelligence, folder-level search, suggested questions, chat auto-complete, dark mode, etc.
The integrated copilot platform has significantly improved the efficiency of accessing and retrieving compliance and legal information. With AI and advanced data management techniques, the platform has streamlined processes, reduced response times, and empowered users with quick and accurate information. This project highlights the potential of AI-driven solutions in transforming information management in complex and data-intensive environments.
For any further inquiries, contact Sales@MAQSoftware.com to see how copilots powered by Gen AI can transform your business, improve customer satisfaction, and accelerate your delivery.