We have developed a Chatbot consisting of two AI agents and a RAG (Retrieval-Augmented Generation) pipeline. When a user asks the Chatbot a question, AI Agent 1 decides whether to use RAG to search for specific information or to answer with the LLM directly. The Chatbot also includes memory that remembers previous responses, making the interaction smoother.
For example, in the RAG demo, we used metadata containing Uber's annual performance data from 2019 to 2022. If the user asks a question related to the RAG data, AI Agent 1 decides, based on relevance, whether to answer with RAG or with the OpenAI Agent.
In this Chatbot implementation, several engines work together, each serving a different purpose. Let's break down each part:
Chat Agent
The Chat Agent uses OpenAI as its LLM and consists of the Chat Engine, the Sub Question Engine, and the RAG Engine. When a user asks a question, the Chat Agent decides whether to answer with RAG or with the Chat Engine, based on the question's relevance to the RAG metadata description.
Sub Question Engine
If RAG is chosen to answer a question and the prompt requires data from multiple RAG engines, the question is first sent to the Sub Question Engine. It breaks the question down before passing it to the RAG Engine, which is essential because the RAG is divided into four sub-engines, each responsible for answering questions about a specific year's data.
RAG Engine
The RAG Engine uses Uber's performance data for the years 2019 to 2022. The Chat Agent decides whether to send the question directly to the RAG Engine or pass it through the Sub Question Engine first, based on its relevance to the RAG metadata description.
Chat Engine
The Chat Engine, or GPT Engine, answers general questions. When the Chat Agent concludes that the prompt is not related to RAG, it sends the question to the Chat Engine for a general response.
Other
This Chatbot stores conversation data in memory, allowing for smoother interaction by maintaining context.
To reset the chat or clear the memory, the /resetChat command can be used.
The /chat endpoint is used for normal queries, while /chatWithoutRAG can be used for queries without involving RAG.
User Flow
Frontend Development
User Interface Components
The UI components will mainly consist of:
Chat header: Contains settings for the chat and a button to reset the chat.
Chat input: Input for typing and sending messages.
Chat widget: Displays the conversation.
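Putting these pieces together, a minimal sketch of the page layout might look like the following (the component names here are hypothetical; the actual project structure may differ):
import React from 'react'
// Hypothetical components corresponding to the pieces above
import { ChatHeader } from './ChatHeader'
import { ChatWidget } from './ChatWidget'
import { ChatInput } from './ChatInput'

export const ChatPage = () => (
  <div>
    <ChatHeader /> {/* chat settings and reset button */}
    <ChatWidget /> {/* displays the conversation */}
    <ChatInput />  {/* typing and sending messages */}
  </div>
)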
Step 1: Define Schema and Form
In this step, we will use the react-hook-form, @hookform/resolvers/zod, and zod libraries.
First, we will create a schema. It consists of a query field for the user's input and a bot field used to display error messages from the bot itself.
export const askScheme = z.object({
query: z.string().trim().min(1, { message: 'Please enter your message' }),
bot: z.string({}).optional(),
})
Once we create the schema, we will create a type interface for this form.
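The type itself is not shown here; a minimal sketch, assuming it is inferred from the schema with zod (IOpenAIForm matches the type used in onSubmit below), could be:
import { z } from 'zod'

// Form values inferred from the askScheme schema above
export type IOpenAIForm = z.infer<typeof askScheme>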
From the schema itself, we will validate the input received from the user. If the user submits without typing, an error message will be displayed in the user interface saying 'Please enter your message.'
In the input component, the 'react-hook-form' library is used, along with useFormContext and Controller to manage the input.
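A minimal sketch of that wiring, with hypothetical file paths and component names (only useFormContext, Controller, and the zod resolver come from the text above), might look like this:
import React from 'react'
import { useForm, FormProvider, useFormContext, Controller } from 'react-hook-form'
import { zodResolver } from '@hookform/resolvers/zod'
import { askScheme } from './schema' // hypothetical path
import type { IOpenAIForm } from './schema'

// Wrap the chat form so child components can reach it via useFormContext
export const ChatForm = ({ children }: { children: React.ReactNode }) => {
  const methods = useForm<IOpenAIForm>({
    resolver: zodResolver(askScheme),
    defaultValues: { query: '' },
  })
  return <FormProvider {...methods}>{children}</FormProvider>
}

// The input component, managed by Controller
export const ChatInput = () => {
  const { control } = useFormContext<IOpenAIForm>()
  return (
    <Controller
      name="query"
      control={control}
      render={({ field }) => <input {...field} placeholder="Type your message" />}
    />
  )
}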
Once we submit, if the input is validated correctly, we will see the data we logged from the onSubmit function, which we can then connect to the backend.
const onSubmit = async (data: IOpenAIForm) => {
  try {
    // Send the user's query to the backend /chat endpoint
    const { data: result } = await axios.post(`/chat`, { query: data.query })
    console.log(result) // Value obtained from the API
    // TODO: render the answer in the chat UI
  } catch (error) {
    const err = error as AxiosError<{ detail: string }>
    // Surface backend errors through the form's 'bot' field
    setError('bot', {
      message: err?.response?.data?.detail ?? 'Something went wrong',
    })
  }
}
When we submit the form, we receive the value from the API.
We then connect the returned data to the Chatbot UI using React and state management.
We will use useState from React for state management of messages to display in the UI. We will have answer and setAnswer to store the questions and answers of the user and bot. The structure of the array will be as follows:
[
{
"id": "0",
"role": "user",
"message": "What were some of the biggest risk factors in 2022 for Uber?",
"raw": ""
},
{
"id": "2",
"role": "ai",
"message": "Some of the biggest risk factors for Uber in 2022 include:\\n\\n1. Reclassification of drivers: There is a risk that drivers may be reclassified as employees or workers instead of independent contractors. This could result in increased costs for Uber, including higher wages, benefits, and potential legal liabilities.\\n\\n2. Intense competition: Uber faces intense competition in the mobility, delivery, and logistics industries. Competitors may offer similar services at lower prices or with better features, which could result in a loss of market share for Uber.\\n\\n3. Need to lower fares or service fees: To remain competitive, Uber may need to lower fares or service fees. This could impact the company's revenue and profitability.\\n\\n4. Significant losses: Uber has incurred significant losses since its inception. The company may continue to experience losses in the future, which could impact its financial stability and ability to attract investors.\\n\\n5. Uncertainty of achieving profitability: There is uncertainty regarding Uber's ability to achieve or maintain profitability. The company expects operating expenses to increase, which could make it challenging to achieve profitability in the near term.\\n\\nThese risk factors highlight the challenges and uncertainties that Uber faces in 2022.",
"raw": "Some of the biggest risk factors for Uber in 2022 include:\\n\\n1. Reclassification of drivers: There is a risk that drivers may be reclassified as employees or workers instead of independent contractors. This could result in increased costs for Uber, including higher wages, benefits, and potential legal liabilities.\\n\\n2. Intense competition: Uber faces intense competition in the mobility, delivery, and logistics industries. Competitors may offer similar services at lower prices or with better features, which could result in a loss of market share for Uber.\\n\\n3. Need to lower fares or service fees: To remain competitive, Uber may need to lower fares or service fees. This could impact the company's revenue and profitability.\\n\\n4. Significant losses: Uber has incurred significant losses since its inception. The company may continue to experience losses in the future, which could impact its financial stability and ability to attract investors.\\n\\n5. Uncertainty of achieving profitability: There is uncertainty regarding Uber's ability to achieve or maintain profitability. The company expects operating expenses to increase, which could make it challenging to achieve profitability in the near term.\\n\\nThese risk factors highlight the challenges and uncertainties that Uber faces in 2022."
}
]
We will also manage state for calling the API. When asking a question, the user can choose whether or not to use RAG. We will have hasRag and setHasRag to manage this state, and use the value to decide which endpoint to call before sending the request.
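A minimal sketch of this state, assuming a hypothetical IMessage shape that matches the array above:
import { useState } from 'react'

interface IMessage {
  id: string
  role: 'user' | 'ai'
  message: string
  raw: string
}

// Inside the chat component:
// conversation shown in the chat widget
const [answer, setAnswer] = useState<IMessage[]>([])
// whether the next question should go through RAG
const [hasRag, setHasRag] = useState(true)

// pick the endpoint before sending the request
const endpoint = hasRag ? '/chat' : '/chatWithoutRAG'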
In this step, we handle various states such as loading, submitting, and errors. We use state from 'useFormContext' to display values related to the form, including isSubmitting and errors.
Create a file named app.py and initialize FastAPI:
from fastapi import FastAPI
from fastapi.encoders import jsonable_encoder
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
app = FastAPI()
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_credentials=True,
allow_methods=["*"],
allow_headers=["*"],
)
@app.get("/helloworld")
async def helloworld():
return {"message": "Hello World"}
In the code above, we initialize a FastAPI project and enable CORS for smooth communication with the frontend. To run the server, use the following command:
uvicorn app:app
This command tells Uvicorn to run the app object defined in app.py; the server listens on the default port 8000.
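You can quickly verify that the server is up by calling the test endpoint, for example:
curl http://localhost:8000/helloworld
# {"message":"Hello World"}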
Next, create the data and storage folders to store sample documents and the vector database storage.
Prepare and Ingest Data
To begin, set up the environment and OpenAI key:
import os
import openai
import dotenv
from llama_hub.file.unstructured.base import UnstructuredReader
from pathlib import Path
from llama_index import VectorStoreIndex, ServiceContext, StorageContext
from llama_index import load_index_from_storage
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.agent import OpenAIAgent
import nest_asyncio
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse
dotenv.load_dotenv()
openai.api_key = os.environ["OPENAI_API_KEY"]
nest_asyncio.apply()
agent = None
Continue by loading data into the VectorDB. The example uses data from raw UBER 10-K HTML files for the years 2019-2022:
def read_data(years):
loader = UnstructuredReader()
doc_set = {}
all_docs = []
for year in years:
year_docs = loader.load_data(
file=Path(f"./data/UBER/UBER_{year}.html"), split_documents=False
)
# Insert year metadata into each document
for d in year_docs:
d.metadata = {"year": year}
doc_set[year] = year_docs
all_docs.extend(year_docs)
return doc_set
Now, load the data as documents into the VectorDB, organizing it by year:
def store_data(years, doc_set, service_context):
index_set = {}
for year in years:
storage_context = StorageContext.from_defaults()
cur_index = VectorStoreIndex.from_documents(
doc_set[year],
service_context=service_context,
storage_context=storage_context,
)
index_set[year] = cur_index
storage_context.persist(persist_dir=f"./storage/{year}")
return index_set
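Run together, the ingestion step might look like the following sketch (the chunk_size here matches the value used later in the /chat endpoint):
# One-time ingestion: read the raw filings and persist a vector index per year
years = [2022, 2021, 2020, 2019]
service_context = ServiceContext.from_defaults(chunk_size=512)
doc_set = read_data(years)
index_set = store_data(years, doc_set, service_context)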
Setting Up a Sub Question Query Engine
Create a Query Engine for each year's data by loading the index from the VectorDB:
def load_data(years, service_context):
index_set = {}
for year in years:
storage_context = StorageContext.from_defaults(
persist_dir=f"./storage/{year}"
)
cur_index = load_index_from_storage(
storage_context, service_context=service_context
)
index_set[year] = cur_index
return index_set
Generate Query Engine Tools for each year's data:
def create_individual_query_tool(index_set, years):
individual_query_engine_tools = [
QueryEngineTool(
query_engine=index_set[year].as_query_engine(),
metadata=ToolMetadata(
name=f"vector_index_{year}",
description=f"useful for when you want to answer queries about the {year} SEC 10-K for Uber",
),
)
for year in years
]
return individual_query_engine_tools
Synthesize Answers Across the Data
Create a function that synthesizes answers across the individual query engine tools using a Sub Question Query Engine:
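The function itself is not shown above; a minimal sketch of create_synthesizer (the name used in the /chat endpoint below), assuming SubQuestionQueryEngine.from_defaults, could look like this:
def create_synthesizer(individual_query_engine_tools, service_context):
    # Combine the per-year engines into one engine that can break a question
    # into sub-questions and answer across multiple years
    query_engine = SubQuestionQueryEngine.from_defaults(
        query_engine_tools=individual_query_engine_tools,
        service_context=service_context,
    )
    return query_engine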
Generate a Query Engine Tool for sub-question query engine:
def create_sub_question_tool(query_engine):
query_engine_tool = QueryEngineTool(
query_engine=query_engine,
metadata=ToolMetadata(
name="sub_question_query_engine",
description="useful for when you want to answer queries that require analyzing multiple SEC 10-K documents for Uber",
),
)
return query_engine_tool
Create General Engine
The final engine serves as a query tool for information that falls outside the scope of the prepared data; in other words, it acts as a chatbot for answering general questions.
def agent_chat():
chat_engine_tool = [
QueryEngineTool(
query_engine=OpenAIAgent.from_tools([]),
metadata=ToolMetadata(
name="gpt_agent", description="Agent that can answer general questions."
),
),
]
return chat_engine_tool
Create OpenAI Agent from Tools
Combine all query engine tools into an OpenAI Agent:
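The build_chat_engine function used in the /chat endpoint below is not shown in full; a minimal sketch that merges the per-year tools, the sub-question tool, and the general chat tool into a single OpenAIAgent could be:
def build_chat_engine(individual_query_engine_tools, query_engine_tool, gpt_agent):
    # Per-year tools + sub-question tool + general chat tool
    tools = individual_query_engine_tools + [query_engine_tool] + gpt_agent
    return OpenAIAgent.from_tools(tools, verbose=True)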
Implement an endpoint for processing chat queries:
@app.post('/chat')
def chat(query: str = 'What were some of the biggest risk factors in 2022 for Uber?'):
global agent
years = [2022, 2021, 2020, 2019]
service_context = ServiceContext.from_defaults(chunk_size=512)
index_set = load_data(years, service_context)
individual_query_engine_tools = create_individual_query_tool(index_set, years)
query_engine = create_synthesizer(individual_query_engine_tools, service_context)
query_engine_tool = create_sub_question_tool(query_engine)
gpt_agent = agent_chat()
if agent is None:
agent = build_chat_engine(individual_query_engine_tools, query_engine_tool, gpt_agent)
answer = agent.chat(query)
return JSONResponse({'answer': str(answer)})
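With the server running locally, the endpoint can be exercised like this (a sketch using the requests library; query is passed as a query parameter because of the endpoint's signature):
import requests

resp = requests.post(
    "http://localhost:8000/chat",
    params={"query": "What were some of the biggest risk factors in 2022 for Uber?"},
)
print(resp.json())  # {'answer': '...'}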
Create Utility Endpoints
Implement two additional APIs for utility purposes:
Reset the Agent data:
@app.post('/resetChat')
def resetChat():
    global agent
    # Clear the agent's conversation memory if it has already been created
    if agent is not None:
        agent.reset()
    return JSONResponse({'status': 'complete'})
Use a chatbot without RAG:
@app.post('/chatWithoutRAG')
def chatWithoutRAG(query: str = 'What were some of the biggest risk factors in 2022 for Uber?'):
gpt_agent = agent_chat()
agent = OpenAIAgent.from_tools(gpt_agent, verbose=False)
answer = agent.chat(query)
return JSONResponse({'answer': str(answer)})
Deploying and Monitoring on EC2
To deploy FastAPI on an EC2 instance:
Create a session using screen:
screen -S name
Navigate to the API folder:
cd path/to/api
Start FastAPI on port 8000:
uvicorn app:app --host 0.0.0.0
Detach from the screen session:
Ctrl+a d
Now, the server runs in the background even after exiting the session.
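To re-attach to the session later, for example to check the logs, run:
screen -r name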
Setting Up API Gateway and Implementing CORS
To integrate with AWS API Gateway for better management and usage control, add CORS configuration to FastAPI: