We have developed a Chatbot consisting of two AI agents and a RAG component. When a user asks the Chatbot a question, AI Agent 1 decides whether to use RAG to search for specific information or to let the LLM answer directly. The Chatbot also keeps a memory of previous responses, which makes the interaction smoother.
For example, in the RAG demo we used Uber's annual performance data from 2019 to 2022. When the user asks a question, AI Agent 1 decides, based on how relevant the question is to that data, whether to answer through RAG or through the OpenAI Agent.
In this Chatbot implementation, several engines work together, each serving a different purpose. Let's break down each part:
Chat Agent
The Chat Agent uses OpenAI models as its LLM and consists of the Chat Engine, the Sub Question Engine, and the RAG Engine. When a user asks a question, the Chat Agent decides whether to answer with RAG or with the Chat Engine, based on how relevant the question is to the RAG metadata description.
Sub Question Engine
If RAG is chosen for answering a question, and the prompt requires data from multiple RAG engines, the question is sent to the Sub Question Engine first. It helps break down the question before sending it to the RAG Engine, which is essential since RAG is divided into four sub-engines, each responsible for answering questions related to specific aspects.
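To make the idea of breaking a question down more concrete, here is a minimal sketch in plain Python. It is not how the Sub Question Engine works internally (the real engine asks the LLM to write the sub-questions); it only illustrates the fan-out to per-year engines. The helper names years_mentioned and split_into_sub_questions are hypothetical, and the tool names follow the vector_index_{year} pattern used later in this article.

```python
import re

# Years covered by the four per-year RAG engines described in this article.
INDEXED_YEARS = (2019, 2020, 2021, 2022)


def years_mentioned(question: str) -> list[int]:
    """Return the indexed years that appear in the question, in ascending order."""
    found = {int(y) for y in re.findall(r"\b(20\d{2})\b", question)}
    return [y for y in INDEXED_YEARS if y in found]


def split_into_sub_questions(question: str) -> list[tuple[str, str]]:
    """Pair the question with one per-year tool name for every year it mentions.

    Hypothetical stand-in: the real engine lets the LLM phrase each sub-question.
    """
    return [
        (f"vector_index_{year}", f"{question} (answer for {year} only)")
        for year in years_mentioned(question)
    ]
```

A question that mentions both 2020 and 2022 would produce two sub-questions, one for each of the corresponding per-year engines.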
RAG Engine
The RAG Engine uses performance data from Uber for the years 2019 to 2022. The Chat Agent decides whether to send the question directly to RAG or pass it through the Sub Question Engine based on the relevance to the RAG metadata description.
Chat Engine
The Chat Engine, or GPT Engine, answers general questions. When the Chat Agent concludes that the prompt is not related to RAG, it sends the question to the Chat Engine for a general response.
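The routing between the three engines can be summarized as a small decision function. This is only a sketch of the decision described above: in the actual Chatbot the LLM makes this choice from the tool descriptions, while here the two inputs (mentions_uber_filings and years_needed) are hypothetical flags we assume have already been worked out.

```python
from enum import Enum


class Route(Enum):
    """The three destinations the Chat Agent can pick from."""
    CHAT_ENGINE = "gpt_agent"
    RAG_ENGINE = "vector_index"
    SUB_QUESTION_ENGINE = "sub_question_query_engine"


def choose_route(mentions_uber_filings: bool, years_needed: int) -> Route:
    """Pick an engine from two assumed, pre-computed signals."""
    if not mentions_uber_filings:
        # Not related to the RAG data: answer as a general question.
        return Route.CHAT_ENGINE
    if years_needed > 1:
        # Needs data from several RAG engines: break the question down first.
        return Route.SUB_QUESTION_ENGINE
    return Route.RAG_ENGINE
```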
Other
This Chatbot stores conversation data in memory, allowing for smoother interaction by maintaining context.
To reset the chat or clear the memory, the /resetChat command can be used.
The /chat endpoint is used for normal queries, while /chatWithoutRAG can be used for queries without involving RAG.
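As a rough illustration of what the memory does, the sketch below keeps the conversation turns in a list and clears them on reset. The ChatMemory class is hypothetical and is not the memory object used by the agent library; it only shows the behavior that /resetChat exposes.

```python
from dataclasses import dataclass, field


@dataclass
class ChatMemory:
    """Hypothetical in-process store of (role, message) turns."""
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, message: str) -> None:
        self.turns.append((role, message))

    def context(self, last_n: int = 6) -> str:
        """Render the most recent turns as text that can precede the next prompt."""
        return "\n".join(f"{role}: {msg}" for role, msg in self.turns[-last_n:])

    def reset(self) -> None:
        """What a /resetChat handler would call to forget the conversation."""
        self.turns.clear()
```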
User Flow
Frontend Development
User Interface Components
The UI components will mainly consist of:
Chat header: Contains settings for the chat and a button to reset the chat.
Chat input: Input for typing and sending messages.
Chat widget: Displays the conversation.
Step 1: Define Schema and Form
In this step, we will use the libraries react-hook-form, @hookform/resolvers/zod, and zod.
First, we will create a schema.
The schema consists of a query field for the user's input and a bot field that carries error messages coming from the bot itself.
import { z } from 'zod'

export const askScheme = z.object({
  query: z.string().trim().min(1, { message: 'Please enter your message' }),
  bot: z.string({}).optional(),
})
Once we create the schema, we will create a type interface for this form.
From the schema itself, we will validate the input received from the user. If the user submits without typing, an error message will be displayed in the user interface saying 'Please enter your message.'
In the input component, the 'react-hook-form' library is used, along with useFormContext and Controller to manage the input.
Once we submit, if the input is validated correctly, we will see the data we logged from the onSubmit function, which we can then connect to the backend.
const onSubmit = async (data: IOpenAIForm) => {
  try {
    const { data: result } = await axios.post(`/chat`, { query: data.query })
    console.log(result) // Value obtained from the API
    // TODO: do something with the result
  } catch (error) {
    const err = error as AxiosError<{ detail: string }>
    setError('bot', {
      message: err?.response?.data?.detail ?? 'Something went wrong',
    })
  }
}
When we submit the form, we will get the value back from the API.
We will then connect the obtained data to the UI of the Chatbot using React and State Management.
We will use useState from React for state management of messages to display in the UI. We will have answer and setAnswer to store the questions and answers of the user and bot. The structure of the array will be as follows:
[ {"id":"0","role":"user","message":"What were some of the biggest risk factors in 2022 for Uber?","raw":"" }, {"id":"2","role":"ai", "message": "Some of the biggest risk factors for Uber in 2022 include:\\n\\n1. Reclassification of drivers: There is a risk that drivers may be reclassified as employees or workers instead of independent contractors. This could result in increased costs for Uber, including higher wages, benefits, and potential legal liabilities.\\n\\n2. Intense competition: Uber faces intense competition in the mobility, delivery, and logistics industries. Competitors may offer similar services at lower prices or with better features, which could result in a loss of market share for Uber.\\n\\n3. Need to lower fares or service fees: To remain competitive, Uber may need to lower fares or service fees. This could impact the company's revenue and profitability.\\n\\n4. Significant losses: Uber has incurred significant losses since its inception. The company may continue to experience losses in the future, which could impact its financial stability and ability to attract investors.\\n\\n5. Uncertainty of achieving profitability: There is uncertainty regarding Uber's ability to achieve or maintain profitability. The company expects operating expenses to increase, which could make it challenging to achieve profitability in the near term.\\n\\nThese risk factors highlight the challenges and uncertainties that Uber faces in 2022.",
"raw": "Some of the biggest risk factors for Uber in 2022 include:\\n\\n1. Reclassification of drivers: There is a risk that drivers may be reclassified as employees or workers instead of independent contractors. This could result in increased costs for Uber, including higher wages, benefits, and potential legal liabilities.\\n\\n2. Intense competition: Uber faces intense competition in the mobility, delivery, and logistics industries. Competitors may offer similar services at lower prices or with better features, which could result in a loss of market share for Uber.\\n\\n3. Need to lower fares or service fees: To remain competitive, Uber may need to lower fares or service fees. This could impact the company's revenue and profitability.\\n\\n4. Significant losses: Uber has incurred significant losses since its inception. The company may continue to experience losses in the future, which could impact its financial stability and ability to attract investors.\\n\\n5. Uncertainty of achieving profitability: There is uncertainty regarding Uber's ability to achieve or maintain profitability. The company expects operating expenses to increase, which could make it challenging to achieve profitability in the near term.\\n\\nThese risk factors highlight the challenges and uncertainties that Uber faces in 2022."
}]
We will also manage state for calling the API. When we ask a question, we can choose whether to use RAG or not. We will have hasRag and setHasRag to manage this state, and we check the value before sending the request to decide which endpoint to call.
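The same choice can be reproduced outside the browser, for example when testing the API by hand. The sketch below is an assumption-laden mirror of the frontend logic: build_chat_request is a hypothetical helper, and the payload shape simply copies the { query } object passed to axios.post above.

```python
def build_chat_request(query: str, has_rag: bool) -> tuple[str, dict[str, str]]:
    """Return the endpoint path and payload for one question.

    has_rag plays the role of the hasRag state in the UI.
    """
    path = "/chat" if has_rag else "/chatWithoutRAG"
    return path, {"query": query}
```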
In this step, we handle various states such as loading, submitting, and errors. We use state from 'useFormContext' to display values related to the form, including isSubmitting and errors.
In the code above, we initialize a FastAPI project and enable CORS for smooth communication with the frontend. To run the server, use the following command:
uvicorn app:app
This command tells Uvicorn to serve the app object defined in the app.py file, and the server will run on the default port 8000.
Next, create the data and storage folders to store sample documents and the vector database storage.
Continue by loading data into the VectorDB. The example uses data from raw UBER 10-K HTML files for the years 2019-2022:
def read_data(years):
    loader = UnstructuredReader()
    doc_set = {}
    all_docs = []
    for year in years:
        year_docs = loader.load_data(
            file=Path(f"./data/UBER/UBER_{year}.html"), split_documents=False
        )
        # Insert year metadata into each document
        for d in year_docs:
            d.metadata = {"year": year}
        doc_set[year] = year_docs
        all_docs.extend(year_docs)
    return doc_set
Now, load the data as documents into the VectorDB, organizing it by year:
def store_data(years, doc_set, service_context):
    index_set = {}
    for year in years:
        storage_context = StorageContext.from_defaults()
        cur_index = VectorStoreIndex.from_documents(
            doc_set[year],
            service_context=service_context,
            storage_context=storage_context,
        )
        index_set[year] = cur_index
        # Persist each year's index in its own folder
        storage_context.persist(persist_dir=f"./storage/{year}")
    return index_set
Setting Up a Sub Question Query Engine
Create a Query Engine for each year's data by loading the index from the VectorDB:
def load_data(years, service_context):
    index_set = {}
    for year in years:
        storage_context = StorageContext.from_defaults(
            persist_dir=f"./storage/{year}"
        )
        cur_index = load_index_from_storage(
            storage_context, service_context=service_context
        )
        index_set[year] = cur_index
    return index_set
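Since store_data writes each year to ./storage/{year} and load_data reads from the same place, it can be handy to check which years still need to be built before deciding which function to call. The helper below is a sketch of that check using only the standard library; years_needing_build is a hypothetical name, and treating an empty folder as "not built yet" is our assumption.

```python
from pathlib import Path


def years_needing_build(years: list[int], storage_root: str = "./storage") -> list[int]:
    """Return the years whose persist directory is missing or empty."""
    missing = []
    for year in years:
        year_dir = Path(storage_root) / str(year)
        if not year_dir.is_dir() or not any(year_dir.iterdir()):
            missing.append(year)
    return missing
```

If the returned list is empty, every index can be loaded with load_data; otherwise read_data and store_data only need to run for the years in the list.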
Generate Query Engine Tools for each year's data:
def create_individual_query_tool(index_set, years):
    individual_query_engine_tools = [
        QueryEngineTool(
            query_engine=index_set[year].as_query_engine(),
            metadata=ToolMetadata(
                name=f"vector_index_{year}",
                description=f"useful for when you want to answer queries about the {year} SEC 10-K for Uber",
            ),
        )
        for year in years
    ]
    return individual_query_engine_tools
Synthesize Answers Across the Data
Create a function to synthesize questions for individual query engine tools:
Generate a Query Engine Tool for sub-question query engine:
def create_sub_question_tool(query_engine):
    query_engine_tool = QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="sub_question_query_engine",
            description="useful for when you want to answer queries that require analyzing multiple SEC 10-K documents for Uber",
        ),
    )
    return query_engine_tool
Create General Engine
The final engine we are going to create will serve as a query tool used to search for information that is not within the scope of the prepared data. Alternatively, it can be referred to as a chatbot for answering general questions.
def agent_chat():
    chat_engine_tool = [
        QueryEngineTool(
            # An agent with no tools acts as a plain chat model
            query_engine=OpenAIAgent.from_tools([]),
            metadata=ToolMetadata(
                name="gpt_agent",
                description="Agent that can answer general questions.",
            ),
        ),
    ]
    return chat_engine_tool
Create OpenAI Agent from Tools
Combine all query engine tools into an OpenAI Agent:
@app.post('/chatWithoutRAG')
def chatWithoutRAG(query: str = 'What were some of the biggest risk factors in 2022 for Uber?'):
    # Build an agent that only has the general-purpose chat tool
    gpt_agent = agent_chat()
    agent = OpenAIAgent.from_tools(gpt_agent, verbose=False)
    answer = agent.chat(query)
    return JSONResponse({'answer': str(answer)})
Deploying and Monitoring on EC2
To deploy FastAPI on an EC2 instance:
Create a session using screen:
screen -S name
Navigate to the API folder:
cd path/to/api
Start FastAPI on port 8000:
uvicorn app:app --host 0.0.0.0
Detach from the screen session:
Ctrl+a, then d
Now, the server runs in the background even after exiting the session.
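For a very basic form of monitoring, we can check from a script whether the server is still accepting connections. The sketch below uses only the Python standard library; is_server_up is a hypothetical helper, and the default host and port are assumptions that match the uvicorn command above when run on the instance itself.

```python
import socket


def is_server_up(host: str = "127.0.0.1", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

This only confirms that something is listening on the port, not that the Chatbot answers correctly, so it is a first check rather than a full health check.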
Setting Up API Gateway and Implementing CORS
To integrate with AWS API Gateway for better management and usage control, add CORS configuration to FastAPI:
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
# Allow requests from any origin, with any method and any header
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
Connect this server to AWS API Gateway to handle authentication and usage management.