Q&A Chatbots (RAG + Agents)

We have developed a Chatbot consisting of two AI agents and a RAG pipeline. When a user asks the Chatbot a question, AI Agent 1 decides whether to use RAG to retrieve specific information or to answer with the LLM directly. The Chatbot also includes memory that retains previous responses, making the interaction smoother.

For example, in the RAG demo we indexed Uber's annual performance data (10-K filings) for 2019 to 2022. If the user asks a question related to the RAG data, AI Agent 1 decides, based on relevance, whether to answer through RAG or through the OpenAI Agent.

In this Chatbot implementation, several engines work together, each serving a different purpose. Let's break down each part:

Chat Agent

The Chat Agent uses OpenAI models as its LLM and coordinates the Chat Engine, the Sub Question Engine, and the RAG Engine. When a user asks a question, the Chat Agent decides whether to answer using RAG or the Chat Engine, based on the question's relevance to the RAG metadata description.

Sub Question Engine

If RAG is chosen to answer a question and the prompt requires data from multiple RAG engines, the question is sent to the Sub Question Engine first. It breaks the question down before passing it to the RAG Engine, which is essential because RAG is divided into four sub-engines (one per year, 2019-2022), each responsible for a specific slice of the data.

RAG Engine

The RAG Engine uses performance data from Uber for the years 2019 to 2022. The Chat Agent decides whether to send the question directly to RAG or pass it through the Sub Question Engine based on the relevance to the RAG metadata description.

Chat Engine

The Chat Engine, or GPT Engine, answers general questions. When the Chat Agent concludes that the prompt is not related to RAG, it sends the question to the Chat Engine for a general response.
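This routing decision is made by the LLM itself, guided by each tool's metadata description. Purely as a toy sketch of the idea (keyword overlap instead of an LLM; this is not the actual mechanism), the decision could look like:

```python
import re

# Toy sketch of the routing decision. Illustrative only: the real Chat Agent
# delegates this choice to the LLM via each tool's metadata description.
RAG_DESCRIPTION = "Uber SEC 10-K annual performance data 2019 2020 2021 2022"

def tokens(text: str) -> set:
    # Lowercase and split into word-like tokens, keeping things like "10-k".
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def route(question: str) -> str:
    # Route to RAG when the question shares terms with the RAG description,
    # otherwise fall back to the general Chat Engine.
    overlap = tokens(RAG_DESCRIPTION) & tokens(question)
    return "rag_engine" if overlap else "chat_engine"
```

In the real system the same effect is achieved by giving the agent well-written tool descriptions and letting the LLM pick the tool.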

Other

  • This Chatbot stores conversation data in memory, allowing for smoother interaction by maintaining context.

  • To reset the chat or clear the memory, the /resetChat command can be used.

  • The /chat endpoint handles normal queries, while /chatWithoutRAG handles queries that bypass RAG.
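Assuming the chat endpoints accept a JSON body of the form {"query": ...} (as the frontend sends), a minimal standard-library Python client for these endpoints might look like this (the base URL is a placeholder for wherever the backend runs):

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000"  # placeholder; adjust for your deployment

def choose_endpoint(use_rag: bool) -> str:
    # Mirrors the frontend's choice between the two chat endpoints.
    return "/chat" if use_rag else "/chatWithoutRAG"

def ask(query: str, use_rag: bool = True) -> str:
    # POST a question and return the "answer" field of the JSON response.
    req = request.Request(
        BASE_URL + choose_endpoint(use_rag),
        data=json.dumps({"query": query}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["answer"]

def reset_chat() -> None:
    # Clears the agent's conversation memory via /resetChat.
    request.urlopen(request.Request(BASE_URL + "/resetChat", method="POST"))
```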

User Flow

Frontend Development

User Interface Components

The UI components will mainly consist of:

  • Chat header: Contains settings for the chat and a button to reset the chat.

  • Chat input: Input for typing and sending messages.

  • Chat widget: Displays the conversation.

Step 1: Define Schema and Form

In this step, we will use the react-hook-form library, the zod resolver from @hookform/resolvers/zod, and zod itself.

First, we create a schema. It consists of a 'query' field for the user's input and an optional 'bot' field used to surface error messages from the bot itself.

export const askScheme = z.object({
  query: z.string().trim().min(1, { message: 'Please enter your message' }),
  bot: z.string({}).optional(),
})

Once we create the schema, we will create a type interface for this form.

export interface IOpenAIForm extends z.infer<typeof askScheme> {}

After that, we will create methods.

const methods = useForm<IOpenAIForm>({
  resolver: zodResolver(askScheme),
  shouldFocusError: true,
  defaultValues: {
    query: '',
  },
})

const { handleSubmit, setError, setValue } = methods

const onSubmit = async (data: IOpenAIForm) => {
  // TODO: handle the submitted data
}

return (
  <FormProvider methods={methods} onSubmit={handleSubmit(onSubmit)}>
    ...
  </FormProvider>
)

The schema validates the input received from the user: if the user submits without typing anything, the error message 'Please enter your message' is displayed in the user interface.

In the input component, the 'react-hook-form' library is used, along with useFormContext and Controller to manage the input.

import { InputHTMLAttributes } from 'react'
import { useFormContext, Controller } from 'react-hook-form'
import { twMerge } from 'tailwind-merge'

interface IProps extends InputHTMLAttributes<HTMLInputElement> {
  name: string
  helperText?: string
  label?: string
}

const RHFTextField = ({
  name,
  helperText,
  label,
  className,
  ...other
}: IProps) => {
  const { control } = useFormContext()

  return (
    <Controller name={name}
      control={control}
      render={({ field, fieldState: { error } }) => (
        <div className='w-full flex flex-col gap-y-2 '>
          {label && <label className='text-sm font-semibold'>{label}</label>}
          <input
            {...field}
            {...other}
            className={twMerge(
              'outline-none w-full  border border-gray-200 rounded-lg px-2 py-1',
              className
            )}
          />
          {(!!error || helperText) && (
            <div className={twMerge(error?.message && 'text-rose-500 text-sm')}>
              {error?.message || helperText}
            </div>
          )}
        </div>
      )}
    />
  )
}
export default RHFTextField

Where the input component is used, we set the name attribute to 'query', matching the field declared in the schema.

const methods = useForm<IOpenAIForm>({
  resolver: zodResolver(askScheme),
  shouldFocusError: true,
  defaultValues: {
    query: '',
  },
})

const { handleSubmit, setError, setValue } = methods

const onSubmit = async (data: IOpenAIForm) => {
  console.log(data)
}

return (
  <FormProvider methods={methods} onSubmit={handleSubmit(onSubmit)}>
      ....
     <RHFTextField type='text'
        placeholder='What do you need ? ...'
        className='outline-none w-full border-none'
        name='query'
      />
      <button type='submit'
        disabled={methods.formState.isSubmitting}
        className='text-gray-400 disabled:text-gray-200'
      >Submit</button>
  </FormProvider>
)

Once we submit, if the input is validated correctly, we will see the data we logged from the onSubmit function, which we can then connect to the backend.

Step 2: Connect Backend

We will set the default base URL.

axios.defaults.baseURL = process.env.NEXT_PUBLIC_API

After setting it up, we will connect to the API.

const onSubmit = async (data: IOpenAIForm) => {
  try {

    const { data: result } = await axios.post(
      `/chat`,
      { query: data.query },
    )
    console.log(result) // value returned by the API
    // TODO: use the result
  } catch (error) {
    const err = error as AxiosError<{ detail: string }>
    setError('bot', {
      message: err?.response?.data?.detail ?? 'Something went wrong',
    })
  }
}

When we submit the form, we receive the response from the API.

We will then connect the obtained data to the UI of the Chatbot using React and State Management.

We use useState from React to manage the messages displayed in the UI: answer and setAnswer store the user's questions and the bot's answers. The structure of the array is as follows:

[
    {
        "id": "0",
        "role": "user",
        "message": "What were some of the biggest risk factors in 2022 for Uber?",
        "raw": ""
    },
    {
        "id": "1",
        "role": "ai",
        "message": "Some of the biggest risk factors for Uber in 2022 include:\n\n1. Reclassification of drivers: There is a risk that drivers may be reclassified as employees or workers instead of independent contractors. This could result in increased costs for Uber, including higher wages, benefits, and potential legal liabilities.\n\n2. Intense competition: Uber faces intense competition in the mobility, delivery, and logistics industries. Competitors may offer similar services at lower prices or with better features, which could result in a loss of market share for Uber.\n\n3. Need to lower fares or service fees: To remain competitive, Uber may need to lower fares or service fees. This could impact the company's revenue and profitability.\n\n4. Significant losses: Uber has incurred significant losses since its inception. The company may continue to experience losses in the future, which could impact its financial stability and ability to attract investors.\n\n5. Uncertainty of achieving profitability: There is uncertainty regarding Uber's ability to achieve or maintain profitability. The company expects operating expenses to increase, which could make it challenging to achieve profitability in the near term.\n\nThese risk factors highlight the challenges and uncertainties that Uber faces in 2022.",
        "raw": "Some of the biggest risk factors for Uber in 2022 include:\n\n1. Reclassification of drivers: There is a risk that drivers may be reclassified as employees or workers instead of independent contractors. This could result in increased costs for Uber, including higher wages, benefits, and potential legal liabilities.\n\n2. Intense competition: Uber faces intense competition in the mobility, delivery, and logistics industries. Competitors may offer similar services at lower prices or with better features, which could result in a loss of market share for Uber.\n\n3. Need to lower fares or service fees: To remain competitive, Uber may need to lower fares or service fees. This could impact the company's revenue and profitability.\n\n4. Significant losses: Uber has incurred significant losses since its inception. The company may continue to experience losses in the future, which could impact its financial stability and ability to attract investors.\n\n5. Uncertainty of achieving profitability: There is uncertainty regarding Uber's ability to achieve or maintain profitability. The company expects operating expenses to increase, which could make it challenging to achieve profitability in the near term.\n\nThese risk factors highlight the challenges and uncertainties that Uber faces in 2022."
    }
]

We also manage state for calling the API. When asking a question, we can choose whether or not to use RAG: hasRag and setHasRag hold this flag, which is checked before sending the request to decide which endpoint to call.

import ChatWidget, { ChatProps } from '@/app/components/ChatWidget'
import FormProvider from '@/app/components/hook-form/FormProvider'
import { zodResolver } from '@hookform/resolvers/zod'
import axios, { AxiosError } from 'axios'
import { useState } from 'react'
import { useForm } from 'react-hook-form'
import { z } from 'zod'

export const askScheme = z.object({
  query: z.string().trim().min(1, { message: 'Please enter your message' }),
  bot: z.string({}).optional(),
})

export enum ChatType {
  Basic,
  WithoutRag,
}

axios.defaults.baseURL = process.env.NEXT_PUBLIC_API

export interface IOpenAIForm extends z.infer<typeof askScheme> {}

export default function ChatBotDemo() {
  const [answer, setAnswer] = useState<ChatProps[]>([])
  const [hasRag, setHasRag] = useState(true)

  const methods = useForm<IOpenAIForm>({
    resolver: zodResolver(askScheme),
    shouldFocusError: true,
    defaultValues: {
      query: '',
    },
  })

  const { handleSubmit, setError, setValue } = methods

  const onSubmit = async (data: IOpenAIForm) => {
    try {
      setAnswer((prevState) => [
        ...prevState,
        {
          id: prevState.length.toString(),
          role: 'user',
          message: data.query,
          raw: '',
        },
      ])
      setValue('query', '')
      const { data: result } = await axios.post(
        `${hasRag ? '/chat' : '/chatWithoutRAG'}`,
        {
          query: data.query,
        }
      )
      setAnswer((prevState) => [
        ...prevState,
        {
          id: prevState.length.toString(),
          role: 'ai',
          message: result.answer,
          raw: result.answer,
        },
      ])
    } catch (error) {
      const err = error as AxiosError<{ detail: string }>
      setError('bot', {
        message: err?.response?.data?.detail ?? 'Something went wrong',
      })
    }
  }
  const handleChangeRag = () => {
    setHasRag(!hasRag)
  }

  return (
    <FormProvider methods={methods} onSubmit={handleSubmit(onSubmit)}>
      <div className='flex justify-center flex-col items-center bg-white mx-auto max-w-7xl h-screen '>
        <ChatWidget answer={answer} option={{ hasRag, handleChangeRag }} />
      </div>
    </FormProvider>
  )
}

Step 3: Handle Form Submission

In this step, we handle various states such as loading, submitting, and errors. We use state from 'useFormContext' to display values related to the form, including isSubmitting and errors.

export default function ChatWidget({ answer }: { answer: ChatProps[] }) {
  const chatWindowRef = useRef<HTMLDivElement>(null)
  const {
    formState: { isSubmitting, errors },
  } = useFormContext()

  return (
    <div className='h-full flex flex-col w-full'>
      <Header />
      <ChatWindow messages={answer} isLoading={isSubmitting} error={errors?.bot?.message as string} chatWindowRef={chatWindowRef} />
      <ChatInput isLoading={isSubmitting} />
    </div>
  )
}

In the ChatWindow component, we manage various states to display in the UI, including loading, submitting, and error states.

import { useEffect } from 'react'
import Image from 'next/image'
import { CopyClipboard } from './CopyClipboard'

interface Message {
  role: 'user' | 'ai'
  message: string
  id: string
  raw: string
}

interface ChatWindowProps {
  messages: Message[]
  isLoading?: boolean
  error?: string
  chatWindowRef: React.RefObject<HTMLDivElement> | null
}

export const ChatWindow: React.FC<ChatWindowProps> = ({
  messages,
  isLoading,
  error,
  chatWindowRef,
}) => {
  useEffect(() => {
    if (
      chatWindowRef !== null &&
      chatWindowRef?.current &&
      messages.length > 0
    ) {
      chatWindowRef.current.scrollTop = chatWindowRef.current.scrollHeight
    }
  }, [messages.length, chatWindowRef])

  return (
    <div ref={chatWindowRef}
      className='flex-1 overflow-y-auto p-4 space-y-8'
      id='chatWindow'
    >
      {messages.map((item) => (
        <div key={item.id} className='w-full'>
          {item.role === 'user' ? (
            <div className='flex gap-x-8 '>
              <div className='min-w-[48px] min-h-[48px]'>
                <Image src='/img/chicken.png'
                  width={48}
                  height={48}
                  alt='user'
                />
              </div>
              <div>
                <p className='font-bold'>User</p>
                <p>{item.message}</p>
              </div>
            </div>
          ) : (
            <div className='flex gap-x-8 w-full'>
              <div className='min-w-[48px] min-h-[48px]'>
                <Image src='/img/robot.png'
                  width={48}
                  height={48}
                  alt='robot'
                />
              </div>
              <div className='w-full'>
                <div className='flex justify-between mb-1 w-full '>
                  <p className='font-bold'>Ai</p>
                  <div />
                  <CopyClipboard content={item.raw} />
                </div>

                <div className='prose whitespace-pre-line'
                  dangerouslySetInnerHTML={{ __html: item.message }}
                />
              </div>
            </div>
          )}
        </div>
      ))}
      {isLoading && (
        <div className='flex gap-x-8 w-full mx-auto'>
          <div className='min-w-[48px] min-h-[48px]'>
            <Image src='/img/robot.png' width={48} height={48} alt='robot' />
          </div>
          <div>
            <p className='font-bold'>Ai</p>

            <div className='mt-4 flex space-x-2 items-center '>
              <p>Hang on a second </p>
              <span className='sr-only'>Loading...</span>
              <div className='h-2 w-2 bg-blue-600 rounded-full animate-bounce [animation-delay:-0.3s]'></div>
              <div className='h-2 w-2 bg-blue-600 rounded-full animate-bounce [animation-delay:-0.15s]'></div>
              <div className='h-2 w-2 bg-blue-600 rounded-full animate-bounce'></div>
            </div>
          </div>
        </div>
      )}
      {error && (
        <div className='flex gap-x-8 w-full mx-auto'>
          <div className='min-w-[48px] min-h-[48px]'>
            <Image src='/img/error.png' width={48} height={48} alt='error' />
          </div>
          <div>
            <p className='font-bold'>Ai</p>
            <p className='text-rose-500'>{error}</p>
          </div>
        </div>
      )}
    </div>
  )
}

Link to Frontend Code

Backend Development

Setting Up a FastAPI Project

Similar to all the previous examples, we choose to use FastAPI as the framework to build an API for our application.

First, install the necessary Python libraries for FastAPI:

pip install fastapi
pip install "uvicorn[standard]"

Create a file named app.py and initialize FastAPI:

from fastapi import FastAPI
from fastapi.encoders import jsonable_encoder
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/helloworld")
async def helloworld():
    return {"message": "Hello World"}

In the code above, we initialize a FastAPI project and enable CORS for smooth communication with the frontend. To run the server, use the following command:

uvicorn app:app

This command tells Uvicorn to run the app instance defined in app.py; the server listens on the default port 8000.

Next, create the data and storage folders to store sample documents and the vector database storage.
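Creating these folders can be scripted; a small sketch with pathlib (the folder names follow the paths used later in the ingestion code):

```python
from pathlib import Path

# ./data/UBER holds the raw 10-K HTML files; ./storage holds the
# persisted vector indices (one subfolder per year is created on ingest).
for folder in ("data/UBER", "storage"):
    Path(folder).mkdir(parents=True, exist_ok=True)
```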

Preparation and Ingest Data

To begin, set up the environment and OpenAI key:

import os
import openai
import dotenv
from llama_hub.file.unstructured.base import UnstructuredReader
from pathlib import Path
from llama_index import VectorStoreIndex, ServiceContext, StorageContext
from llama_index import load_index_from_storage
from llama_index.tools import QueryEngineTool, ToolMetadata
from llama_index.query_engine import SubQuestionQueryEngine
from llama_index.agent import OpenAIAgent
import nest_asyncio
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse

dotenv.load_dotenv()
openai.api_key = os.environ["OPENAI_API_KEY"]
nest_asyncio.apply()
agent = None

Continue by loading data into the VectorDB. The example uses data from raw UBER 10-K HTML files for the years 2019-2022:

def read_data(years):
    loader = UnstructuredReader()
    doc_set = {}

    for year in years:
        year_docs = loader.load_data(
            file=Path(f"./data/UBER/UBER_{year}.html"), split_documents=False
        )
        # Tag each document with its year so the per-year indices stay separate
        for d in year_docs:
            d.metadata = {"year": year}
        doc_set[year] = year_docs

    return doc_set

Now, load the data as documents into the VectorDB, organizing it by year:

def store_data(years, doc_set, service_context):
    index_set = {}

    for year in years:
        storage_context = StorageContext.from_defaults()
        cur_index = VectorStoreIndex.from_documents(
            doc_set[year],
            service_context=service_context,
            storage_context=storage_context,
        )
        index_set[year] = cur_index
        storage_context.persist(persist_dir=f"./storage/{year}")

    return index_set

Setting Up a Sub Question Query Engine

Create a Query Engine for each year's data by loading the index from the VectorDB:

def load_data(years, service_context):
    index_set = {}

    for year in years:
        storage_context = StorageContext.from_defaults(
            persist_dir=f"./storage/{year}"
        )
        cur_index = load_index_from_storage(
            storage_context, service_context=service_context
        )
        index_set[year] = cur_index

    return index_set

Generate Query Engine Tools for each year's data:

def create_individual_query_tool(index_set, years):
    individual_query_engine_tools = [
        QueryEngineTool(
            query_engine=index_set[year].as_query_engine(),
            metadata=ToolMetadata(
                name=f"vector_index_{year}",
                description=f"useful for when you want to answer queries about the {year} SEC 10-K for Uber",
            ),
        )
        for year in years
    ]
    return individual_query_engine_tools

Synthesize Answers Across the Data

Create a function to synthesize questions for individual query engine tools:

def create_synthesizer(individual_query_engine_tool, service_context):
    query_engine = SubQuestionQueryEngine.from_defaults(
        query_engine_tools=individual_query_engine_tool,
        service_context=service_context,
    )
    return query_engine

Generate a Query Engine Tool for sub-question query engine:

def create_sub_question_tool(query_engine):
    query_engine_tool = QueryEngineTool(
        query_engine=query_engine,
        metadata=ToolMetadata(
            name="sub_question_query_engine",
            description="useful for when you want to answer queries that require analyzing multiple SEC 10-K documents for Uber",
        ),
    )
    return query_engine_tool

Create General Engine

The final engine serves as a query tool for information outside the scope of the prepared data; in other words, it is a general-purpose chatbot for answering general questions.

def agent_chat():
    chat_engine_tool = [
        QueryEngineTool(
            query_engine=OpenAIAgent.from_tools([]),
            metadata=ToolMetadata(
                name="gpt_agent", description="Agent that can answer general questions."
            ),
        ),
    ]
    return chat_engine_tool

Create OpenAI Agent from Tools

Combine all query engine tools into an OpenAI Agent:

def build_chat_engine(individual_query_engine_tools, query_engine_tool, gpt_agent):
    global agent
    tools = gpt_agent + individual_query_engine_tools + [query_engine_tool]
    agent = OpenAIAgent.from_tools(tools, verbose=False)
    return agent

Create RAG API

Build an endpoint for loading data into VectorDB:

@app.post('/buildRAG')
def build_RAG():
    years = [2022, 2021, 2020, 2019]
    service_context = ServiceContext.from_defaults(chunk_size=512)
    doc_set = read_data(years)
    store_data(years, doc_set, service_context)

    return JSONResponse({'status': 'success'})

Create Chat Endpoint API

Implement an endpoint for processing chat queries:

from fastapi import Body

@app.post('/chat')
def chat(query: str = Body('What were some of the biggest risk factors in 2022 for Uber?', embed=True)):
    # embed=True makes the endpoint accept the JSON body {"query": ...}
    # that the frontend sends (a bare str parameter would be read as a
    # query-string parameter instead).
    global agent
    years = [2022, 2021, 2020, 2019]
    service_context = ServiceContext.from_defaults(chunk_size=512)
    index_set = load_data(years, service_context)
    individual_query_engine_tools = create_individual_query_tool(index_set, years)
    query_engine = create_synthesizer(individual_query_engine_tools, service_context)
    query_engine_tool = create_sub_question_tool(query_engine)
    gpt_agent = agent_chat()

    if agent is None:
        agent = build_chat_engine(individual_query_engine_tools, query_engine_tool, gpt_agent)

    answer = agent.chat(query)

    return JSONResponse({'answer': str(answer)})
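Note that the endpoint above reloads the indices and rebuilds every tool on each request, even though the agent itself is created only once. One way to avoid the repeated work is to cache the expensive setup; the pattern, as an illustrative standalone sketch (the names here are hypothetical, not from the app code):

```python
_cache: dict = {}

def get_or_build(key, builder):
    # Run the expensive builder once, then reuse the cached result.
    if key not in _cache:
        _cache[key] = builder()
    return _cache[key]

# Demonstration with a stand-in for load_data + tool construction:
calls = []

def expensive_setup():
    calls.append(1)
    return "agent-object"

first = get_or_build("agent", expensive_setup)
second = get_or_build("agent", expensive_setup)
# the builder ran once; both calls return the same cached object
```

The same effect could also be achieved by building the tools once at startup instead of inside the request handler.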

Create Utility Endpoint

Implement two additional APIs for utility purposes:

  1. Reset the Agent data:

    @app.post('/resetChat')
    def resetChat():
        global agent
        if agent is not None:  # guard against resetting before the agent exists
            agent.reset()
        return JSONResponse({'status': 'complete'})
  2. Use a chatbot without RAG:

    from fastapi import Body

    @app.post('/chatWithoutRAG')
    def chatWithoutRAG(query: str = Body('What were some of the biggest risk factors in 2022 for Uber?', embed=True)):
        gpt_agent = agent_chat()
        chat_agent = OpenAIAgent.from_tools(gpt_agent, verbose=False)
        answer = chat_agent.chat(query)

        return JSONResponse({'answer': str(answer)})

Deploying and Monitoring on EC2

To deploy FastAPI on an EC2 instance:

  1. Create a session using screen:

    screen -S name
  2. Navigate to the API folder:

    cd path/to/api
  3. Start FastAPI on port 8000:

    uvicorn app:app --host 0.0.0.0
  4. Detach from the screen session:

    Ctrl+a d

Now, the server runs in the background even after exiting the session.

Setting Up API Gateway and Implementing CORS

To integrate with AWS API Gateway for better management and usage control, make sure the CORS configuration (already added during setup) is in place:

from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

Connect this server to AWS API Gateway to handle authentication and usage management.

Link to Backend Code
