Text-to-SQL

Use Case Overview

The Text-to-SQL Chatbot is designed for Business Analysts (BAs) and Product Managers (PMs) who need to query a PostgreSQL database without SQL expertise. The frontend provides a user-friendly interface: users enter questions in natural language and receive the corresponding SQL queries and database results.

User Flow

  1. Connect to Database:

    • Users paste the Backend (BE) endpoint connected to the PostgreSQL database.

    • After clicking connect, the website displays the database schema and data.

  2. Write Queries:

    • Users type prompts or questions into the chat.

    • The website responds with the corresponding SQL queries, displaying them in the chat.

Frontend Development

Frontend Development with Next.js

The frontend is implemented using Next.js, with key libraries including react-hook-form, @tanstack/react-query, Tailwind CSS, and zod for schema validation.

Create the project

npx create-next-app
yarn dev

Form schema with zod

import { z } from 'zod'

export const askScheme = z.object({
  endpoint: z.string().url().optional().or(z.literal('')),
  query: z.string(),
  bot: z.string().optional(),
})

// Plausible definition of IOpenAIForm (used by useForm below), inferred from the schema.
export type IOpenAIForm = z.infer<typeof askScheme>

react-hook-form

import { useForm } from 'react-hook-form'
import { zodResolver } from '@hookform/resolvers/zod'

const methods = useForm<IOpenAIForm>({
  resolver: zodResolver(askScheme),
  mode: 'onChange',
  shouldFocusError: true,
  defaultValues: {
    endpoint: endpointAPi,
  },
})

const {
  handleSubmit,
  setError,
  setValue,
  control,
  formState: { errors },
} = methods

The integration connects to the backend API using axios and @tanstack/react-query. Column information and table data are fetched from the backend, and the UI is updated accordingly.

import axios from 'axios'
import { useMutation, useQueries } from '@tanstack/react-query'

// The relative paths below assume an axios instance whose baseURL is set to
// the backend endpoint (NEXT_PUBLIC_API).
const [
  { isLoading, data: column, refetch: refetchColumn },
  { isLoading: isLoadAllData, data: allData, refetch: refetchAllData },
] = useQueries({
  queries: [
    {
      queryKey: ['getInfo'],
      queryFn: () => axios.get('/getInfo').then((res) => res.data),
    },
    {
      queryKey: ['getAllData'],
      queryFn: () => axios.get('/getAllData').then((res) => res.data),
    },
  ],
})

const { mutateAsync: queryPrompt } = useMutation({
  mutationFn: ({ query_str }: { query_str: string }) => {
    // Encode the prompt so special characters survive the query string.
    return axios.get(`/queryWithPrompt?query_str=${encodeURIComponent(query_str)}`)
  },
  onError: () => {
    setError('bot', {
      message: 'There was an error fetching the response.',
    })
  },
})

const { mutateAsync: randomData } = useMutation({
  mutationFn: () => {
    return axios.get('/addRandomData')
  },
})

Deploying the Frontend

To deploy the frontend, configure environment variables such as the backend API endpoint in the .env file, then build and deploy as you would any Next.js application.

NEXT_PUBLIC_API=your_endpoint

Link to Frontend Code

Backend Development

Setting up AWS EC2 Instance

This section guides you through setting up an AWS EC2 instance to host the database and backend API. Choose an instance type with sufficient RAM, such as m5.large, c5.large, or r5.large, along with at least 50 GB of storage; Ubuntu is the operating system used here. Make sure the instance's security group allows inbound traffic on the ports you expose (for example, the API port).

Installing and Configuring PostgreSQL on EC2

The tutorial suggests using Docker for easy installation of PostgreSQL. It provides step-by-step commands for installing Docker on the EC2 instance and running a PostgreSQL container.

# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg

# Add the repository to Apt sources:
echo \
  "deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
  "$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

Install PostgreSQL with Docker

docker run --name llm-postgres -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -d postgres
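
Before wiring up the backend, it can help to confirm the container accepts connections. A minimal check with psycopg2 (installed in a later step), assuming the credentials from the docker run command above:

import psycopg2

# Connect with the credentials from the `docker run` command above.
conn = psycopg2.connect(
    host="localhost",            # or the EC2 address when connecting remotely
    port=5432,
    user="postgres",             # default superuser of the official postgres image
    password="mysecretpassword",
    dbname="postgres",           # default database
)
print(conn.server_version)       # prints a version number if the connection works
conn.close()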

Setting up FastAPI Backend

Initializing FastAPI

FastAPI is used to build the backend API. This section introduces the basic structure of a FastAPI project and shows how to enable CORS for frontend communication.

LlamaIndex acts as the intermediary between the data model and the LLM, which is GPT-4. You therefore need an OpenAI API key before proceeding, which you can create at https://platform.openai.com/.

Install the necessary Python libraries for FastAPI

pip install fastapi
pip install "uvicorn[standard]"

Create app.py and initialize FastAPI

from fastapi import FastAPI
from fastapi.encoders import jsonable_encoder
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # allows all origins; tighten this in production
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/helloworld")
async def helloworld():
    return {"message": "Hello World"}

Initializing the PostgreSQL Connection

The backend connects to PostgreSQL using SQLAlchemy and psycopg2. Environment variables are loaded with python-dotenv, and the key connection parameters are declared.

Install necessary libraries using pip

pip install sqlalchemy psycopg2-binary faker numpy

Declare the database connection parameters, pulling values from environment variables

import os
import dotenv

# Load variables from a local .env file into the process environment
dotenv.load_dotenv()
HOST = os.environ['HOST']
DBPASSWORD = os.environ['DBPASSWORD']
DBUSER = os.environ['DBUSER']
DBNAME = os.environ['DBNAME']
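
The engine construction itself is not shown above; a minimal sketch, assuming the parameters declared above, the psycopg2 driver, and the container's default port 5432:

from sqlalchemy import create_engine, text

# Build the connection URL from the environment variables declared above.
engine = create_engine(
    f"postgresql+psycopg2://{DBUSER}:{DBPASSWORD}@{HOST}:5432/{DBNAME}"
)

# Sanity check: open a connection and run a trivial query.
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())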

Use the provided GitHub example to set up an API that prepares tables and data for student information, including height and weight; a sketch of what the table-creation and data-seeding endpoints might look like follows the list below.

The GitHub example includes the following API endpoints:

  • /createTable: Create the student table.

  • /getInfo: Check information about the table.

  • /addRandomData: Add sample data to the table.

  • /getAllData: Retrieve all data from the table.

Optionally, the example includes an endpoint for removing the created table:

  • /removeTable: Delete the created table.
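
The actual implementations live in the linked repository. As a rough sketch of the first two endpoints, assuming the FastAPI app and SQLAlchemy engine from earlier (the students table name and columns are inferred from the sample responses later in this tutorial):

import random

from faker import Faker
from sqlalchemy import text

fake = Faker()


@app.get("/createTable")
async def create_table():
    # Create the students table if it does not exist yet.
    with engine.begin() as conn:
        conn.execute(text(
            """
            CREATE TABLE IF NOT EXISTS students (
                id SERIAL PRIMARY KEY,
                name VARCHAR(100),
                lastname VARCHAR(100),
                height FLOAT,
                weight FLOAT
            )
            """
        ))
    return {"message": "complete"}


@app.get("/addRandomData")
async def add_random_data():
    # Insert a batch of fake students with random height/weight values.
    rows = [
        {
            "name": fake.first_name(),
            "lastname": fake.last_name(),
            "height": round(random.uniform(150.0, 210.0), 2),
            "weight": round(random.uniform(45.0, 120.0), 2),
        }
        for _ in range(10)
    ]
    with engine.begin() as conn:
        conn.execute(
            text(
                "INSERT INTO students (name, lastname, height, weight) "
                "VALUES (:name, :lastname, :height, :weight)"
            ),
            rows,
        )
    return rows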

Integrating LlamaIndex and OpenAI

LlamaIndex handles the data modeling, acting as the intermediary between the database and the LLM (generative AI) model. This section covers installing LlamaIndex and setting up the OpenAI API key.

Install the LlamaIndex library

pip install llama_index

Declare the OpenAI API key as an environment variable

OPENAI_API_KEY=your_key
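
The full /queryWithPrompt implementation is in the linked repository. As a minimal sketch of the core idea, assuming the legacy llama_index (pre-0.10) import paths plus the engine and students table from earlier (the OpenAI client reads OPENAI_API_KEY from the environment automatically):

from llama_index import SQLDatabase, ServiceContext
from llama_index.indices.struct_store import NLSQLTableQueryEngine
from llama_index.llms import OpenAI

# Wrap the SQLAlchemy engine so LlamaIndex can inspect the table schema.
sql_database = SQLDatabase(engine, include_tables=["students"])

# Use GPT-4 to translate natural-language prompts into SQL.
service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    tables=["students"],
    service_context=service_context,
)


@app.get("/queryWithPrompt")
async def query_with_prompt(query_str: str):
    response = query_engine.query(query_str)
    return {
        "result": str(response),                          # synthesized answer
        "SQL Query": response.metadata.get("sql_query"),  # generated SQL
    }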

Deploying and Testing the Backend

Deploy the backend server, then test it through the API endpoints: create the table, add random data, and query with prompts to retrieve results from the database.

Deployment Steps:

  1. Create a Screen Session

    screen -S name
  2. Navigate to the API Folder

    cd path/to/api
  3. Start FastAPI (binding to 0.0.0.0 makes the API reachable from outside the instance)

    uvicorn app:app --host 0.0.0.0
  4. Detach from Screen Session

    Ctrl+a, then d

Testing Endpoints:

  1. Test /createTable Endpoint:

    • Response:

      {
          "message": "complete"
      }
  2. Test /addRandomData Endpoint:

    • Response:

      [
          {
              "id": 279703,
              "name": "Christina",
              "lastname": "Santos",
              "height": 201.53,
              "weight": 68.07
          },
          // ... (other data entries)
      ]
  3. Test /queryWithPrompt Endpoint:

    • Prompt:

      "Write SQL in PostgreSQL format. Get average height of students."
    • Response:

      {
          "result": "The average height of students is approximately 176.56 cm.",
          "SQL Query": "SELECT AVG(height) FROM students;"
      }
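
The same checks can be scripted; a small sketch using the requests library, assuming the server from the deployment steps listens on port 8000 at a hypothetical address:

import requests

BASE = "http://<ec2-public-ip>:8000"  # hypothetical deployment address

print(requests.get(f"{BASE}/createTable").json())
print(requests.get(f"{BASE}/addRandomData").json())
print(requests.get(
    f"{BASE}/queryWithPrompt",
    # requests URL-encodes the prompt for us
    params={"query_str": "Write SQL in PostgreSQL format. Get average height of students."},
).json())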

Setting up AWS API Gateway

The final section demonstrates integrating the deployed API with AWS API Gateway for enhanced management capabilities, such as authentication. For more information: Link

Link to Backend Code
