Text-to-SQL
The Text-to-SQL Chatbot is designed for Business Analysts (BAs) and Product Managers (PMs) who need to query a PostgreSQL database without SQL expertise. The frontend provides a user-friendly interface, allowing users to input queries in natural language and receive the corresponding SQL queries and database results.
Connect to Database:
Users paste the Backend (BE) endpoint connected to the PostgreSQL database.
After clicking connect, the website displays the database schema and data.
Write Queries:
Users type prompts or questions into the chat.
The website responds with the corresponding SQL queries, displaying them in the chat.
The frontend is implemented with Next.js; key libraries include react-hook-form, @tanstack/react-query, Tailwind CSS, and zod for schema validation.
Create the project
npx create-next-app
yarn dev
Define the form schema with zod
import { z } from 'zod'

export const askScheme = z.object({
  endpoint: z.string().url().optional().or(z.literal('')),
  query: z.string(),
  bot: z.string().optional(),
})
react-hook-form
import { useForm } from 'react-hook-form'
import { zodResolver } from '@hookform/resolvers/zod'

const methods = useForm<IOpenAIForm>({
  resolver: zodResolver(askScheme),
  mode: 'onChange',
  shouldFocusError: true,
  defaultValues: {
    endpoint: endpointAPi,
  },
})

const {
  handleSubmit,
  setError,
  setValue,
  control,
  formState: { errors },
} = methods
The integration connects to the backend API using axios and @tanstack/react-query. Data, including column information and random data for the table, is fetched from the backend, and the UI is updated accordingly.
import axios from 'axios'
import { useQueries, useMutation } from '@tanstack/react-query'

const [
  { isLoading, data: column, refetch: refetchColumn },
  { isLoading: isLoadAllData, data: allData, refetch: refetchAllData },
] = useQueries({
  queries: [
    {
      queryKey: ['getInfo'],
      queryFn: () => axios.get('/getInfo').then((res) => res.data),
    },
    {
      queryKey: ['getAllData'],
      queryFn: () => axios.get('/getAllData').then((res) => res.data),
    },
  ],
})
const { mutateAsync: queryPrompt } = useMutation({
  mutationFn: ({ query_str }: { query_str: string }) => {
    return axios.get(`/queryWithPrompt?query_str=${encodeURIComponent(query_str)}`)
  },
  onError: () => {
    setError('bot', {
      message: 'There was an error fetching the response.',
    })
  },
})

const { mutateAsync: randomData } = useMutation({
  mutationFn: () => {
    return axios.get(`/addRandomData`)
  },
})
To deploy the frontend, configure environment variables such as the API endpoint in the .env file, then deploy it as a standard Next.js application.
NEXT_PUBLIC_API=your_endpoint
This section guides you through setting up an AWS EC2 instance for the database and backend API. The recommended EC2 type is one with sufficient RAM, such as m5.large, c5.large, or r5.large, along with a minimum of 50GB of storage. The chosen operating system is Ubuntu.
The tutorial suggests using Docker for easy installation of PostgreSQL. It provides step-by-step commands for installing Docker on the EC2 instance and running a PostgreSQL container.
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
# Add the repository to Apt sources:
echo \
"deb [arch="$(dpkg --print-architecture)" signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
"$(. /etc/os-release && echo "$VERSION_CODENAME")" stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
Install PostgreSQL with Docker
docker run --name llm-postgres -e POSTGRES_PASSWORD=mysecretpassword -p 5432:5432 -d postgres
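To confirm the container is accepting connections, you can run a quick check from Python with psycopg2. This is a minimal sketch; the credentials match the docker run command above, and the default postgres database is assumed:
import psycopg2

# Connect with the same credentials used when starting the container.
conn = psycopg2.connect(
    host="localhost",
    port=5432,
    user="postgres",
    password="mysecretpassword",
    dbname="postgres",
)
with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
conn.close()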
Initialize FastAPI
FastAPI is chosen for creating the backend API. This section introduces the basic structure of a FastAPI project and demonstrates how to enable CORS for frontend communication.
LlamaIndex serves as the intermediary between the data model and the LLM, which is GPT-4. It is therefore necessary to create an OpenAI API key before using it, which can be done at https://platform.openai.com/.
Install the necessary Python libraries for FastAPI
pip install fastapi
pip install "uvicorn[standard]"
Create app.py and initialize FastAPI
from fastapi import FastAPI
from fastapi.encoders import jsonable_encoder
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.get("/helloworld")
async def helloworld():
    return {"message": "Hello World"}
Initialize the PostgreSQL Connection
The backend connects to PostgreSQL using SQLAlchemy and psycopg2. Environment variables are loaded with python-dotenv, and the key parameters for connecting to the database are declared.
Install necessary libraries using pip
pip install sqlalchemy psycopg2-binary faker numpy
Declare parameters for creating a database connection, pulling data from environment variables
import os
import dotenv
dotenv.load_dotenv()
HOST = os.environ['HOST']
DBPASSWORD = os.environ['DBPASSWORD']
DBUSER = os.environ['DBUSER']
DBNAME = os.environ['DBNAME']
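With these parameters loaded, the SQLAlchemy engine for the connection can be created. The following is a minimal sketch; it assumes the psycopg2 driver and the default port 5432:
from sqlalchemy import create_engine, text

# Build the PostgreSQL connection string from the environment variables above.
engine = create_engine(f"postgresql+psycopg2://{DBUSER}:{DBPASSWORD}@{HOST}:5432/{DBNAME}")

# Quick sanity check: open a connection and run a trivial query.
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())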
Use the provided GitHub example to set up an API for preparing tables and data related to student information, including height and weight.
The GitHub example includes the following API endpoints (a sketch of two of them follows the list):
/createTable: Create the student table.
/getInfo: Check information about the table.
/addRandomData: Add sample data to the table.
/getAllData: Retrieve all data from the table.
Optionally, the example includes an endpoint for removing the created table:
/removeTable: Delete the created table.
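As a rough illustration, here is a minimal sketch of how /createTable and /addRandomData could be implemented with SQLAlchemy and Faker; the table and column names follow the student example above, while the actual GitHub code may differ in detail. It continues in app.py and reuses the app and engine objects defined earlier:
from sqlalchemy import text
from faker import Faker
import numpy as np

fake = Faker()

@app.get("/createTable")
async def create_table():
    # Create the student table if it does not already exist.
    with engine.begin() as conn:
        conn.execute(text(
            "CREATE TABLE IF NOT EXISTS students ("
            "id SERIAL PRIMARY KEY, name TEXT, lastname TEXT, "
            "height FLOAT, weight FLOAT)"
        ))
    return {"message": "complete"}

@app.get("/addRandomData")
async def add_random_data(rows: int = 10):
    # Insert a few fake students with random height/weight, then return the table.
    with engine.begin() as conn:
        for _ in range(rows):
            conn.execute(
                text("INSERT INTO students (name, lastname, height, weight) "
                     "VALUES (:name, :lastname, :height, :weight)"),
                {
                    "name": fake.first_name(),
                    "lastname": fake.last_name(),
                    "height": round(float(np.random.normal(170, 10)), 2),
                    "weight": round(float(np.random.normal(65, 8)), 2),
                },
            )
        result = conn.execute(text("SELECT * FROM students"))
        return [dict(row._mapping) for row in result]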
Integrating LlamaIndex and OpenAI
LlamaIndex is used for data modeling, acting as an intermediary between the database and the LLM (GenerativeAI) model. The tutorial covers the installation of LlamaIndex and the setup of OpenAI API keys.
Install the LlamaIndex Library
pip install llama_index
Declare OpenAI API Key as Environment Variable
OPENAI_API_KEY=your_key
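With the key in place, the /queryWithPrompt endpoint can hand natural-language questions to LlamaIndex's SQL query engine. The following is a minimal sketch assuming a llama_index 0.9-era API (SQLDatabase, NLSQLTableQueryEngine) and reusing the app and engine from earlier; the actual GitHub code may differ:
from llama_index import SQLDatabase, ServiceContext
from llama_index.llms import OpenAI
from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine

# Wrap the students table and point the query engine at GPT-4.
sql_database = SQLDatabase(engine, include_tables=["students"])
service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-4"))
query_engine = NLSQLTableQueryEngine(
    sql_database=sql_database,
    service_context=service_context,
)

@app.get("/queryWithPrompt")
async def query_with_prompt(query_str: str):
    # Translate the prompt into SQL, run it, and return both the answer and the SQL.
    response = query_engine.query(query_str)
    return {
        "result": response.response,
        "SQL Query": response.metadata.get("sql_query"),
    }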
The backend server is deployed, and testing is performed using various API endpoints. The tutorial shows how to create tables, add random data, and query with prompts to retrieve results from the database.
Create a Screen Session
screen -S name
Navigate to the API Folder
cd path/to/api
Start FastAPI
uvicorn app:app
Detach from Screen Session
Ctrl+a d
Test /createTable
Endpoint: GET /createTable
Response:
{
  "message": "complete"
}
Test /addRandomData
Endpoint: GET /addRandomData
Response:
[
  {
    "id": 279703,
    "name": "Christina",
    "lastname": "Santos",
    "height": 201.53,
    "weight": 68.07
  },
  // ... (other data entries)
]
Test /queryWithPrompt
Endpoint: GET /queryWithPrompt
Prompt:
"Write SQL in PostgreSQL format. Get average height of students."
Response:
{
  "result": "The average height of students is approximately 176.56 cm.",
  "SQL Query": "SELECT AVG(height) FROM students;"
}
The final section demonstrates integrating the deployed API with AWS API Gateway for enhanced management capabilities, such as authentication. For more information: Link