Natural Language Postgres Guide
In this guide, you will learn how to build an app that uses AI to interact with a PostgreSQL database using natural language.
The application will:
- Generate SQL queries from a natural language input
- Explain query components in plain English
- Create a chart to visualise query results
You can find a completed version of this project at natural-language-postgres.vercel.app.
Project setup
This project uses the following stack:
- Next.js (App Router)
- AI SDK
- OpenAI
- Zod
- Postgres with Vercel Postgres
- shadcn-ui and TailwindCSS for styling
- Recharts for data visualization
Clone repo
To focus on the AI-powered functionality rather than project setup and configuration we've prepared a starter repository which includes a database schema and a few components.
Clone the starter repository and check out the starter
branch:
git clone https://github.com/vercel-labs/natural-language-postgres
cd natural-language-postgres
git checkout starter
Project setup and data
Let's set up the project and seed the database with the dataset:
- Install dependencies:
pnpm install
- Copy the example environment variables file:
cp .env.example .env
- Add your environment variables to
.env
:
OPENAI_API_KEY="your_api_key_here"POSTGRES_URL="..."POSTGRES_PRISMA_URL="..."POSTGRES_URL_NO_SSL="..."POSTGRES_URL_NON_POOLING="..."POSTGRES_USER="..."POSTGRES_HOST="..."POSTGRES_PASSWORD="..."POSTGRES_DATABASE="..."
This project uses Vercel Postgres. You can learn more about how to set up at the Vercel Postgres documentation.
- This project uses CB Insights' Unicorn Companies dataset. You can download the dataset by following these instructions:
- Navigate to CB Insights Unicorn Companies
- Enter in your email. You will receive a link to download the dataset.
- Save it as
unicorns.csv
in your project root
About the dataset
The Unicorn List dataset contains the following information about unicorn startups (companies with a valuation above $1bn):
- Company name
- Valuation
- Date joined (unicorn status)
- Country
- City
- Industry
- Select investors
This dataset contains over 1000 rows of data over 7 columns, giving us plenty of structured data to analyze. This makes it perfect for exploring various SQL queries that can reveal interesting insights about the unicorn startup ecosystem.
- Now that you have the dataset downloaded and added to your project, you can initialize the database with the following command:
pnpm run seed
Note: this step can take a little while. You should see a message indicating the Unicorns table has been created and then that the database has been seeded successfully.
Remember, the dataset should be named unicorns.csv
and located in root of
your project.
- Start the development server:
pnpm run dev
Your application should now be running at http://localhost:3000.
Project structure
The starter repository already includes everything that you will need, including:
- Database seed script (
lib/seed.ts
) - Basic components built with shadcn/ui (
components/
) - Function to run SQL queries (
app/actions.ts
) - Type definitions for the database schema (
lib/types.ts
)
Existing components
The application contains a single page in app/page.tsx
that serves as the main interface.
At the top, you'll find a header (header.tsx
) displaying the application title and description. Below that is an input field and search button (search.tsx
) where you can enter natural language queries.
Initially, the page shows a collection of suggested example queries (suggested-queries.tsx
) that you can click to quickly try out the functionality.
When you submit a query:
- The suggested queries section disappears and a loading state appears
- Once complete, a card appears with "TODO - IMPLEMENT ABOVE" (
query-viewer.tsx
) which will eventually show your generated SQL - Below that is an empty results area with "No results found" (
results.tsx
)
After you implement the core functionality:
- The results section will display data in a table format
- A toggle button will allow switching between table and chart views
- The chart view will visualize your query results
Let's implement the AI-powered functionality to bring it all together.
Building the application
As a reminder, this application will have three main features:
- Generate SQL queries from natural language
- Create a chart from the query results
- Explain SQL queries in plain English
For each of these features, you'll use the AI SDK via Server Actions to interact with OpenAI's GPT-4o and GPT-4o-mini models. Server Actions are a powerful React Server Component feature that allows you to call server-side functions directly from your frontend code.
Let's start with generating a SQL query from natural language.
Generate SQL queries
Providing context
For the model to generate accurate SQL queries, it needs context about your database schema, tables, and relationships. You will communicate this information through a prompt that should include:
- Schema information
- Example data formats
- Available SQL operations
- Best practices for query structure
- Nuanced advice for specific fields
Let's write a prompt that includes all of this information:
You are a SQL (postgres) and data visualization expert. Your job is to help the user write a SQL query to retrieve the data they need. The table schema is as follows:
unicorns ( id SERIAL PRIMARY KEY, company VARCHAR(255) NOT NULL UNIQUE, valuation DECIMAL(10, 2) NOT NULL, date_joined DATE, country VARCHAR(255) NOT NULL, city VARCHAR(255) NOT NULL, industry VARCHAR(255) NOT NULL, select_investors TEXT NOT NULL);
Only retrieval queries are allowed.
For things like industry, company names and other string fields, use the ILIKE operator and convert both the search term and the field to lowercase using LOWER() function. For example: LOWER(industry) ILIKE LOWER('%search_term%').
Note: select_investors is a comma-separated list of investors. Trim whitespace to ensure you're grouping properly. Note, some fields may be null or have only one value.When answering questions about a specific field, ensure you are selecting the identifying column (ie. what is Vercel's valuation would select company and valuation').
The industries available are:- healthcare & life sciences- consumer & retail- financial services- enterprise tech- insurance- media & entertainment- industrials- health
If the user asks for a category that is not in the list, infer based on the list above.
Note: valuation is in billions of dollars so 10b would be 10.0.Note: if the user asks for a rate, return it as a decimal. For example, 0.1 would be 10%.
If the user asks for 'over time' data, return by year.
When searching for UK or USA, write out United Kingdom or United States respectively.
EVERY QUERY SHOULD RETURN QUANTITATIVE DATA THAT CAN BE PLOTTED ON A CHART! There should always be at least two columns. If the user asks for a single column, return the column and the count of the column. If the user asks for a rate, return the rate as a decimal. For example, 0.1 would be 10%.
There are several important elements of this prompt:
- Schema description helps the model understand exactly what data fields to work with
- Includes rules for handling queries based on common SQL patterns - for example, always using ILIKE for case-insensitive string matching
- Explains how to handle edge cases in the dataset, like dealing with the comma-separated investors field and ensuring whitespace is properly handled
- Instead of having the model guess at industry categories, it provides the exact list that exists in the data, helping avoid mismatches
- The prompt helps standardize data transformations - like knowing to interpret "10b" as "10.0" billion dollars, or that rates should be decimal values
- Clear rules ensure the query output will be chart-friendly by always including at least two columns of data that can be plotted
This prompt structure provides a strong foundation for query generation, but you should experiment and iterate based on your specific needs and the model you're using.
Create a Server Action
With the prompt done, let's create a Server Action.
Open app/actions.ts
. You should see one action already defined (runGeneratedSQLQuery
).
Add a new action. This action should be asynchronous and take in one parameter - the natural language query.
/* ...rest of the file... */
export const generateQuery = async (input: string) => {};
In this action, you'll use the generateObject
function from the AI SDK which allows you to constrain the model's output to a pre-defined schema. This process, sometimes called structured output, ensures the model returns only the SQL query without any additional prefixes, explanations, or formatting that would require manual parsing.
/* ...other imports... */import { generateObject } from 'ai';import { openai } from '@ai-sdk/openai';import { z } from 'zod';
/* ...rest of the file... */
export const generateQuery = async (input: string) => { 'use server'; try { const result = await generateObject({ model: openai('gpt-4o'), system: `You are a SQL (postgres) ...`, // SYSTEM PROMPT AS ABOVE - OMITTED FOR BREVITY prompt: `Generate the query necessary to retrieve the data the user wants: ${input}`, schema: z.object({ query: z.string(), }), }); return result.object.query; } catch (e) { console.error(e); throw new Error('Failed to generate query'); }};
Note, you are constraining the output to a single string field called query
using zod
, a TypeScript schema validation library. This will ensure the model only returns the SQL query itself. The resulting generated query will then be returned.
Update the frontend
With the Server Action in place, you can now update the frontend to call this action when the user submits a natural language query. In the root page (app/page.tsx
), you should see a handleSubmit
function that is called when the user submits a query.
Import the generateQuery
function and call it with the user's input.
/* ...other imports... */import { runGeneratedSQLQuery, generateQuery } from './actions';
/* ...rest of the file... */
const handleSubmit = async (suggestion?: string) => { clearExistingData();
const question = suggestion ?? inputValue; if (inputValue.length === 0 && !suggestion) return;
if (question.trim()) { setSubmitted(true); }
setLoading(true); setLoadingStep(1); setActiveQuery('');
try { const query = await generateQuery(question);
if (query === undefined) { toast.error('An error occurred. Please try again.'); setLoading(false); return; }
setActiveQuery(query); setLoadingStep(2);
const companies = await runGeneratedSQLQuery(query); const columns = companies.length > 0 ? Object.keys(companies[0]) : []; setResults(companies); setColumns(columns);
setLoading(false); } catch (e) { toast.error('An error occurred. Please try again.'); setLoading(false); }};
/* ...rest of the file... */
Now, when the user submits a natural language query (ie. "how many unicorns are from San Francisco?"), that question will be sent to your newly created Server Action. The Server Action will call the model, passing in your system prompt and the users query, and return the generated SQL query in a structured format. This query is then passed to the runGeneratedSQLQuery
action to run the query against your database. The results are then saved in local state and displayed to the user.
Save the file, make sure the dev server is running, and then head to localhost:3000
in your browser. Try submitting a natural language query and see the generated SQL query and results. You should see a SQL query generated and displayed under the input field. You should also see the results of the query displayed in a table below the input field.
Try clicking the SQL query to see the full query if it's too long to display in the input field. You should see a button on the right side of the input field with a question mark icon. Clicking this button currently does nothing, but you'll add the "explain query" functionality to it in the next step.
Explain SQL Queries
Next, let's add the ability to explain SQL queries in plain English. This feature helps users understand how the generated SQL query works by breaking it down into logical sections. As with the SQL query generation, you'll need a prompt to guide the model when explaining queries.
Let's craft a prompt for the explain query functionality:
You are a SQL (postgres) expert. Your job is to explain to the user write a SQL query you wrote to retrieve the data they asked for. The table schema is as follows:unicorns ( id SERIAL PRIMARY KEY, company VARCHAR(255) NOT NULL UNIQUE, valuation DECIMAL(10, 2) NOT NULL, date_joined DATE, country VARCHAR(255) NOT NULL, city VARCHAR(255) NOT NULL, industry VARCHAR(255) NOT NULL, select_investors TEXT NOT NULL);
When you explain you must take a section of the query, and then explain it. Each "section" should be unique. So in a query like: "SELECT * FROM unicorns limit 20", the sections could be "SELECT *", "FROM UNICORNS", "LIMIT 20".If a section doesnt have any explanation, include it, but leave the explanation empty.
Like the prompt for generating SQL queries, you provide the model with the schema of the database. Additionally, you provide an example of what each section of the query might look like. This helps the model understand the structure of the query and how to break it down into logical sections.
Create a Server Action
Add a new Server Action to generate explanations for SQL queries.
This action takes two parameters - the original natural language input and the generated SQL query.
/* ...rest of the file... */
export const explainQuery = async (input: string, sqlQuery: string) => { 'use server'; try { const result = await generateObject({ model: openai('gpt-4o'), system: `You are a SQL (postgres) expert. ...`, // SYSTEM PROMPT AS ABOVE - OMITTED FOR BREVITY prompt: `Explain the SQL query you generated to retrieve the data the user wanted. Assume the user is not an expert in SQL. Break down the query into steps. Be concise.
User Query: ${input}
Generated SQL Query: ${sqlQuery}`, }); return result.object; } catch (e) { console.error(e); throw new Error('Failed to generate query'); }};
This action uses the generateObject
function again. However, you haven't defined the schema yet. Let's define it in another file so it can also be used as a type in your components.
Update your lib/types.ts
file to include the schema for the explanations:
import { z } from 'zod';
/* ...rest of the file... */
export const explanationSchema = z.object({ section: z.string(), explanation: z.string(),});
export type QueryExplanation = z.infer<typeof explanationSchema>;
This schema defines the structure of the explanation that the model will generate. Each explanation will have a section
and an explanation
. The section
is the part of the query being explained, and the explanation
is the plain English explanation of that section. Go back to your actions.ts
file and import and use the explanationSchema
:
// other importsimport { explanationSchema } from '@/lib/types';
/* ...rest of the file... */
export const explainQuery = async (input: string, sqlQuery: string) => { 'use server'; try { const result = await generateObject({ model: openai('gpt-4o'), system: `You are a SQL (postgres) expert. ...`, // SYSTEM PROMPT AS ABOVE - OMITTED FOR BREVITY prompt: `Explain the SQL query you generated to retrieve the data the user wanted. Assume the user is not an expert in SQL. Break down the query into steps. Be concise.
User Query: ${input}
Generated SQL Query: ${sqlQuery}`, schema: explanationSchema, output: 'array', }); return result.object; } catch (e) { console.error(e); throw new Error('Failed to generate query'); }};
You can use output: "array"
to indicate to the model that you expect an
array of objects matching the schema to be returned.
Update query viewer
Next, update the query-viewer.tsx
component to display these explanations. The handleExplainQuery
function is called every time the user clicks the question icon button on the right side of the query. Let's update this function to use the new explainQuery
action:
/* ...other imports... */import { explainQuery } from '@/app/actions';
/* ...rest of the component... */
const handleExplainQuery = async () => { setQueryExpanded(true); setLoadingExplanation(true);
const explanations = await explainQuery(inputValue, activeQuery); setQueryExplanations(explanations);
setLoadingExplanation(false);};
/* ...rest of the component... */
Now when users click the explanation button (the question mark icon), the component will:
- Show a loading state
- Send the active SQL query and the users natural language query to your Server Action
- The model will generate an array of explanations
- The explanations will be set in the component state and rendered in the UI
Submit a new query and then click the explanation button. Hover over different elements of the query. You should see the explanations for each section!
Visualizing query results
Finally, let's render the query results visually in a chart. There are two approaches you could take:
-
Send both the query and data to the model and ask it to return the data in a visualization-ready format. While this provides complete control over the visualization, it requires the model to send back all of the data, which significantly increases latency and costs.
-
Send the query and data to the model and ask it to generate a chart configuration (fixed-size and not many tokens) that maps your data appropriately. This configuration specifies how to visualize the information while delivering the insights from your natural language query. Importnatly, this is done without requiring the model return the full dataset.
Since you don't know the SQL query or data shape beforehand, let's use the second approach to dynamically generate chart configurations based on the query results and user intent.
Generate the chart configuration
For this feature, you'll create a Server Action that takes the query results and the user's original natural language query to determine the best visualization approach. Your application is already set up to use shadcn
charts (which uses Recharts
under the hood) so the model will need to generate:
- Chart type (bar, line, area, or pie)
- Axis mappings
- Visual styling
Let's start by defining the schema for the chart configuration in lib/types.ts
:
/* ...rest of the file... */
export const configSchema = z .object({ description: z .string() .describe( 'Describe the chart. What is it showing? What is interesting about the way the data is displayed?', ), takeaway: z.string().describe('What is the main takeaway from the chart?'), type: z.enum(['bar', 'line', 'area', 'pie']).describe('Type of chart'), title: z.string(), xKey: z.string().describe('Key for x-axis or category'), yKeys: z .array(z.string()) .describe( 'Key(s) for y-axis values this is typically the quantitative column', ), multipleLines: z .boolean() .describe( 'For line charts only: whether the chart is comparing groups of data.', ) .optional(), measurementColumn: z .string() .describe( 'For line charts only: key for quantitative y-axis column to measure against (eg. values, counts etc.)', ) .optional(), lineCategories: z .array(z.string()) .describe( 'For line charts only: Categories used to compare different lines or data series. Each category represents a distinct line in the chart.', ) .optional(), colors: z .record( z.string().describe('Any of the yKeys'), z.string().describe('Color value in CSS format (e.g., hex, rgb, hsl)'), ) .describe('Mapping of data keys to color values for chart elements') .optional(), legend: z.boolean().describe('Whether to show legend'), }) .describe('Chart configuration object');
export type Config = z.infer<typeof configSchema>;
Replace the existing export type Config = any;
type with the new one.
This schema makes extensive use of Zod's .describe()
function to give the model extra context about each of the key's you are expecting in the chart configuration. This will help the model understand the purpose of each key and generate more accurate results.
Another important technique to note here is that you are defining description
and takeaway
fields. Not only are these useful for the user to quickly understand what the chart means and what they should take away from it, but they also force the model to generate a description of the data first, before it attempts to generate configuration attributes like axis and columns. This will help the model generate more accurate and relevant chart configurations.
Create the Server Action
Create a new action in app/actions.ts
:
/* ...other imports... */import { Config, configSchema, explanationsSchema, Result } from '@/lib/types';
/* ...rest of the file... */
export const generateChartConfig = async ( results: Result[], userQuery: string,) => { 'use server';
try { const { object: config } = await generateObject({ model: openai('gpt-4o'), system: 'You are a data visualization expert.', prompt: `Given the following data from a SQL query result, generate the chart config that best visualises the data and answers the users query. For multiple groups use multi-lines.
Here is an example complete config: export const chartConfig = { type: "pie", xKey: "month", yKeys: ["sales", "profit", "expenses"], colors: { sales: "#4CAF50", // Green for sales profit: "#2196F3", // Blue for profit expenses: "#F44336" // Red for expenses }, legend: true }
User Query: ${userQuery}
Data: ${JSON.stringify(results, null, 2)}`, schema: configSchema, });
// Override with shadcn theme colors const colors: Record<string, string> = {}; config.yKeys.forEach((key, index) => { colors[key] = `hsl(var(--chart-${index + 1}))`; });
const updatedConfig = { ...config, colors }; return { config: updatedConfig }; } catch (e) { console.error(e); throw new Error('Failed to generate chart suggestion'); }};
Update the chart component
With the action in place, you'll want to trigger it automatically after receiving query results. This ensures the visualization appears almost immediately after data loads.
Update the handleSubmit
function in your root page (app/page.tsx
) to generate and set the chart configuration after running the query:
/* ...other imports... */import { getCompanies, generateQuery, generateChartConfig } from './actions';
/* ...rest of the file... */const handleSubmit = async (suggestion?: string) => { clearExistingData();
const question = suggestion ?? inputValue; if (inputValue.length === 0 && !suggestion) return;
if (question.trim()) { setSubmitted(true); }
setLoading(true); setLoadingStep(1); setActiveQuery('');
try { const query = await generateQuery(question);
if (query === undefined) { toast.error('An error occurred. Please try again.'); setLoading(false); return; }
setActiveQuery(query); setLoadingStep(2);
const companies = await runGeneratedSQLQuery(query); const columns = companies.length > 0 ? Object.keys(companies[0]) : []; setResults(companies); setColumns(columns);
setLoading(false);
const { config } = await generateChartConfig(companies, question); setChartConfig(config); } catch (e) { toast.error('An error occurred. Please try again.'); setLoading(false); }};
/* ...rest of the file... */
Now when users submit queries, the application will:
- Generate and run the SQL query
- Display the table results
- Generate a chart configuration for the results
- Allow toggling between table and chart views
Head back to the browser and test the application with a few queries. You should see the chart visualization appear after the table results.
Next steps
You've built an AI-powered SQL analysis tool that can convert natural language to SQL queries, visualize query results, and explain SQL queries in plain English.
You could, for example, extend the application to use your own data sources or add more advanced features like customizing the chart configuration schema to support more chart types and options. You could also add more complex SQL query generation capabilities.