
Build Your First AI Agent to Query Your Data Catalog in Natural Language

Every data operation, whether it powers a product or a dashboard, rests on one thing: data. Databases provide the foundation, storing and managing that data and answering formal queries against it.

We communicate with these systems through SQL (Structured Query Language), a structured and powerful way to retrieve, filter, and analyze data. SQL is concise, precise, and built for the job. However, for many users, especially those newer to the data world, SQL can be intimidating. Remembering syntax, understanding joins, and navigating complex schemas can all be barriers to productivity.

But the vision of querying data in natural language is hardly new! In fact, research on Natural Language Interfaces to Databases (NLIDBs) dates back to the 1970s. Projects such as LUNAR explored how users could ask questions in plain English and receive answers derived from formal queries. Despite strong academic interest, these early systems remained limited in generality, robustness, and usability. More recently, back in 2019, Power BI gave us a first taste of natural language querying. While its Q&A feature was promising, it struggled with complex questions and depended on a carefully curated data model. Ultimately, it lacked the kind of conversational context needed to behave like a true assistant.

But what about 2025? Has technology finally made it possible?

Can LLMs now do what we couldn't do before?

Based on what we know about LLMs and their capabilities, they seem uniquely equipped to close the gap between SQL and natural human language. They are very good at interpreting vague questions, generating syntactically correct SQL, and adapting to different user styles. This makes them ideal candidates for conversational interfaces to data. However, LLMs are not deterministic: they rely heavily on the input context, which can lead to hallucinations, incorrect assumptions, or invalid queries.

This is where AI agents come in. By wrapping the LLM inside a structured system, one that includes memory, tools, validation layers, and well-defined goals, we can mitigate these weaknesses. The agent becomes more than just a text generator: it becomes a collaborative unit that understands its environment. Combined with appropriate access controls and awareness of user intent, agents allow us to build systems that are far more reliable than raw prompting alone.

And that's the focus of this short tutorial: how to build your first AI agent assistant to query your data catalog!

Step-by-Step Guide to Building a Data Catalog Agent

First and foremost, we need to choose our tech stack. We'll need a model provider, a framework to help us structure our agent's workflow, connectors to our data, and a way to turn it all into a conversational experience!

  • OpenAI (GPT-4o): best-in-class natural language understanding, reasoning, and SQL generation.
  • Pydantic AI: adds structure to the LLM's responses. No hallucinated shapes or free-form answers, just clean, schema-validated output.
  • Streamlit: quickly builds a responsive chat interface with built-in components for LLM input and output.
  • Databricks SQL Connector: accesses your Databricks workspace catalogs, schemas, and queries in real time.

Also, let's not forget: this is a small, simple project. If you plan to ship it to production, across multiple databases, you will definitely need to think about other concerns: security, user authentication, user experience, data privacy, and the list goes on.

1. Environment Setup

Before we jump into coding, let's prepare our development environment. This step ensures that all required packages are installed in a clean, isolated environment, which avoids dependency conflicts and keeps our project organized.

conda create -n sql-agent python=3.12
conda activate sql-agent

pip install pydantic-ai openai streamlit databricks-sql-connector

2. Create the Tools and Logic to Access the Data Catalog Metadata

While building a robust SQL agent may look like an LLM problem, it is really a data problem first. You need metadata, column lineage, constraints, and a profiling layer to know what is safe to query and how to interpret the results. This is part of what we call the Data-Centric AI stack (it may have peaked as a buzzword in 2021, but I promise it is still very relevant!), where profiling, data quality, and schema validation come before prompt engineering.
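To make the data-centric idea concrete, here is a minimal, hypothetical sketch (not part of the original project's code) of the kind of lightweight profiling that can run before any prompt engineering: it computes null and distinct counts per column, so the agent's context can flag columns that are unsafe or uninformative to query.

```python
def profile_columns(rows: list[dict]) -> dict:
    """Compute simple per-column stats: null count and distinct count.

    `rows` is a list of records, e.g. a sample fetched from a table.
    """
    profile: dict[str, dict] = {}
    if not rows:
        return profile
    for column in rows[0]:
        values = [row.get(column) for row in rows]
        non_null = [v for v in values if v is not None]
        profile[column] = {
            "null_count": len(values) - len(non_null),
            "distinct_count": len(set(non_null)),
        }
    return profile

# Example: a tiny sample of rows
sample = [
    {"id": 1, "country": "PT"},
    {"id": 2, "country": None},
    {"id": 3, "country": "PT"},
]
print(profile_columns(sample))
# {'id': {'null_count': 0, 'distinct_count': 3},
#  'country': {'null_count': 1, 'distinct_count': 1}}
```

In a real pipeline these stats would be computed in the warehouse itself, but even this toy version shows the principle: profile first, prompt later.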

In this case, because the agent needs context in order to reason about your data, this step involves setting up access to your metadata, which will serve as the grounding for generating valid SQL queries.

from databricks import sql

def set_connection(server_hostname: str, http_path: str, access_token: str):
    # Open a connection to the Databricks SQL warehouse
    connection = sql.connect(
        server_hostname=server_hostname,
        http_path=http_path,
        access_token=access_token
    )
    return connection

The full metadata connector code can be found here.
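As an illustration of what such a connector can feed the agent, here is a hedged sketch (the function names are mine, not from the linked code) that pulls information_schema-style rows through any DB-API cursor and formats them into a compact schema description the LLM can consume:

```python
def fetch_columns(cursor, catalog: str, schema: str) -> list[tuple]:
    """Fetch (table, column, type) rows from any DB-API cursor,
    including the one returned by the Databricks SQL Connector.
    Illustrative only: real code should parameterize/escape inputs.
    """
    cursor.execute(
        f"""
        SELECT table_name, column_name, data_type
        FROM {catalog}.information_schema.columns
        WHERE table_schema = '{schema}'
        ORDER BY table_name, ordinal_position
        """
    )
    return cursor.fetchall()

def format_schema_context(rows: list[tuple]) -> str:
    """Group (table, column, type) rows into a readable schema summary."""
    tables: dict[str, list[str]] = {}
    for table, column, dtype in rows:
        tables.setdefault(table, []).append(f"{column} {dtype}")
    return "\n".join(
        f"Table {name}: " + ", ".join(cols) for name, cols in tables.items()
    )

# Example with hard-coded rows, as a connection would return them
rows = [
    ("orders", "id", "BIGINT"),
    ("orders", "amount", "DOUBLE"),
    ("customers", "id", "BIGINT"),
]
print(format_schema_context(rows))
# Table orders: id BIGINT, amount DOUBLE
# Table customers: id BIGINT
```

This string is exactly the kind of grounding context we will later embed in the agent's system prompt.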

3. Build the SQL Agent with Pydantic AI

Here's how we define our AI agent. We use pydantic-ai to enforce structured outputs; in this case, we want to guarantee that we always get back a clean SQL query from the LLM. This makes the agent safe to use in applications and reduces unpredictable and, more importantly, unparseable output.

To define the agent, start by specifying the Pydantic output schema, in this case a single code field representing the SQL query. Then we use the Agent constructor class together with the system prompt, the model name, and the output type.

from pydantic import BaseModel
from pydantic_ai.agent import Agent
from pydantic_ai.messages import ModelResponse, TextPart

# ==== Output schema ====
class CatalogQuery(BaseModel):
    code: str

# ==== Agent Factory ====
def catalog_metadata_agent(system_prompt: str, model: str="openai:gpt-4o") -> Agent:
    return Agent(
        model=model,
        system_prompt=system_prompt,
        output_type=CatalogQuery,
        instrument=True
    )

# ==== Response Adapter ====
def to_model_response(output: CatalogQuery, timestamp: str) -> ModelResponse:
    return ModelResponse(
        parts=[TextPart(f"```sql\n{output.code}\n```")],
        timestamp=timestamp
    )

The system prompt provides context and examples to guide the LLM's behavior, while instrument=True enables tracing and observability, which is useful for debugging and inspection.

The system prompt itself is designed to steer the agent's behavior. It explicitly states the agent's purpose, embeds the metadata context to ground its reasoning, and provides concrete examples to illustrate the expected output format. This structure helps the LLM stay focused, reduces ambiguity, and keeps the output predictable and valid.
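As a sketch of that structure (the wording below is illustrative, not the exact prompt from the project), a small builder function can stitch the three pieces together: purpose, metadata context, and a worked example.

```python
def build_system_prompt(schema_context: str) -> str:
    """Assemble a system prompt from role, grounding context, and an example."""
    return "\n\n".join([
        # 1. Purpose: constrain the agent to a single, well-defined task
        "You are a SQL assistant for a Databricks data catalog. "
        "Answer every question with a single valid SQL query and nothing else.",
        # 2. Context: ground the model in the actual schema
        f"Available schema:\n{schema_context}",
        # 3. Example: demonstrate the expected output format
        "Example:\n"
        "Question: How many orders are there?\n"
        "SQL: SELECT COUNT(*) FROM orders",
    ])

prompt = build_system_prompt("Table orders: id BIGINT, amount DOUBLE")
```

The resulting string is what gets passed as system_prompt when constructing the agent.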

4. Create the Chat Interface

Now that we have the foundations of our SQL agent, it's time to make it interactive. Leveraging Streamlit, we can build a simple front end where we can ask natural language questions and receive the generated SQL queries in real time.

Fortunately, Streamlit gives us powerful building blocks to create LLM chat experiences. If you're curious, here is a great tutorial that walks through the whole process in detail.

Screenshot by the author: the Databricks SQL agent chat built with OpenAI and Streamlit

You can find the full code for this tutorial here, and you can try the app on Streamlit Community Cloud.

Final Thoughts

In this tutorial, you walked through the first steps of building a simple AI agent. The focus was a no-frills prototype to help you understand how to structure an AI agent around your data.

However, if you want to take this to production, here are a few things to consider:

  • Hallucinations are real, and you can't be sure the returned SQL is correct. Validate the generated SQL with a parser to ensure it is well formed, and use retry mechanisms, ideally with error feedback;
  • Use schema introspection tools to sanity-check table and column names;
  • Add fallback flows for when a query fails, e.g., "did you mean this table instead?";
  • Make it stateful, so the conversation can build on previous questions;
  • Consider the full infrastructure: endpoints, storage, and systems operations.
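As a minimal sketch of the first two points (a hypothetical guardrail, not part of the tutorial's code), even a few standard-library checks can catch the worst failures before a generated query ever reaches the warehouse:

```python
import re

def sanity_check_sql(query: str, known_tables: set[str]) -> list[str]:
    """Return a list of problems found in a generated SQL query.

    Illustrative checks only; a real deployment should use a proper
    SQL parser (e.g. sqlglot) plus live schema introspection.
    """
    problems = []
    stripped = query.strip().rstrip(";")
    # Only allow read-only statements
    if not re.match(r"(?is)^\s*(select|with)\b", stripped):
        problems.append("query is not a read-only SELECT statement")
    # Check that referenced tables exist in the known schema
    for table in re.findall(r"(?i)\b(?:from|join)\s+([\w.]+)", stripped):
        if table.split(".")[-1].lower() not in known_tables:
            problems.append(f"unknown table: {table}")
    return problems

known = {"orders", "customers"}
assert sanity_check_sql("SELECT * FROM orders", known) == []
assert sanity_check_sql("DROP TABLE orders", known) != []
```

An empty list means the query passed the basic checks; anything else can be fed back to the agent as error context for a retry.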

At the end of the day, what makes these systems work is not just the model, it's the context around it. Clean metadata, good profiling, and validation are all part of the data quality work that turns a chat interface into a trustworthy agent.
