Excel File Data Dictionary Using Openpyxl and AI Agents

Every company I have worked for, to this day, has relied on MS Excel.
Excel was first released in 1985 and has remained strong ever since. It has survived the information revolution, the appearance of many programming languages, the Internet with its virtually unlimited number of applications, and, ultimately, it is surviving the AI era as well.
Phew!
Do you have any doubts about how strong Excel is? I don't.
I think the reason is that it is so easy to open Excel and start working on a document right away. Consider this scenario: we are at work, and out of nowhere leadership shares a CSV file, asking for a quick calculation or a couple of summary numbers. Now, the options are:
1. Open an IDE (or a notebook) and start coding like crazy to produce a simple Matplotlib chart;
2. Open Power BI, import the data, and start building a report with fancy graphics;
3. Open the CSV in Excel, write a couple of formulas, and create a chart.
I can't speak for you, but many times I go with option 3. Mostly because Excel files are compatible with everything, they are easily shared, and they are beginner-friendly.
I say all this as an introduction to make my point: I don't think Excel files are going away anytime soon, even with the fast development of AI. Many will love that; many will hate it.
So my contribution here is to use AI to make Excel files better documented. One of the main complaints of data teams about Excel files is the lack of best practices and reproducibility, given that columns can hold any name and any data type, with no documentation attached.
To address that, I have created an AI agent that reads an Excel file and generates this minimal documentation. Here is how it works:
- The Excel file is converted to CSV and fed to a Large Language Model (LLM).
- The AI agent builds a data dictionary with the column information (variable name, data type, description).
- The data dictionary is added to the Excel file's header as comments.
- The output file is then saved and made available for download.
Okay. Enough talking. Let's get our hands dirty now.
Code
We will start by setting up a virtual environment. Create a venv with your favorite tool, such as Poetry, Python venv, Anaconda, or uv. I love uv, as it is very fast and simple, in my opinion. If you have uv installed [5], open a terminal and create your venv.
uv init data-docs
cd data-docs
uv venv
uv add streamlit openpyxl pandas agno mcp google-genai
Now, let us import the required modules. The project was created with Python 3.12.1, but I believe Python 3.9 or higher should work. We will use:
- Agno: for AI agent management.
- Openpyxl: to manipulate Excel files.
- Streamlit: for the front end.
- Pandas, OS, JSON, Dedent, and Google GenAI as support modules.
# Imports
import os
import json
import streamlit as st
from textwrap import dedent
from agno.agent import Agent
from agno.models.google import Gemini
from agno.tools.file import FileTools
from openpyxl import load_workbook
from openpyxl.comments import Comment
import pandas as pd
Good. The next step is to create the functions we will need to manipulate the Excel file and to build the AI agent.
Notice that all the functions have detailed docstrings. This is intentional, because LLMs use docstrings to understand what a given function does and to decide whether or not to use it as a tool.
So, when using Python functions as tools for AI agents, make sure to write detailed docstrings. These days, with free copilots like Windsurf [6], that is very easy to do.
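As a minimal illustration of why docstrings matter here (the function name and wording below are hypothetical, not part of the project), this is the kind of text an agent framework such as Agno can extract from a tool, typically via the function's `__doc__` attribute, and show to the LLM when it decides whether to call the tool:

```python
from textwrap import dedent

def count_rows(file_path: str) -> int:
    """
    Use this tool to count the number of data rows in a CSV file.

    * file_path: Path to the CSV file to be counted.
    Returns the row count, excluding the header.
    """
    with open(file_path) as f:
        # Subtract 1 so the header line is not counted
        return sum(1 for _ in f) - 1

# This docstring is what the framework can read via count_rows.__doc__
# and pass to the LLM as the tool description
print(dedent(count_rows.__doc__))
```

The more precise the docstring, the better the model's decision about when and how to call the function.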
Converting the file into CSV
This function will:
- Take the Excel file and read only the first 10 rows. That is enough to send to the LLM; doing so, we avoid sending too many tokens as input and making the calls more expensive.
- Save the file as CSV, to be used as the agent's input. The CSV format is easy for the model to digest, as it is just text separated by commas. And we know LLMs shine with text.
Here is the function.
def convert_to_csv(file_path: str):
    """
    Use this tool to convert the excel file to CSV.

    * file_path: Path to the Excel file to be converted
    """
    # Load only the first 10 rows of the file
    df = pd.read_excel(file_path).head(10)

    # Convert to CSV
    st.write("Converting to CSV... :leftwards_arrow_with_hook:")
    return df.to_csv('temp.csv', index=False)
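Outside of the app, the trimming step can be sketched with plain pandas (no Streamlit, and with a made-up DataFrame standing in for a real Excel sheet), which makes it easy to verify that only the first 10 rows ever reach the LLM:

```python
import pandas as pd

# Hypothetical sample data standing in for a real Excel sheet
df = pd.DataFrame({"id": range(100), "value": range(100, 200)})

# Keep only the first 10 rows: enough context for the LLM,
# and far fewer input tokens than the full 100-row sheet
sample = df.head(10)
csv_text = sample.to_csv(index=False)

# 1 header line + 10 data lines = 11 lines
print(len(csv_text.strip().splitlines()))  # 11
```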
Let's move on.
Creating an agent
The next function creates the AI agent. I am using Agno [1], as it is versatile and easy to use. I also chose the model Gemini 2.0 Flash. During the testing phase, this was the model that created the best data dictionaries. To use it, you will need an API key from Google. Don't forget to get one here [7].
The function:
- Receives the CSV generated by the previous function.
- Passes it to the AI agent, which produces the data dictionary with column name, description, and data type.
- Note that the description argument works as the agent's prompt. Make it clear and precise.
- The data dictionary will be saved as a JSON file by a tool called FileTools, which can read and write files.
- I added retries=2, so the agent can work around a potential failure on the first attempt.
def create_agent(api_key):
    agent = Agent(
        model=Gemini(id="gemini-2.0-flash", api_key=api_key),
        description=dedent("""
            You are an agent that reads the temp.csv dataset presented to you and,
            based on the name and data type of each column header, determines the following information:
            - The data types of each column
            - The description of each column
            - The first column number is 0
            Using the FileTools provided, create a data dictionary in JSON format that includes the below information:
            {<column number>: {ColName: <column name>, DataType: <data type>, Description: <description>}}
            If you are unable to determine the data type or description of a column, return 'N/A' for that column for the missing values.
            """),
        tools=[FileTools(read_files=True, save_files=True)],
        retries=2,
        show_tool_calls=True
    )
    return agent
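For reference, the data dictionary the agent is asked to write to data_dict.json should look roughly like this (the column names and descriptions below are made up for illustration):

```python
import json

# Hypothetical data dictionary for a two-column dataset;
# keys are stringified column positions, starting at 0
data_dict = {
    "0": {"ColName": "id", "DataType": "int", "Description": "Unique row identifier"},
    "1": {"ColName": "value", "DataType": "float", "Description": "Measured value"},
}

# Round-trip through JSON, as the agent does when saving the file
serialized = json.dumps(data_dict, indent=2)
parsed = json.loads(serialized)

# This is how the next function will look each column up
print(parsed["0"]["ColName"])  # id
```

The string keys matter: the next function indexes the dictionary with `data_dict[str(n)]`, so integer keys would raise a KeyError.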
OK. Now we need one more function, to save the data dictionary into the file.
Adding a Data Dictionary to the File header
This is the last function to create. It will:
- Take the data dictionary JSON from the previous step and the original Excel file.
- Add the data dictionary to the file's header as comments.
- Save the output file.
- Once the file is saved, it displays a download button for the user to get the modified file.
def add_comments_to_header(file_path: str, data_dict: str = "data_dict.json"):
    """
    Use this tool to add the data dictionary {data_dict.json} as comments to the header of an Excel file and save the output file.
    The function takes the Excel file path as argument and adds the {data_dict.json} as comments to each cell
    in the first row of the Excel file, starting from column 0, using the following format:
    * Column Number:
    * Column Name:
    * Data Type:
    * Description:

    Parameters
    ----------
    * file_path : str
        The path to the Excel file to be processed
    * data_dict : str
        The path to the JSON data dictionary containing the column number, column name, data type, and description
    """
    # Load the data dictionary
    data_dict = json.load(open(data_dict))

    # Load the workbook
    wb = load_workbook(file_path)

    # Get the active worksheet
    ws = wb.active

    # Iterate over each column in the first row (header)
    for n, col in enumerate(ws.iter_cols(min_row=1, max_row=1)):
        for header_cell in col:
            header_cell.comment = Comment(dedent(f"""
                ColName: {data_dict[str(n)]['ColName']},
                DataType: {data_dict[str(n)]['DataType']},
                Description: {data_dict[str(n)]['Description']}
                """), 'AI Agent')

    # Save the workbook
    st.write("Saving File... :floppy_disk:")
    wb.save('output.xlsx')

    # Create a download button
    with open('output.xlsx', 'rb') as f:
        st.download_button(
            label="Download output.xlsx",
            data=f,
            file_name='output.xlsx',
            mime='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
        )
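To sanity-check the result without opening Excel, the comments can be read back with openpyxl itself. Here is a small standalone sketch using an in-memory workbook instead of the real output file:

```python
import io
from openpyxl import Workbook, load_workbook
from openpyxl.comments import Comment

# Build a tiny workbook with one header cell and attach a comment,
# mimicking what add_comments_to_header does to the real file
wb = Workbook()
ws = wb.active
ws["A1"] = "id"
ws["A1"].comment = Comment(
    "ColName: id,\nDataType: int,\nDescription: Unique row identifier",
    "AI Agent",
)

# Round-trip through bytes (instead of output.xlsx) and read the comment back
buf = io.BytesIO()
wb.save(buf)
buf.seek(0)
ws2 = load_workbook(buf).active
print(ws2["A1"].comment.text)
```

The same loop used to write the comments (iter_cols over the first row) works for reading them back from the saved file.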
OK. The next step is to integrate all of this into a Streamlit front end.
Streamlit Front-End
At this point, I could have created a separate file for the front end and imported the functions there. But I decided to keep everything in the same file, so let's start with the famous:
if __name__ == "__main__":
First, a few lines to configure the page and the messages displayed in the web app. We will use centered content on the page, plus some instructions on how the app works.
    # Config page Streamlit
    st.set_page_config(layout="centered",
                       page_title="Data Docs",
                       page_icon=":paperclip:",
                       initial_sidebar_state="expanded")

    # Title
    st.title("Data Docs :paperclip:")
    st.subheader("Generate a data dictionary for your Excel file.")
    st.caption("1. Enter your Gemini API key and the path of the Excel file on the sidebar.")
    st.caption("2. Run the agent.")
    st.caption("3. The agent will generate a data dictionary and add it as comments to the header of the Excel file.")
    st.caption("ColName: | DataType: | Description: ")
    st.divider()
Next, we will set up the sidebar, where the user can enter their Google API key and select the .xlsx file to be modified.
There is a button to run the app, another to reset the app's state, and a progress bar. Nothing fancy.
    with st.sidebar:
        # Enter your API key
        st.caption("Enter your API key and the path of the Excel file.")
        api_key = st.text_input("API key: ", placeholder="Google Gemini API key", type="password")

        # Upload file
        input_file = st.file_uploader("File upload",
                                      type='xlsx')

        # Run the agent
        agent_run = st.button("Run")

        # Progress bar
        progress_bar = st.empty()
        progress_bar.progress(0, text="Initializing...")
        st.divider()

        # Reset session state
        if st.button("Reset Session"):
            st.session_state.clear()
            st.rerun()
Once the Run button is clicked, it triggers the rest of the code to run the agent. Here is the sequence of actions:
- The first function is called to convert the file to CSV.
- The progress is registered on the progress bar.
- The agent is created.
- The progress bar is updated.
- A prompt is fed to the agent, asking it to read the temp.csv file, create the data dictionary, and save the output to data_dict.json.
- The data dictionary is printed on the screen, so the user can see what was generated while it is being stored in the Excel file.
- The Excel file is modified and saved.
    # Create the agent
    if agent_run:
        # Convert Excel file to CSV
        convert_to_csv(input_file)

        # Register progress
        progress_bar.progress(15, text="Processing CSV...")

        # Create the agent
        agent = create_agent(api_key)

        # Start the script
        st.write("Running Agent... :runner:")

        # Register progress
        progress_bar.progress(50, text="AI Agent is running...")

        # Run the agent
        agent.print_response(dedent("""
            1. Use FileTools to read the temp.csv as input to create the data dictionary for the columns in the dataset.
            2. Using the FileTools tool, save the data dictionary to a file named 'data_dict.json'.
            """),
            markdown=True)

        # Print the data dictionary
        st.write("Generating Data Dictionary... :page_facing_up:")
        with open('data_dict.json', 'r') as f:
            data_dict = json.load(f)
        st.json(data_dict, expanded=False)

        # Add comments to header
        add_comments_to_header(input_file, 'data_dict.json')

        # Remove temporary files
        st.write("Removing temporary files... :wastebasket:")
        os.remove('temp.csv')
        os.remove('data_dict.json')

        # If file exists, show success message
        if os.path.exists('output.xlsx'):
            st.success("Done! :white_check_mark:")
            os.remove('output.xlsx')

        # Progress bar end
        progress_bar.progress(100, text="Done!")
That's it. Here is a demonstration of the agent in action.

Beautiful result!
Try it
You can try the deployed app here [4].
Before you go
In my humble opinion, Excel files are not going away anytime soon. Love them or hate them, we will have to live with them for a while.
Excel files are versatile, easy to handle, and easy to share, so they remain very useful for everyday ad hoc jobs at work.
However, we can now count on AI to help us deal with those files and make them better. Artificial intelligence is touching so many points of our lives. Jobs and tools are just one of them.
Let's leverage AI and make our daily work easier!
If you liked this content, you can find more of my work on my website and GitHub, linked below.
GitHub Repository
Here is the GitHub repository for this project.
Find me
You can find more about my work on my website.
References
[1. Agno Docs]
[2. Openpyxl Docs]
[3. Streamlit Docs]
[4. Data-Docs Web App]
[5. Installing UV]
[6. Windsurf Coding Copilot]
[7. Google Gemini API Key]