
Building a Data Dictionary for Excel Files Using OpenPyXL and AI Agents

In every company I have worked for until today, there it was: the resilient MS Excel.

Excel was first released in 1985 and has remained strong until today. It has survived major upgrades in information and communication technology, the appearance of many programming languages, the Internet with its seemingly limitless number of online applications, and finally, it is also surviving the era of AI.

Phew!

Do you have any doubts about how resilient Excel is? I don't.

I think the reason for that is how practical it is to start and manipulate a document quickly. Consider this scenario: we are at work, in a meeting, and suddenly the leadership shares a CSV file and asks for a quick calculation or a few calculated numbers. Now, the options are:

1. Open an IDE (or a notebook) and start coding like crazy to produce a simple matplotlib chart;

2. Open Power BI, import the data, and start creating a report with dynamic graphics;

3. Open the CSV in Excel, write a couple of formulas, and create a chart.

I can't speak for you, but many times I go for option 3. Especially because Excel files are compatible with everything, they are easily shared, and they are beginner-friendly.

I say all this as an introduction to make my point: I don't think Excel files are going away anytime soon, even with fast AI development. Many will love that, many will hate that.

So, my action here was to leverage AI to make Excel files better documented. One of the main complaints of data teams about Excel is the lack of best practices and reproducibility, given that columns can have any names and data types, but zero documentation.

Therefore, I have created an AI agent that reads an Excel file and creates this minimal documentation. Here's how it works:

  1. The Excel file is converted to CSV and fed to a Large Language Model (LLM).
  2. The AI agent generates a data dictionary with column information (variable name, data type, description).
  3. The data dictionary is added as comments to the Excel file's header.
  4. The output file is saved with the comments.

OK. Hands-on now. Let's get this done in this tutorial.

Code

Let's code! | Image generated by AI. Meta Llama, 2025.

We will start by setting up a virtual environment. Create a venv with your favorite tool, like Poetry, Python venv, Anaconda, or uv. I love uv, as it is the fastest and simplest, in my opinion. If you have uv installed [5], open a terminal and create your venv.

uv init data-docs
cd data-docs
uv venv
uv add streamlit openpyxl pandas agno mcp google-genai

Now, let's import the required modules. The project was created with Python 3.12.1, but I believe Python 3.9 or higher should do the trick. We will use:

  • Agno: for AI agent management.
  • OpenPyXL: for manipulating Excel files.
  • Streamlit: for the front-end interface.
  • Pandas, os, json, dedent, and Google GenAI as support modules.

# Imports
import os
import json
import streamlit as st
from textwrap import dedent

from agno.agent import Agent
from agno.models.google import Gemini
from agno.tools.file import FileTools

from openpyxl import load_workbook
from openpyxl.comments import Comment
import pandas as pd

Good. The next step is creating the functions we will need to handle the Excel files and to create the AI agent.

Note that all the functions have detailed docstrings. This is intentional, because LLMs use docstrings to know what a given function does and to decide whether or not to use it as a tool.

Therefore, when using Python functions as tools for an AI agent, be sure to write detailed docstrings. These days, with free copilots like Windsurf [6], they are very easy to create.
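
To see why the docstring matters, here is a minimal sketch of how any Python code can read a function's docstring at runtime with the standard inspect module. How Agno consumes docstrings internally is not shown in this article, so treat this only as an illustration. Both convert_units and describe_tool are hypothetical names made up for the example.

```python
import inspect


def convert_units(value: float, factor: float) -> float:
    """
    Use this tool to convert a value from one unit to another.

    * value: The number to be converted
    * factor: The multiplier that maps the source unit to the target unit
    """
    return value * factor


def describe_tool(func) -> dict:
    """Collect the name, cleaned docstring, and parameter names of a function."""
    return {
        "name": func.__name__,
        # getdoc() strips the indentation and surrounding blank lines
        "description": inspect.getdoc(func) or "",
        "parameters": list(inspect.signature(func).parameters),
    }


tool_info = describe_tool(convert_units)
print(tool_info["name"])
print(tool_info["description"])
```

A function with an empty docstring would yield an empty description here, which is exactly the situation to avoid when the function is meant to be picked up as a tool.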

Converting the file to CSV

This function will:

  • Take the Excel file and read only the first 10 rows. This is enough for us to send to the LLM. By doing that, we also avoid sending too many tokens as input and making the agent too expensive.
  • Save the file as CSV to use as input for the AI agent. The CSV format is easier for the model to take in, as it is just a bunch of text separated by commas. And we know that LLMs shine at working with text.

Here's the function.

def convert_to_csv(file_path:str):
    """
    Use this tool to convert the excel file to CSV.

    * file_path: Path to the Excel file to be converted
    """
    # Load the file, keeping only the first 10 rows
    df = pd.read_excel(file_path).head(10)

    # Convert to CSV
    st.write("Converting to CSV... :leftwards_arrow_with_hook:")
    df.to_csv('temp.csv', index=False)

    # to_csv returns None when given a path, so return the path instead
    return 'temp.csv'
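
To get a feel for why a 10-row sample keeps the prompt small, the sketch below compares the character count of a full CSV with that of its first 10 data rows. It uses Python's built-in csv module and synthetic data rather than pandas and a real Excel file, and character count is only a rough stand-in for tokens. The helper preview_csv is a hypothetical name.

```python
import csv
import io
from itertools import islice


def preview_csv(csv_text: str, n_rows: int = 10) -> str:
    """Return the header plus the first n_rows data rows of a CSV string."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # islice stops after the header (1 row) plus n_rows data rows
    for row in islice(reader, n_rows + 1):
        writer.writerow(row)
    return out.getvalue()


# Synthetic dataset: a header and 1,000 data rows (made up for illustration)
rows = ["id,product,price"] + [f"{i},item_{i},{i * 1.5}" for i in range(1000)]
full_csv = "\n".join(rows) + "\n"

sample = preview_csv(full_csv)
print(len(full_csv), "characters in the full file")
print(len(sample), "characters in the 10-row sample")
```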

Let's move on.

Creating an agent

The next function creates the AI agent. I am using Agno [1], as it is very versatile and easy to use. I also chose the model Gemini 2.0 Flash. During the test phase, this was the model that performed best at generating the data docs. To use it, you will need an API key from Google. Don't forget to get one here [7].

The function:

  • Receives the CSV output from the previous function.
  • Passes it through the AI agent, which generates the data dictionary with column name, description, and data type.
  • Notice that the description argument is the prompt for the agent. Make it detailed and precise.
  • The data dictionary will be saved as a JSON file using a tool called FileTools, which can read and write files.
  • I have set retries=2 so we can work around any error on the first try.

def create_agent(api_key):
    agent = Agent(
        model=Gemini(id="gemini-2.0-flash", api_key=api_key),
        description= dedent("""
                            You are an agent that reads the temp.csv dataset presented to you and
                            based on the name and data type of each column header, determine the following information:
                            - The data types of each column
                            - The description of each column
                            - The first column number is 0

                            Using the FileTools provided, create a data dictionary in JSON format that includes the below information:
                            {<ColNumber>: {ColName: <ColName>, DataType: <DataType>, Description: <Description>}}

                            If you are unable to determine the data type or description of a column, return 'N/A' for that column for the missing values.
                            
                            """),
        tools=[ FileTools(read_files=True, save_files=True) ],
        retries=2,
        show_tool_calls=True
        )

    return agent
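
Since the JSON is written by an LLM, it can be worth checking its shape before using it. Below is a minimal sketch of such a check, assuming the dictionary is keyed by the column number as a string, which is how the later code looks entries up with str(n). The function validate_data_dict and the sample data are hypothetical and not part of the original app.

```python
import json

REQUIRED_KEYS = {"ColName", "DataType", "Description"}


def validate_data_dict(data_dict: dict, n_columns: int) -> list:
    """Return a list of problems found; an empty list means the shape is fine."""
    problems = []
    for n in range(n_columns):
        entry = data_dict.get(str(n))
        if not isinstance(entry, dict):
            problems.append(f"column {n}: entry missing")
            continue
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"column {n}: missing keys {sorted(missing)}")
    return problems


# Made-up example of what the agent might save to data_dict.json
raw = """
{
  "0": {"ColName": "id", "DataType": "int", "Description": "Row identifier"},
  "1": {"ColName": "price", "DataType": "float"}
}
"""
problems = validate_data_dict(json.loads(raw), n_columns=3)
print(problems)
```

If the list comes back non-empty, one option is to run the agent again before writing anything to the Excel file.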

OK. Now we need another function to save the data dictionary to the file.

Adding a Data Dictionary to the File header

This is the last function to be created. It will:

  • Get the data dictionary JSON from the previous step and the original Excel file.
  • Add the data dictionary to the file's header as comments.
  • Save the output file.
  • Once the file is saved, display a download button so the user can get the modified file.

def add_comments_to_header(file_path:str, data_dict:str="data_dict.json"):
    """
    Use this tool to add the data dictionary {data_dict.json} as comments to the header of an Excel file and save the output file.

    The function takes the Excel file path as argument and adds the {data_dict.json} as comments to each cell
    in the first row of the Excel file, counting columns from 0 and using the following format:
        * Column Number: 
        * Column Name: 
        * Data Type: 
        * Description: 

    Parameters
    ----------
    * file_path : str
        The path to the Excel file to be processed
    * data_dict : str
        The path to the JSON file holding the data dictionary (column number, column name, data type, description)

    """
    
    # Load the data dictionary, closing the file handle afterwards
    with open(data_dict) as f:
        data_dict = json.load(f)

    # Load the workbook
    wb = load_workbook(file_path)

    # Get the active worksheet
    ws = wb.active

    # Iterate over each column in the first row (header)
    for n, col in enumerate(ws.iter_cols(min_row=1, max_row=1)):
        for header_cell in col:
            header_cell.comment = Comment(dedent(f"""
                              ColName: {data_dict[str(n)]['ColName']}, 
                              DataType: {data_dict[str(n)]['DataType']},
                              Description: {data_dict[str(n)]['Description']}
    """),'AI Agent')

    # Save the workbook
    st.write("Saving File... :floppy_disk:")
    wb.save('output.xlsx')

    # Create a download button
    with open('output.xlsx', 'rb') as f:
        st.download_button(
            label="Download output.xlsx",
            data=f,
            file_name='output.xlsx',
            mime='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
        )
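
One fragile spot in the loop above is data_dict[str(n)], which raises a KeyError if the model skipped a column. A minimal sketch of a more forgiving formatter follows. The helper build_comment_text is a hypothetical name, and the 'N/A' fallback simply mirrors the instruction given to the agent.

```python
def build_comment_text(data_dict: dict, col_number: int) -> str:
    """Format one data dictionary entry as the text of a header comment."""
    # Fall back to an empty entry when the column is missing from the JSON
    entry = data_dict.get(str(col_number), {})
    lines = [
        f"ColName: {entry.get('ColName', 'N/A')}",
        f"DataType: {entry.get('DataType', 'N/A')}",
        f"Description: {entry.get('Description', 'N/A')}",
    ]
    return "\n".join(lines)


# Made-up dictionary with an entry for column 0 only
sample_dict = {"0": {"ColName": "id", "DataType": "int", "Description": "Row identifier"}}

text_found = build_comment_text(sample_dict, 0)
text_missing = build_comment_text(sample_dict, 5)
print(text_found)
print(text_missing)
```

The returned string could then be passed to Comment(text, 'AI Agent') inside the loop.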

OK. The next step is to glue all of this together in a Streamlit front-end script.

Streamlit Front-End

In this step, I could have created a separate file for the front-end and imported the functions there. But I decided to use the same file, so let's start with the famous:

if __name__ == "__main__":

First, a couple of lines to configure the page and the messages displayed in the web app. We will use content centered on the page, and there is some information about how the app works.

    # Config page Streamlit
    st.set_page_config(layout="centered", 
                       page_title="Data Docs", 
                       page_icon=":paperclip:",
                       initial_sidebar_state="expanded")
    
    # Title
    st.title("Data Docs :paperclip:")
    st.subheader("Generate a data dictionary for your Excel file.")
    st.caption("1. Enter your Gemini API key and the path of the Excel file on the sidebar.")
    st.caption("2. Run the agent.")
    st.caption("3. The agent will generate a data dictionary and add it as comments to the header of the Excel file.")
    st.caption("ColName:  | DataType:  | Description: ")
    
    st.divider()

Next, we will set up a sidebar, where the user can enter their API key from Google and select a .xlsx file to be modified.

There is a button to run the app, another to reset the app state, and a progress bar. Nothing too fancy.

    with st.sidebar:
        # Enter your API key
        st.caption("Enter your API key and the path of the Excel file.")
        api_key = st.text_input("API key: ", placeholder="Google Gemini API key", type="password")
        
        # Upload file
        input_file = st.file_uploader("File upload", 
                                       type='xlsx')
        

        # Run the agent
        agent_run = st.button("Run")

        # progress bar
        progress_bar = st.empty()
        progress_bar.progress(0, text="Initializing...")

        st.divider()

        # Reset session state
        if st.button("Reset Session"):
            st.session_state.clear()
            st.rerun()

Once the Run button is clicked, it triggers the rest of the code to run the agent. Here is the sequence of steps performed:

  1. The first function is called to convert the file to CSV.
  2. The progress is registered on the progress bar.
  3. The agent is created.
  4. The progress bar is updated.
  5. A prompt is fed to the agent to read the temp.csv file, create the data dictionary, and save the output to data_dict.json.
  6. The data dictionary is printed on the screen, so the user can see what was generated while it is being saved to the Excel file.
  7. The Excel file is modified and saved.

    # Create the agent
    if agent_run:
        # Convert Excel file to CSV
        convert_to_csv(input_file)

        # Register progress
        progress_bar.progress(15, text="Processing CSV...")

        # Create the agent
        agent = create_agent(api_key)

        # Start the script
        st.write("Running Agent... :runner:")

        # Register progress
        progress_bar.progress(50, text="AI Agent is running...")

        # Run the agent    
        agent.print_response(dedent(f"""
                                1. Use FileTools to read the temp.csv as input to create the data dictionary for the columns in the dataset. 
                                2. Using the FileTools tool, save the data dictionary to a file named 'data_dict.json'.
                                
                                """),
                        markdown=True)

        # Print the data dictionary
        st.write("Generating Data Dictionary... :page_facing_up:")
        with open('data_dict.json', 'r') as f:
            data_dict = json.load(f)
            st.json(data_dict, expanded=False)

        # Add comments to header
        add_comments_to_header(input_file, 'data_dict.json')

        # Remove temporary files
        st.write("Removing temporary files... :wastebasket:")
        os.remove('temp.csv')
        os.remove('data_dict.json')    
    
    # If file exists, show success message
    if os.path.exists('output.xlsx'):
        st.success("Done! :white_check_mark:")
        os.remove('output.xlsx')

    # Progress bar end
    progress_bar.progress(100, text="Done!")
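
The cleanup at the end calls os.remove directly, which raises FileNotFoundError if the agent never wrote one of the files. Here is a minimal sketch of a more tolerant cleanup that uses only the standard library. The helper remove_if_exists is a hypothetical name, not part of the original app.

```python
import os
import tempfile
from contextlib import suppress


def remove_if_exists(*paths: str) -> list:
    """Delete each path if present and return the ones actually removed."""
    removed = []
    for path in paths:
        # suppress() ignores the error raised when the file is not there
        with suppress(FileNotFoundError):
            os.remove(path)
            removed.append(path)
    return removed


# Demo in a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "temp.csv")
    absent = os.path.join(tmp, "data_dict.json")
    with open(existing, "w") as f:
        f.write("a,b\n1,2\n")
    removed = remove_if_exists(existing, absent)
    still_there = os.path.exists(existing)

print(len(removed), "file(s) removed")
```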

That's all. Here's a demonstration of the agent in action.

Data docs added to your Excel file. Image by the author.

Nice result!

Try It

You can try the deployed app here:

Before you go

In my humble opinion, Excel files are not going away anytime soon. Love them or hate them, we'll have to stick with them for a while.

Excel files are versatile and easy to handle and share, so they are still very useful for routine ad hoc tasks at work.

However, we can now leverage AI to help us handle those files and make them better. Artificial Intelligence is touching many points of our lives. Our routines and tools at work are just one more.

Let's take advantage of AI and work smarter every day!

If you liked this content, find more of my work on my website and GitHub, shared below.

GitHub Repository

Here's the GitHub repository for this project.

Find me

You can find more about my work on my website.

References

[1. Agno Docs]

[2. Openpyxl Docs]

[3. Streamlit Docs]

[4. Data-Docs Web App]

[5. Installing UV]

[6. Windsurf Coding Copilot]

[7. Google Gemini API Key]
