Your Agentic Assistant: Blueprint for a Secure, User-friendly, Self-Hosted Chatbot

How I built a self-hosted, end-to-end environment that gives each user a personal, agentic chatbot that can independently search the files that user has explicitly granted it access to.
In other words: full control, 100% private, all the benefits of an LLM without privacy leaks, token costs, or external dependencies.
– Nombeke
Last week, I challenged myself to build something that had been on my mind for a while:
How can I use an LLM with my private data without sacrificing privacy to big tech companies?
That led to this week's challenge:
Build an agentic chatbot equipped with tools to access user notes securely, without compromising privacy.
As an added challenge, I wanted the system to support multiple users. Not a shared assistant, but a private agent for every user, where each user has full control over which of their files can be read and consulted.
We will build the system in the following steps:
- Architecture
- How do we build an agent and equip it with tools?
- Flow 1: User file management (what happens when we upload a file?)
- Flow 2: How do we embed and store documents?
- Flow 3: What happens when we chat with our agentic assistant?
- A demo
1) Architecture
I started by defining the main "flows" the system should support:
A) User file management
Users authenticate through the front-end, upload or delete files, and assign each file to specific groups that determine which users' agents can access it.
B) Embedding and storing files
Uploaded files are parsed, embedded and stored in the database in a way that ensures only authorized users can retrieve or search those documents.
C) Chat
The user chats with their agent. The agent is equipped with tools, including a semantic vector-search tool, and can only search documents that the user has permission to access.
To support this flow, the system is composed of six key elements:
Application
The Python application is the heart of the system. It exposes API endpoints for the front-end, and its worker scripts listen for messages on the queue.
Front-end
Normally I would use Angular, but for this prototype I went with Streamlit. It was very quick and easy to build with. That ease of use, of course, comes with the downside of not being able to do everything I wanted. I plan to rebuild the front-end later with my go-to Angular, but Streamlit was a very good fit for prototyping.
Blob storage
This container runs MinIO, an open-source, high-performance, distributed object storage system. Definitely overkill for my prototype, but it was easy to use and integrates well with Python, so I have no regrets.
(Vector) database
Postgres manages all relational information such as document metadata, users, user groups and document chunks. In addition, Postgres provides an extension (pgvector) that I use to store vector data such as the embeddings we intend to create. This is very convenient for my use case: I can run a vector search on a table and join that table against the users, making sure that each user can only see their own data.
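To illustrate the idea, here is a minimal pure-Python sketch of an access-filtered vector search. The "tables", group names and two-dimensional embeddings are all made up for illustration; in the real system this collapses into a single SQL query that joins the chunk table against the user's groups and orders by pgvector's distance operator.

```python
import math

# Toy "tables": each chunk carries its embedding and the groups allowed to read it.
# Names, vectors and groups here are illustrative, not the real schema.
CHUNKS = [
    {"text": "meeting notes with Gert", "embedding": [0.9, 0.1], "groups": {"personal"}},
    {"text": "company financials",      "embedding": [0.1, 0.9], "groups": {"finance"}},
]
USER_GROUPS = {"mike": {"personal"}, "eve": {"finance"}}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def vector_search(user, query_embedding, top_k=3):
    allowed = USER_GROUPS.get(user, set())
    # The "join": only chunks sharing at least one group with the user are candidates.
    candidates = [c for c in CHUNKS if c["groups"] & allowed]
    candidates.sort(key=lambda c: cosine(c["embedding"], query_embedding), reverse=True)
    return [c["text"] for c in candidates[:top_k]]
```

The key point is that the permission filter and the similarity ranking happen in one place, so there is no code path that could return a chunk the user is not allowed to see.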
LLMs
There are two local models: one for embedding and one for chat. The models are fairly lightweight but can easily be upgraded, depending on the available hardware.
Message queue
RabbitMQ makes the system responsive. Users don't have to wait while large files are chunked and embedded. Instead, the API returns quickly and the embedding happens in the background. It also gives me horizontal scalability: multiple workers can process files at the same time.
2) Creating an agent with a toolbox
LangGraph makes it easy to define an agent: what actions it can take, how it should reason and what tools it is allowed to use. The agent can independently evaluate the available tools, read their descriptions and decide whether calling one of them will help answer the user's question.
The workflow is defined as a graph. Think of this as a blueprint for the agent's behavior. In this prototype the graph is intentionally simple:

The LLM checks what tools are available and determines whether a tool call (such as vector search) is needed. The graph loops between the tool node and the LLM node until no more tools are needed and the agent has enough information to respond.
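That loop can be sketched without any framework. The toy version below stands in for what LangGraph wires up for us: a fake `fake_llm` function that either requests a tool call or produces a final answer, and a fake `search_notes` tool. All names and the hard-coded behavior are illustrative, not the real implementation.

```python
def search_notes(query):
    # Stand-in for the real vector-search tool.
    return "Gert Vektorman works at Super Data Solutions"

TOOLS = {"search_notes": search_notes}

def fake_llm(messages):
    # A real LLM decides this itself; here we hard-code one tool round-trip.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "search_notes",
                              "args": {"query": messages[-1]["content"]}}}
    return {"answer": "Gert works at Super Data Solutions."}

def run_agent(user_message):
    messages = [{"role": "user", "content": user_message}]
    while True:
        step = fake_llm(messages)
        if "tool_call" in step:   # loop through the tool node...
            call = step["tool_call"]
            result = TOOLS[call["name"]](**call["args"])
            messages.append({"role": "tool", "content": result})
        else:                     # ...until the LLM has enough to respond
            return step["answer"]
```

In the real graph, LangGraph handles this routing based on whether the model's response contains tool calls; the structure of the loop is the same.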
3) Flow 1: Uploading files
This section describes what happens when a user uploads one or more files. First the user must log in, receiving a token that is used to authenticate API calls.
They can then upload files and assign those files to one or more groups. Any user in those groups will be allowed to access the file through their agent.
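As a sketch of what such token handling could look like, here is a stdlib-only version using an HMAC signature. The real system's auth scheme may well differ (JWTs, for example), and the secret and payload layout here are made up for illustration.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # illustrative; a real system loads this from config

def issue_token(username):
    # Sign a small payload so the API can later verify it was not tampered with.
    payload = base64.urlsafe_b64encode(json.dumps({"user": username}).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def verify_token(token):
    payload, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("invalid token")
    return json.loads(base64.urlsafe_b64decode(payload))["user"]
```

Every subsequent API call carries the token, and the server only needs the secret to verify who is calling.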

In the screenshot above, the user selects two files (a PDF and a Word document) and assigns them to two groups. Behind the scenes, this is how the application processes such an upload:

- The files and groups are sent to the API, which authenticates the user with a token.
- The file is stored in blob storage, returning its storage location.
- The file's metadata and storage location are stored in the database, returning a file_id.
- This file_id is published on the message queue.
- The API call completes; the user can continue using the front-end. The heavy lifting (chunking, embedding) happens later in the background.
This flow guarantees a fast and responsive upload experience, even for large files.
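The steps above can be sketched end-to-end with in-memory stand-ins for MinIO, Postgres and RabbitMQ. Everything here, the names, the dict-based "tables", the path layout, is illustrative rather than the real implementation:

```python
import uuid

# In-memory stand-ins for blob storage, the database and the message queue.
blob_storage = {}   # storage_location -> raw bytes
db = {}             # file_id -> metadata row
queue = []          # published messages

def upload_file(user, filename, content, groups):
    # (Authentication of `user` has already happened at this point.)
    # 1) Store the raw bytes in blob storage, getting a storage location back.
    storage_location = f"uploads/{uuid.uuid4()}/{filename}"
    blob_storage[storage_location] = content
    # 2) Store metadata + storage location in the database, getting a file_id back.
    file_id = str(uuid.uuid4())
    db[file_id] = {"owner": user, "filename": filename,
                   "groups": groups, "storage_location": storage_location}
    # 3) Publish only the file_id; the embed worker picks it up later.
    queue.append({"file_id": file_id})
    # 4) Return immediately so the front-end stays responsive.
    return file_id
```

Note that the message on the queue contains nothing but the file_id; the worker fetches everything else from the database when it gets around to the job.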
4) Flow 2: Embedding and storing files
Once a document has been uploaded, the next step is to make it searchable. To do this we must embed our documents: we convert the text of the document into numeric vectors that capture its semantics and meaning.
In the previous flow we sent a message to a queue. That message contains only a file_id and is therefore very small. This keeps the application fast even if the user uploads dozens or hundreds of files.
The message line also offers us two important advantages:
- It spreads the workload by processing one document at a time instead of all at once.
- It future-proofs our system by allowing horizontal scaling: multiple workers can listen to the same queue and process files in parallel.
Here's what happens when the embed worker receives the message:

- Take a message from the queue; the message contains a file_id.
- With the file_id, retrieve the document's metadata (including its owner and allowed groups).
- Use the storage_location from the metadata to download the file.
- The file is read, parsed and divided into small chunks. Each chunk is embedded: it is sent to an Ollama instance to generate the embedding.
- The chunks and their embeddings are written to the database, along with the file's access-control information.
At this point, the document is fully searchable by the agent via vector search, but only for the users who have been granted access.
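The worker's core, chunking and embedding, can be sketched like this. The chunk size, overlap and the fake embedding function are placeholders; the real worker sends each chunk to an Ollama embedding model instead.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    # Split a document into overlapping chunks so context isn't cut mid-thought.
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def fake_embed(chunk):
    # Placeholder: the real worker calls an Ollama embedding model here.
    return [len(chunk), sum(map(ord, chunk)) % 1000]

def process_file(text, file_id, groups, store):
    # Write each chunk, its embedding and the access-control info to the "database".
    for i, chunk in enumerate(chunk_text(text)):
        store.append({"file_id": file_id, "chunk_index": i,
                      "text": chunk, "embedding": fake_embed(chunk),
                      "groups": groups})
```

The overlap between chunks is a common trick so that a sentence straddling a chunk boundary is still fully contained in at least one chunk.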
5) Flow 3: Chatting with our agent
With all the items in place, we can start chatting with the agent.

When the user types a message, the system goes through several steps behind the scenes to deliver a quick, context-aware response:
- The user's message is sent to the API and authorized, since only authenticated users can interact with their private agent.
- The application optionally retrieves previous messages so that the agent has a "memory" of the current conversation. This ensures that it can respond in the context of the ongoing conversation.
- The compiled LangGraph graph is invoked.
- The LLM (running on Ollama) reasons and autonomously uses tools. If needed, it calls the vector-search tool we defined in the graph to find the relevant chunks the user is allowed to access. The agent then incorporates those findings into its reasoning and decides whether it has enough information to give an adequate response.
- The agent's response is generated and streamed back to the user for a smooth, responsive chat experience.
Just like that, the user is chatting with their private agent, complete with search over their own notes.
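Two of those steps, conversation memory and streaming, can be sketched in a few lines. The message format and the word-level "token" stream below are simplifications of what the real system exchanges with Ollama:

```python
def build_messages(history, user_message):
    # "Memory": prepend the stored turns of the current conversation.
    return history + [{"role": "user", "content": user_message}]

def stream_tokens(answer):
    # Stand-in for the model's token stream; the front-end renders
    # each piece as it arrives instead of waiting for the full answer.
    for word in answer.split(" "):
        yield word + " "
```

Because the response is a generator, the front-end can start rendering the first words while the model is still producing the rest.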
6) Demo
Let's see how this looks in practice.
I created a notes file with the following content:
Notes On the 21st of November I spoke with a guy named “Gert Vektorman” that turned out to be a developer at a Groningen company called “super data solutions”. Turns out that he was very interested in implementing agentic RAG at his company. We’ve agreed to meet some time at the end of december. Edit: I’ve asked Gert what his favorite programming language was; he like using Python Edit: we’ve met and agreed to create a test implementation. We’ll call this project “project greenfield”
I'll go ahead and upload this file.

After uploading, I can verify that:
- The document is stored in the database
- It has been chunked and embedded
- My agent has access to it
Now, let's talk.

As you can see, the agent is able to respond with information from our file. And it's surprisingly fast: this question was answered in a few seconds.
Conclusion
I love challenges that let me test new technologies and work across the stack, from databases to agent graphs and from front-end to Docker images. Designing a system and choosing an effective architecture is something I always enjoy. It allows me to turn goals into requirements, flows, architecture, components, code and ultimately a working product.
This week's challenge was exactly that: exploring and experimenting with a secure, multi-user, agentic RAG system. I've built a functional, scalable, decoupled workflow that can be improved in the future. Most importantly, I've found that a local, 100% private, agentic LLM setup is possible.
Technical takeaways
- Postgres + pgvector is powerful. Keeping the embeddings next to the associated metadata kept everything clean, consistent and easy to query, as there was no need for a separate vector database.
- LangGraph makes it surprisingly easy to define a workflow, equip it with tools and let the agent decide when to use them.
- Private, local, self-hosted agents are possible. With Ollama running two lightweight models (one for chat, one for embedding), everything runs on my MacBook at impressive speed.
- Building a multi-tenant system with strict data isolation was simpler than expected, as ownership and permissions were modeled cleanly across all components.
- Loose coupling makes it easy to replace and scale components.
Next steps
This prototype is ready for further development:
- Re-chunking and re-embedding for documents that change over time (so I can connect my Obsidian vault, for example).
- Source citations that show the user which files / pages / chunks the LLM used to answer a question, improving trust and interpretability.
- More tools for the agent, from file management to SQL access. Maybe even ontologies or user profiles?
- An Angular front-end for better file management and user experience.
I hope this article was as clear as I intended it to be, but if not, let me know what I can clarify. In the meantime, check out my other articles on all kinds of programming-related topics.
Happy coding!
– Mike
PS: Like what I'm doing? Follow me!



