How to Build a Strong LLM Knowledge Base

A knowledge base is a concept where you store a lot of information and make it accessible for future use. This is incredibly powerful for:
- Making better decisions
- Quickly picking up previous context
- Aligning your team
Lately, I've started working more on setting up a knowledge base and adding as much context as possible to it, to help me achieve all of the above points. Knowledge bases were useful even before LLMs, because it is always useful to have access to past knowledge. However, knowledge bases have become far more powerful thanks to LLMs.
This is for two main reasons:
- You can capture more information in the knowledge base
- You can easily query the knowledge base (you don't have to look things up manually)
In this article, I will cover why you should set up a strong LLM knowledge base, how to capture as much knowledge as possible, and how to actively use the knowledge base.
I've discussed this topic before, but I've become even more interested in knowledge bases because of how popular they have become. For example, the president of Y Combinator is building GBrain, and Andrej Karpathy is building the LLM wiki, both of which are examples of knowledge bases.
Of course, there is no ground truth for the right way to build a knowledge base. I think the most important thing is to start storing all your context in the knowledge base and to figure out how to query it effectively whenever you need it, for example, when writing code, in meetings, or similar.
Why you should have a knowledge base
First, I would like to cover why you should have a knowledge base. You can have different kinds of knowledge bases. For example, you can have a personal knowledge base that includes all the context you have personally, or a company-wide knowledge base that includes the information and context the company has.
The reason you have a knowledge base is that knowledge is very important. The more information you can store and access later when needed, the better you will do. You will be able, for example, to:
- Make better decisions because you have access to more context
- Quickly pick up past topics without having to look at different sources to get the information you had on the topic.
- Align different people together because they have one source of truth.
The same concepts apply whether you have a personal knowledge base or a company-wide one. I also believe that knowledge bases are now far more powerful because you can query them with LLMs. Previously, you had to search the knowledge base manually to find the right information. You had to rely on your memory to recall that a certain piece of information was stored in the knowledge base, and then decide whether it was worth the time to find it.
Now that has completely changed. The LLM itself can query the knowledge base, for example, in a RAG-style manner, and quickly find the relevant information automatically. The LLM can decide for itself when it needs to use the knowledge base.
That is, you completely remove the human-in-the-loop requirement for accessing the information in the knowledge base, which makes it far more powerful.
Entering information into a knowledge base
The first step in a knowledge base is, of course, to capture the information in the knowledge base. Depending on how your database is structured, this can happen in different ways.
However, the first thing I urge you to do is to think about all the different sources of information you have access to, either personally or in the company. These include, for example:
- Meetings
- Your project management tool, like Linear.
- Your coding agent, such as Claude Code or Codex: what you have been working on lately with these models, and what work has been completed.
- Office discussions.
You can probably think of many other different sources of information. Of course, this depends a little on how you work and where you work. The point is that you have to map all these different sources of information, and you have to find an automated way to move information from these sources to your database.
You and other people will not be willing to spend a lot of time entering information into databases. You need to find a way to do this automatically so that your information is up to date.
It is important that you automate the flow of information from the source to the knowledge base. If you need a manual step (for example, attaching meeting notes to the knowledge base), you will definitely forget about it and lose the important context, which contradicts the whole concept of the knowledge base. The whole point of a knowledge base is that you keep all the information there and leave nothing out. That is what makes the knowledge base so powerful.
For example, with meeting notes, you can have a cron job that syncs daily. It takes the notes from every meeting that you or anyone else in the company has had and stores them in the knowledge base. You can set up a similar cron job for Linear or your other project management tool to sync everything that happens there. You can do the same for your coding agent: what you've been working on, anything you've discussed with the agent, and more. All of this can easily be synced to the knowledge base with a daily cron job.
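As a rough illustration of such a daily sync, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the `Note` type, the `fetch_meeting_notes` placeholder, and the layout of one Markdown file per note under a `knowledge_base` folder. A real version would call the export or API of whatever meeting or project management tool you use.

```python
import hashlib
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Note:
    source: str  # hypothetical source label, e.g. "meetings" or "linear"
    title: str
    body: str


def note_path(kb_root: Path, note: Note) -> Path:
    # Derive a stable file name from source and title, so re-running the
    # sync overwrites the same file instead of creating duplicates.
    key = f"{note.source}:{note.title}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()[:12]
    return kb_root / note.source / f"{digest}.md"


def sync_notes(kb_root: Path, notes: list[Note]) -> int:
    """Write new or changed notes; return how many files were written."""
    written = 0
    for note in notes:
        path = note_path(kb_root, note)
        content = f"# {note.title}\n\n{note.body}\n"
        if path.exists() and path.read_text(encoding="utf-8") == content:
            continue  # unchanged since the last run
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
        written += 1
    return written


def fetch_meeting_notes() -> list[Note]:
    # Placeholder (assumption): replace with a call to your meeting tool.
    return [Note("meetings", "Weekly planning", "Agreed to prioritise search.")]


if __name__ == "__main__":
    kb = Path(tempfile.mkdtemp()) / "knowledge_base"
    print(sync_notes(kb, fetch_meeting_notes()))  # first run writes the note
    print(sync_notes(kb, fetch_meeting_notes()))  # second run finds no changes
```

Because the sync skips unchanged notes, it is safe to run on a schedule. A crontab entry along the lines of `0 6 * * * python3 /path/to/sync.py` (the path is a placeholder) would run it once a day at 06:00.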
Physical office discussions are a source that is difficult to automate completely. I haven't fully figured this out myself, but two options would be:
- recording everything that happens at all times, which may require permission
- or manually writing things down after having a conversation in the office
However, I think you may not even need to capture office discussions explicitly, because often, after I have an in-person discussion in the office, the other person or I will take the gist of that discussion and write it into our coding agent. The discussion usually happens because of a problem being worked on, so the information is actively used in your coding agent afterward, and you can retrieve it from the coding agent logs.
So if you have successfully completed this step and stored all the context you encounter every day in your knowledge base, you have done most of the work. This is the hard part of building a knowledge base. In the next section, I will cover the simpler part: actively using that information from the knowledge base when making decisions or interacting with coding agents.
Using information from a knowledge base
If you have a synchronized knowledge base with all the information you need, you can now move on to actively using this information. I think there are two main ways to use information from a knowledge base:
- You can simply ask the knowledge base if you have a question. This should, of course, be done through your coding agent. You ask it a question, and it has to know that it should query the knowledge base to get the answer.
- The second is that the coding agent uses the knowledge base passively whenever it runs.
I think the first application here is self-explanatory. Just ask a question whenever you are unsure about something. That is why I will spend more time discussing the second point here.
Having a coding agent make passive use of the knowledge base whenever it is running, for example, while implementing code or debugging, is very powerful. Again, I think there are two main ways to do this.
Grep-based approach
One is to have a top-level Markdown file in the knowledge base that describes the entire knowledge base and where specific information is located. This file is, of course, updated whenever you add more information to the knowledge base.
The advantage of this method is that it uses grep, which is often more powerful than embedding-based search because it is better at finding exactly the information the agent needs. However, it also requires you to place that Markdown file in the LLM context at all times. This file can grow large, which can become a problem after a while.
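To make the grep-based approach more concrete, here is a minimal sketch in Python. It assumes the knowledge base is a folder of Markdown files and that the overview lives in a file called `INDEX.md`; both are choices made for this example rather than a fixed convention. In practice, a coding agent would typically run the real `grep` tool itself, so the `grep` function below is only a stand-in that shows the idea.

```python
import re
import tempfile
from pathlib import Path

INDEX_NAME = "INDEX.md"  # assumed name for the top-level overview file


def content_files(kb_root: Path) -> list[Path]:
    # Every Markdown file in the knowledge base except the index itself.
    return sorted(p for p in kb_root.rglob("*.md") if p.name != INDEX_NAME)


def build_index(kb_root: Path) -> str:
    """Build a Markdown overview: one line per file with its first heading."""
    lines = ["# Knowledge base index", ""]
    for path in content_files(kb_root):
        text = path.read_text(encoding="utf-8")
        first = next((ln for ln in text.splitlines() if ln.strip()), "")
        title = first.lstrip("# ").strip() or path.stem
        lines.append(f"- {path.relative_to(kb_root).as_posix()}: {title}")
    return "\n".join(lines) + "\n"


def grep(kb_root: Path, pattern: str) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, line) for each matching line."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for path in content_files(kb_root):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if regex.search(line):
                rel = path.relative_to(kb_root).as_posix()
                hits.append((rel, number, line.strip()))
    return hits


if __name__ == "__main__":
    kb = Path(tempfile.mkdtemp())
    (kb / "meetings").mkdir()
    (kb / "meetings" / "planning.md").write_text(
        "# Weekly planning\n\nPricing change agreed.\n", encoding="utf-8"
    )
    # Regenerate the index whenever new files are added.
    (kb / INDEX_NAME).write_text(build_index(kb), encoding="utf-8")
    print((kb / INDEX_NAME).read_text(encoding="utf-8"))
    print(grep(kb, "pricing"))
```

Keeping the index to one line per file is one way to limit how much of the LLM context it occupies as the knowledge base grows.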
Embedding-based approach
A second way to actively use a knowledge base is to have an embedding-based setup. This is what GBrain does. Basically, whenever you run a query, it runs an embedding search, like RAG, against the knowledge base and retrieves certain key pieces from it. If the LLM thinks it has retrieved some important information using the embedding search, it can look further into the relevant files.
I think this is probably the best way to use the knowledge base during inference because it doesn't require an active search, and it doesn't need to spend a lot of input tokens on the knowledge base for everything you do.
However, which method works best will depend on your use cases.
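For the embedding-based approach, the retrieval step can be sketched roughly as follows. This is not how GBrain is implemented. To keep the example self-contained, the `embed` function is a simple bag-of-words stand-in for a real embedding model, and the chunk ids are hypothetical file paths. Only the overall flow (embed the query, score every chunk by cosine similarity, return the best matches) reflects the technique.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(
    query: str, chunks: dict[str, str], top_k: int = 2
) -> list[tuple[str, float]]:
    """Return up to top_k (chunk id, score) pairs, best match first."""
    query_vec = embed(query)
    scored = [(cid, cosine(query_vec, embed(text))) for cid, text in chunks.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    # Drop chunks that share nothing with the query.
    return [item for item in scored[:top_k] if item[1] > 0]


if __name__ == "__main__":
    # Hypothetical chunks, keyed by the file they came from.
    chunks = {
        "meetings/planning.md": "We agreed to change the pricing page next sprint.",
        "linear/issue-12.md": "Bug: login fails on mobile Safari.",
        "agent/session.md": "Refactored the pricing calculation module.",
    }
    for chunk_id, score in retrieve("pricing page change", chunks):
        print(f"{chunk_id}: {score:.2f}")
```

The ids of the top matches are what the LLM would then use to open the relevant files and read further. With a real embedding model, `embed` would return dense vectors and the scoring would typically be delegated to a vector index, but the flow stays the same.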
Conclusion
In all, I urge you to:
- Try to establish a knowledge base
- Store as much information in it as possible
- Learn how others have created these knowledge bases
- Try setting one up yourself
Then you should use this knowledge base continuously whenever you work on your computer using a coding agent (which should be for every task you do). I believe that knowledge bases will be incredibly powerful and valuable in the coming years, and they can also provide you with a moat, because access to more information will be a definite advantage in the future. Moreover, this is data specific to your company or your personal context, and in most cases, only you have access to it. So, if you don't save it, you won't be able to access that information again in the future.



