How to Build an Agentic RAG with Hybrid Search

nimda March 13, 2026

Retrieval-augmented generation, also known as RAG, is a powerful way to find relevant documents in a database, which you then provide to an LLM so it can answer user queries.

Traditionally, RAG first uses vector similarity to find relevant chunks of text in a corpus and then feeds the most relevant chunks to an LLM to produce an answer.

This works very well in many cases, as semantic similarity is a powerful way to find the most relevant chunks. However, semantic similarity struggles in some cases, for example, when the user enters specific keywords or IDs that need to be matched exactly for a chunk to count as relevant. In these cases, vector similarity is not very effective, and you need a better way to find the most relevant chunks.

This is where keyword search comes in. Finding the right chunks with both keyword search and vector similarity is known as hybrid search, which is the topic I will discuss today.

This infographic highlights the main contents of this article. I will be discussing how to build an agentic RAG system using hybrid search. Image by Gemini

Why use hybrid search?

Vector similarity is very powerful. It can successfully find the right chunks in a corpus of documents, even if the input prompt has typos or uses synonyms, such as the word lift instead of the word elevator.

However, vector similarity falls short in some cases, especially when searching for specific keywords or identifiers. The reason for this is that vector similarity does not weigh individual words or IDs more heavily than the other words in the text. Therefore, keywords or key identifiers are often drowned out by the surrounding words, making it difficult for semantic similarity to find the most relevant chunks.

Keyword search, however, is best for specific keywords and identifiers, as the name suggests. With BM25, for example, if you have a word that is only found in one document and no other documents, and that word is in a user's query, that document will be rated highly and likely included in the search results.
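
To make the rare-word effect concrete, here is a minimal sketch of BM25 scoring in plain Python. It assumes whitespace tokenization and the commonly used defaults k1=1.5 and b=0.75; the function name bm25_scores and the example documents are made up for illustration and are not taken from any particular library.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Illustrative sketch: tokenization is a plain lowercase whitespace split.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a large IDF, common terms a small one.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Made-up corpus: the identifier "ab-1042" appears in only one document.
docs = [
    "the invoice for order ab-1042 was sent on monday",
    "the invoice for the last order is overdue",
    "the order was shipped on monday",
]
scores = bm25_scores("invoice ab-1042", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```

The first document should come out on top here, because the identifier appears nowhere else and therefore carries a high IDF weight, while the third document matches no query term and scores zero.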

This is the main reason you want to use hybrid search. You can more easily find the most relevant documents if the user includes keywords in their query.

How to use hybrid search

There are many ways to implement hybrid search. If you want to implement it yourself, you can do the following.

Use vector retrieval for semantic similarity as you normally would. I'm not going to cover the exact details in this article because it's out of scope, and the main point of this article is to cover the keyword search part of hybrid search.
Use BM25 or another keyword search algorithm of your choice. BM25 is the standard choice, as it builds on TF-IDF and refines the formula with term-frequency saturation and document-length normalization. The exact keyword search algorithm you use doesn't matter much, although I recommend BM25 as the default.
Apply a weighting between the similarity found through semantic search and the keyword search score. You can decide this weighting yourself depending on what you consider most important. If you have an agent performing the hybrid search, you can also have the agent decide this weighting, as agents will usually have a good sense of when to weight vector similarity more and when to weight keyword matches more.
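
The weighting step above can be sketched as follows. This is one possible approach, not the only one: it assumes both score lists are min-max normalized to the range 0 to 1 before being blended, and the names min_max, hybrid_scores, and keyword_weight, as well as the example scores, are made up for illustration.

```python
def min_max(scores):
    """Rescale scores to the range [0, 1] so the two signals are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_scores(vector_scores, keyword_scores, keyword_weight=0.5):
    """Blend vector and keyword scores for the same list of chunks.

    keyword_weight=0.0 means pure vector search, 1.0 means pure keyword search.
    """
    if not 0.0 <= keyword_weight <= 1.0:
        raise ValueError("keyword_weight must be between 0 and 1")
    v = min_max(vector_scores)
    k = min_max(keyword_scores)
    return [(1 - keyword_weight) * vs + keyword_weight * ks for vs, ks in zip(v, k)]


# Made-up scores for three chunks: chunk 0 is the closest semantically,
# chunk 1 contains the exact keyword from the query.
vector_scores = [0.90, 0.60, 0.35]   # e.g. cosine similarities
keyword_scores = [0.0, 4.1, 0.3]     # e.g. BM25 scores

semantic_first = hybrid_scores(vector_scores, keyword_scores, keyword_weight=0.1)
keyword_first = hybrid_scores(vector_scores, keyword_scores, keyword_weight=0.9)

top_semantic = max(range(3), key=semantic_first.__getitem__)
top_keyword = max(range(3), key=keyword_first.__getitem__)
print(top_semantic, top_keyword)
```

With a low keyword weight, the semantically closest chunk ranks first, and with a high keyword weight, the chunk containing the exact keyword takes over. Normalizing first matters because BM25 scores and cosine similarities live on different scales.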

There are also packages you can use to accomplish this, such as the turbopuffer vector database, which has keyword search built in. To learn how the system actually works, however, I also recommend implementing this yourself to try it out and see how it performs.

Overall, hybrid search is not that difficult to implement and can provide many benefits. When you look into hybrid search, you usually already know how vector search works and just need to add a keyword search component to it. Keyword search itself is not complicated, which makes hybrid search relatively easy to implement given how much it can improve retrieval.

Agentic hybrid search

Using hybrid search is great, and will likely improve how well your RAG system works right off the bat. However, I believe that if you really want to get the most out of a hybrid search RAG system, you need to make it agentic.

By making it agentic, I mean the following. A typical RAG system starts by fetching the relevant document chunks, feeds those chunks into the LLM, and has it answer the user's query.

An agentic RAG system does it differently. Instead of doing the chunk retrieval before using the LLM to respond, you make the chunk retrieval function a tool that the LLM can access. This makes the LLM agentic, since it can decide when and how to call the tool, and has several major advantages:

The agent can decide for itself which query to use for the vector search. So instead of using only the user's exact prompt, it can rewrite the prompt to get the best vector search results. Query rewriting is a well-known technique that you can use to improve RAG performance.
The agent can fetch information iteratively, so it first makes one call to the vector search, checks whether it has enough information to answer the query, and if not, fetches more. This enables the agent to review the information it has retrieved and, if necessary, fetch additional information, which helps it answer questions better.
The agent can decide the weighting between keyword search and vector similarity itself. This is incredibly powerful because the agent often knows whether it is searching for a keyword or looking for semantically similar content. For example, if the user included a keyword in their query, the agent will probably weight the keyword search component very highly, which gives better results. This works much better than having a static number to weight between keyword search and vector similarity.
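
The three points above can be sketched as a small tool-calling loop. This is a toy illustration under loud assumptions: stub_agent is a hypothetical, hard-coded stand-in for a real LLM, the two scoring functions are simple stand-ins for BM25 and embedding similarity, and all names and example chunks are made up. In a real system you would pass search_chunks to your LLM provider's tool-calling interface instead.

```python
import re

# Made-up corpus of document chunks.
CHUNKS = [
    "ticket ab-1042 is waiting for a customer reply",
    "ticket ab-2042 was closed last week",
    "tickets are closed automatically after 30 days",
]


def keyword_score(query, chunk):
    # Stand-in for BM25: fraction of query tokens found verbatim in the chunk.
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0


def semantic_score(query, chunk):
    # Stand-in for embedding similarity: character-trigram Jaccard overlap.
    def grams(text):
        text = text.lower()
        return {text[i:i + 3] for i in range(len(text) - 2)}
    a, b = grams(query), grams(chunk)
    return len(a & b) / len(a | b) if a | b else 0.0


def search_chunks(query, keyword_weight=0.5, top_k=2):
    """The retrieval tool exposed to the agent."""
    scored = [
        ((1 - keyword_weight) * semantic_score(query, c)
         + keyword_weight * keyword_score(query, c), c)
        for c in CHUNKS
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def stub_agent(user_query, retrieved):
    """Hypothetical stand-in for the LLM: picks the next action."""
    if retrieved:
        # Enough context gathered: answer with the best chunk.
        return {"action": "answer", "text": retrieved[0]}
    match = re.search(r"\b[A-Za-z]+-\d+\b", user_query)
    if match:
        # Rewrite the query to the bare ID and lean on keyword search.
        return {"action": "search", "query": match.group(0), "keyword_weight": 0.9}
    return {"action": "search", "query": user_query, "keyword_weight": 0.2}


def run_agent(user_query, decide=stub_agent, max_steps=3):
    """Let the agent call the search tool until it decides to answer."""
    retrieved, calls = [], []
    for _ in range(max_steps):
        step = decide(user_query, retrieved)
        if step["action"] == "answer":
            return step["text"], calls
        calls.append(step)
        retrieved.extend(search_chunks(step["query"], step["keyword_weight"]))
    return None, calls


answer, calls = run_agent("What is the status of ticket ab-1042?")
print(answer, calls)
```

The point of the design is that the query, the number of retrieval calls, and the keyword weight are all chosen inside the loop by the agent rather than fixed up front. A real LLM would make those choices from the conversation instead of from a regular expression.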

Today's frontier LLMs are incredibly capable and will be able to make all of these decisions themselves. A few months ago, I would have doubted whether you should give the agent as much freedom as I described in the points above: choosing the query to search with, fetching information iteratively, and setting the weighting between keyword search and semantic similarity. However, today I know that the latest frontier LLMs have become so capable that this is very doable, and even something I recommend implementing.

Therefore, by using hybrid search and making it agentic, you can really supercharge your RAG system and achieve better results than you would with a static, vector-only RAG system.

Conclusion

In this article, I discussed how you can use hybrid search in your RAG system. In addition, I explained how to make your RAG system agentic to get the best results. Combining these two techniques can substantially improve the performance of your information retrieval system, and both can be implemented easily using coding agents like Claude Code. I believe agentic systems are the future of information retrieval, and I urge you to provide effective information retrieval tools, such as hybrid search, to your agents and let them do the rest of the work.
