Machine Learning

Building Better RAG: Beyond the Vanilla Approach


Retrieval-Augmented Generation (RAG) is a powerful way to enhance language models by injecting external knowledge. While a vanilla RAG setup improves the relevance of responses, it often struggles with nuanced retrieval conditions. This article examines the limitations of a vanilla RAG setup and introduces advanced strategies to improve retrieval accuracy and efficiency.

The Challenge with Vanilla RAG

To illustrate the limitations of vanilla RAG, consider a simple experiment in which we try to retrieve the correct information from a small set of documents. Our dataset includes:

  • A main document describing a daily routine for staying healthy and productive.
  • Two additional documents on unrelated topics that nevertheless contain similar words used in different contexts.
main_document_text = """
Morning Routine (5:30 AM - 9:00 AM)
✅ Wake Up Early - Aim for 6-8 hours of sleep to feel well-rested.
✅ Hydrate First - Drink a glass of water to rehydrate your body.
✅ Morning Stretch or Light Exercise - Do 5-10 minutes of stretching or a short workout to activate your body.
✅ Mindfulness or Meditation - Spend 5-10 minutes practicing mindfulness or deep breathing.
✅ Healthy Breakfast - Eat a balanced meal with protein, healthy fats, and fiber.
✅ Plan Your Day - Set goals, review your schedule, and prioritize tasks.
...
"""

Using a standard RAG setup, we query the system with:

  1. What should I do to stay healthy and productive?
  2. What are the best ways to stay healthy and productive?

Helper Functions

To improve retrieval and keep the workflow structured, we rely on a set of key helper functions. They handle a range of tasks, from calling the ChatGPT API to embedding documents and computing similarity scores. By combining these functions, we build an efficient RAG pipeline that retrieves the relevant information for a user's query.

To support our RAG pipeline, we define the following functions:

# Imports
import os
import json
import openai
import numpy as np
from scipy.spatial.distance import cosine
from google.colab import userdata

# Set up the OpenAI API key and client
os.environ["OPENAI_API_KEY"] = userdata.get('AiTeam')
client = openai.OpenAI()
def query_chatgpt(prompt, model="gpt-4o", response_format=openai.NOT_GIVEN):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=0.0,  # deterministic output; raise for more creativity
            response_format=response_format
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"Error: {e}"
def get_embedding(text, model="text-embedding-3-large"): #"text-embedding-ada-002"
    """Fetches the embedding for a given text using OpenAI's API."""
    response = client.embeddings.create(
        input=[text],
        model=model
    )
    return response.data[0].embedding
def compute_similarity_metrics(embed1, embed2):
    """Computes different similarity/distance metrics between two embeddings."""
    cosine_sim = 1 - cosine(embed1, embed2)  # cosine similarity = 1 - cosine distance

    return cosine_sim
def fetch_similar_docs(query, docs, threshold = .55, top=1):
  query_em = get_embedding(query)
  data = []
  for d in docs:
    # Compute the similarity between the document embedding and the query
    similarity_results = compute_similarity_metrics(d["embedding"], query_em)
    if similarity_results >= threshold:
      data.append({"id":d["id"], "ref_doc":d.get("ref_doc", ""), "score":similarity_results})

  # Sort matches by similarity score, best first (descending)
  sorted_data = sorted(data, key=lambda x: x["score"], reverse=True)
  sorted_data = sorted_data[:min(top, len(sorted_data))]
  return sorted_data
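To see the threshold-and-top-k logic of `fetch_similar_docs` in isolation, here is a self-contained toy run with hand-made 3-dimensional vectors standing in for real embeddings (the documents and scores are illustrative only):

```python
import numpy as np

def cosine_similarity(a, b):
    # Equivalent to 1 - scipy's cosine distance
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

toy_docs = [
    {"id": "routine", "embedding": [0.9, 0.1, 0.0]},  # close to the query
    {"id": "cars",    "embedding": [0.0, 0.1, 0.9]},  # nearly orthogonal
]
query_em = [1.0, 0.0, 0.0]

# Same filter-then-rank logic as fetch_similar_docs
hits = [
    {"id": d["id"], "score": cosine_similarity(d["embedding"], query_em)}
    for d in toy_docs
]
hits = [h for h in hits if h["score"] >= 0.55]           # drop weak matches
hits = sorted(hits, key=lambda h: h["score"], reverse=True)[:2]  # top-k
print(hits)  # only "routine" passes the 0.55 threshold
```

The threshold filters out near-orthogonal documents before the top-k cut, so irrelevant documents are never returned just to fill the quota.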

Exploring Vanilla RAG

To evaluate vanilla RAG's behavior, we run a simple test with predefined queries. Our goal is to check whether the system retrieves the most relevant document based on semantic matching. We then analyze its limitations and evaluate possible improvements.

"""# **Testing Vanilla RAG**"""

query = "what should I do to stay healthy and productive?"
r = fetch_similar_docs(query, docs)
print("query = ", query)
print("documents = ", r)

query = "what are the best practices to stay healthy and productive ?"
r = fetch_similar_docs(query, docs)
print("query = ", query)
print("documents = ", r)

Advanced RAG Improvement Strategies

To further strengthen the retrieval process, we introduce advanced functions that enrich what our RAG system can match against. These functions generate structured information derived from the documents and the queries, making the system more robust and context-aware.

To address these challenges, we apply three key enhancements:

1. Generating FAQs

By automatically generating a list of frequently asked questions related to the document, we expand the range of queries the document can match. These FAQs are generated once and stored alongside the document, providing a richer search space without incurring ongoing costs.

def generate_faq(text):
  prompt = f'''
  given the following text: """{text}"""
  Ask relevant simple atomic questions ONLY (don't answer them) to cover all subjects covered by the text. Return the result as a json list example [q1, q2, q3...]
  '''
  return query_chatgpt(prompt, response_format={ "type": "json_object" })
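Because `generate_faq` requests a JSON object, its output can be parsed directly. A sketch with a hypothetical model response (real output will vary; the article's later code expects a top-level "questions" key):

```python
import json

# Hypothetical model output in the requested json_object format
sample_response = '{"questions": ["How many hours of sleep are recommended?", "What should a healthy breakfast include?"]}'

faq = json.loads(sample_response)["questions"]
print(len(faq))  # each question can now be embedded individually
```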

2. Generating an Overview

A high-level summary of the document helps capture its core ideas, enabling more effective retrieval. By embedding a concise overview alongside the document itself, we provide additional entry points for relevant queries, improving match rates.

def generate_overview(text):
  prompt = f'''
  given the following text: """{text}"""
  Generate an abstract for it that tells in maximum 3 lines what is it about and use high level terms that will capture the main points,
  Use terms and words that will be most likely used by average person.
  '''
  return query_chatgpt(prompt)

3. Query Decomposition

Instead of searching with the broad user query as-is, we break it down into smaller, more precise subqueries. Each subquery is compared against our enriched collection of document embeddings, which now includes:

  • The original document
  • The generated FAQs
  • The generated overview

By merging the retrieval results from these multiple sources, we significantly improve the chances of finding the right information.

def decompose_query(query):
  prompt = f'''
  Given the user query: """{query}"""
break it down into smaller, relevant subqueries
that can retrieve the best information for answering the original query.
Return them as a ranked json list example [q1, q2, q3...].
'''
  return query_chatgpt(prompt, response_format={ "type": "json_object" })
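The per-subquery retrieval results then need to be combined. The article does not show its merge step; one simple strategy is to keep the best score seen for each document and re-rank (the ids and scores below are illustrative):

```python
# Hypothetical per-subquery results, shaped like fetch_similar_docs output
per_subquery_hits = [
    [{"id": "main_doc_faq_3", "score": 0.71}, {"id": "main_document_text", "score": 0.62}],
    [{"id": "main_document_text", "score": 0.68}],
]

best = {}
for hits in per_subquery_hits:
    for h in hits:
        # Keep the highest score observed for each document id
        if h["id"] not in best or h["score"] > best[h["id"]]["score"]:
            best[h["id"]] = h

merged = sorted(best.values(), key=lambda h: h["score"], reverse=True)
print([h["id"] for h in merged])  # deduplicated, best score first
```

Deduplicating by document id prevents the same source from being counted once per subquery that matched it.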

Exploring Advanced RAG

With these strategies in place, we rerun our initial queries. This time, query decomposition produces several subqueries, each focused on a different aspect of the original question. As a result, the system successfully retrieves relevant information from both the FAQs and the original document, a marked improvement over vanilla RAG.

"""# **Testing Advanced Functions**"""

## Generate overview of the document
overview_text = generate_overview(main_document_text)
print(overview_text)
# generate embedding
docs.append({"id":"overview_text", "ref_doc": "main_document_text", "embedding":get_embedding(overview_text)})


## Generate FAQ for the document
main_doc_faq_arr = generate_faq(main_document_text)
print(main_doc_faq_arr)
faq = json.loads(main_doc_faq_arr)["questions"]

for i, f in enumerate(faq):
  docs.append({"id": f"main_doc_faq_{i}", "ref_doc": "main_document_text", "embedding": get_embedding(f)})


## Decompose the 1st query
query = "what should I do to stay healty and productive?"
subqueries = decompose_query(query)
print(subqueries)




subqueries_list = json.loads(subqueries)['subqueries']


## compute the similarities between the subqueries and documents, including FAQ
for subq in subqueries_list:
  print("query = ", subq)
  r = fetch_similar_docs(subq, docs, threshold=.55, top=2)
  print(r)
  print('=================================\n')


## Decompose the 2nd query
query = "what the best practices to stay healty and productive?"
subqueries = decompose_query(query)
print(subqueries)

subqueries_list = json.loads(subqueries)['subqueries']


## compute the similarities between the subqueries and documents, including FAQ
for subq in subqueries_list:
  print("query = ", subq)
  r = fetch_similar_docs(subq, docs, threshold=.55, top=2)
  print(r)
  print('=================================\n')

Here are some of the FAQs generated for the main document:

{
  "questions": [
    "How many hours of sleep are recommended to feel well-rested?",
    "How long should you spend on morning stretching or light exercise?",
    "What is the recommended duration for mindfulness or meditation in the morning?",
    "What should a healthy breakfast include?",
    "What should you do to plan your day effectively?",
    "How can you minimize distractions during work?",
    "How often should you take breaks during work/study productivity time?",
    "What should a healthy lunch consist of?",
    "What activities are recommended for afternoon productivity?",
    "Why is it important to move around every hour in the afternoon?",
    "What types of physical activities are suggested for the evening routine?",
    "What should a nutritious dinner include?",
    "What activities can help you reflect and unwind in the evening?",
    "What should you do to prepare for sleep?",
    …
  ]
}

Cost-Benefit Analysis

While these enhancements introduce upfront costs (generating FAQs, overviews, and their embeddings), this is a one-time cost per document. In contrast, a poorly tuned RAG system incurs two ongoing costs:

  1. Frustrated users due to low-quality retrieval.
  2. Higher query costs from retrieving too many loosely related documents.

For systems handling large query volumes, these inefficiencies compound quickly, making the upfront investment well worth it.
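A back-of-envelope comparison makes the trade-off concrete. All numbers below are hypothetical placeholders, not actual API prices:

```python
# Hypothetical costs, for illustration only
one_time_enrichment_cost = 0.05      # FAQs + overview + embeddings, per document
wasted_cost_per_poor_query = 0.002   # extra tokens from irrelevant retrieved docs
monthly_queries = 10_000

ongoing_waste = wasted_cost_per_poor_query * monthly_queries
print(f"one-time: ${one_time_enrichment_cost:.2f}, monthly waste avoided: ${ongoing_waste:.2f}")
```

Under these assumptions, the one-time enrichment cost is recovered after a tiny fraction of the monthly query volume.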

Conclusion

By enriching documents up front (with FAQs and overviews) and decomposing user queries, we create a smarter RAG system that balances accuracy and cost-efficiency. This approach improves retrieval quality, reduces irrelevant results, and ensures a better user experience.

As RAG techniques continue to evolve, these methods will help build more reliable AI retrieval systems. Future work could explore further optimizations, such as dynamic similarity thresholds and iterative query refinement.
