Generative AI

Creating a Powerful AI English Tutor Using a Vector Database and Groq for Retrieval-Augmented Generation (RAG): A Step-by-Step Guide

Currently, three trending topics in AI are LLMs, RAG, and vector databases. Together, these allow us to build systems tailored to our use cases. Such AI-powered systems, which combine vector search with AI-generated responses, have applications across many industries. In customer support, AI chatbots retrieve answers from knowledge bases on demand. Legal and financial teams benefit from AI-driven document summarization and case research. Healthcare AI assistants help doctors with medical research and drug interactions. E-learning platforms deliver personalized training. Journalism uses AI for news summarization and fact-checking. Software development leverages AI for coding assistance and bug detection. Scientific research benefits from AI-driven literature reviews. This approach enhances knowledge retrieval, content generation, and personalized user interaction across many domains.

In this tutorial, we will build a powerful AI English tutor using RAG. The system combines a vector database (ChromaDB) to store and retrieve relevant English-language learning materials with AI-powered lesson generation (the Groq API) to produce structured and engaging output. The workflow includes extracting text from PDFs, storing the knowledge in the vector database, retrieving contextually relevant content, and generating detailed AI-powered lessons. The goal is an interactive English tutor that generates topic-specific lessons while drawing on previously stored knowledge for improved accuracy and context.

Step 1: Installing the required libraries

!pip install PyPDF2
!pip install groq
!pip install chromadb
!pip install sentence-transformers
!pip install nltk
!pip install fpdf
!pip install torch

PyPDF2 extracts text from PDF files, making it useful for handling document-based knowledge. groq is the client library for Groq's AI API, enabling advanced text generation. chromadb is a vector database designed for storing and retrieving text efficiently. sentence-transformers generates text embeddings, which allow content to be stored and matched by meaning. nltk (the Natural Language Toolkit) is a well-known NLP library for text preprocessing, tokenization, and analysis. fpdf is a lightweight library for creating PDF documents, allowing generated lessons to be saved in a structured format. torch is a deep learning framework commonly used in machine learning tasks, including AI-based text generation.

Step 2: Downloading the NLTK Tokenization Data

import nltk
nltk.download('punkt_tab')

The above code downloads the punkt_tab data. nltk.download('punkt_tab') fetches the resources required for sentence tokenization. Tokenization is the process of splitting text into sentences or words, which is essential for breaking large bodies of text into manageable chunks for processing and retrieval.
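As a quick illustration of what sentence tokenization produces (the sample string below is made up for demonstration):

from nltk.tokenize import sent_tokenize

sample = "RAG combines retrieval with generation. It grounds answers in stored knowledge."
print(sent_tokenize(sample))
# ['RAG combines retrieval with generation.', 'It grounds answers in stored knowledge.']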

Step 3: Setting the NLTK Data Directory

import os  # imported here so this step runs on its own; the full imports follow in Step 4

working_directory = os.getcwd()
nltk_data_dir = os.path.join(working_directory, 'nltk_data')

nltk.data.path.append(nltk_data_dir)
nltk.download('punkt_tab', download_dir=nltk_data_dir)

We set up a dedicated directory for NLTK data. os.getcwd() retrieves the current working directory, and a new nltk_data directory is created inside it to hold NLP resources. nltk.data.path.append(nltk_data_dir) ensures that NLTK looks in this directory for downloaded data. The punkt_tab dataset, required for sentence tokenization, is then downloaded and stored in the specified location.

Step 4: Importing Required Libraries

import os
import torch
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.utils import embedding_functions
import numpy as np
import PyPDF2
from fpdf import FPDF
from functools import lru_cache
from groq import Groq
import nltk
from nltk.tokenize import sent_tokenize
import uuid
from dotenv import load_dotenv

Here, we import all the libraries used throughout the tutorial. os handles file system operations. torch supports deep-learning-related tasks. sentence_transformers provides an easy way to generate text embeddings. chromadb and its embedding_functions module manage the vector database. numpy is a numerical library for handling arrays and computations. PyPDF2 extracts text from PDFs. fpdf enables generating PDF documents. lru_cache caches function outputs to avoid recomputation. Groq is the client for the AI service that generates human-like responses. nltk provides NLP operations, and sent_tokenize is imported directly to split text into sentences. uuid generates unique IDs, and load_dotenv loads environment variables.

Step 5: Loading Environment Variables and the API Key

load_dotenv()
api_key = os.getenv('api_key')
os.environ["GROQ_API_KEY"] = api_key

#or manually retrieve key from  and add it here

With the above code, we load environment variables from a .env file. The load_dotenv() function reads them from the .env file and makes them available within the Python environment. The API key is retrieved with os.getenv('api_key'), which keeps it secure rather than hardcoded in the script. The key is then stored in os.environ["GROQ_API_KEY"], making it available for subsequent API calls.
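For reference, a minimal .env file placed in the working directory would contain a single line like the following; the value shown is a placeholder, not a real key:

api_key=your_groq_api_key_here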


Step 6: Defining the Vector Database Class

class VectorDatabase:
    def __init__(self, collection_name="english_teacher_collection"):
        self.client = chromadb.PersistentClient(path="./chroma_db")
        self.encoder = SentenceTransformer('all-MiniLM-L6-v2')
        self.embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2")
        self.collection = self.client.get_or_create_collection(name=collection_name, embedding_function=self.embedding_function)

    def add_text(self, text, chunk_size):
        sentences = sent_tokenize(text, language="english")
        chunks = self._create_chunks(sentences, chunk_size)
        ids = [str(uuid.uuid4()) for _ in chunks]
        self.collection.add(documents=chunks, ids=ids)

    def _create_chunks(self, sentences, chunk_size):
        chunks = []
        for i in range(0, len(sentences), chunk_size):
            chunk = ' '.join(sentences[i:i+chunk_size])
            chunks.append(chunk)
        return chunks

    def retrieve(self, query, k=3):
        results = self.collection.query(query_texts=[query], n_results=k)
        return results['documents'][0]

This section defines the VectorDatabase class, which interacts with ChromaDB to store and retrieve text-based knowledge. The __init__() method initializes the database, creating a persistent chroma_db directory so the data survives across sessions. The SentenceTransformer model (all-MiniLM-L6-v2) generates text embeddings, converting textual information into numerical representations that can be stored and searched efficiently. The add_text() method breaks the input text into sentences and groups them into smaller chunks before storing them in the vector database. The _create_chunks() helper ensures the text is segmented cleanly, making retrieval more effective. The retrieve() method takes a query and returns the most relevant stored chunks based on embedding similarity.
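To see how the class fits into the workflow, here is a hypothetical usage sketch; the extract_pdf_text helper and the lessons.pdf filename are illustrative assumptions, not part of the original code:

def extract_pdf_text(path):
    # Read every page of the PDF and join the extracted text (hypothetical helper).
    with open(path, 'rb') as f:
        reader = PyPDF2.PdfReader(f)
        return "\n".join(page.extract_text() or "" for page in reader.pages)

vector_db = VectorDatabase()
vector_db.add_text(extract_pdf_text("lessons.pdf"), chunk_size=3)  # assumes this file exists
print(vector_db.retrieve("parts of speech", k=2))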

Step 7: Implementing AI Lesson Generation with Groq

class GroqGenerator:
    def __init__(self, model_name="mixtral-8x7b-32768"):
        self.model_name = model_name
        self.client = Groq()

    def generate_lesson(self, topic, retrieved_content):
        prompt = f"Create an engaging English lesson about {topic}. Use the following information:n"
        prompt += "nn".join(retrieved_content)
        prompt += "nnLesson:"

        chat_completion = self.client.chat.completions.create(
            model=self.model_name,
            messages=[
                {"role": "system", "content": "You are an AI English teacher designed to create an elaborative and engaging lesson."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=1000,
            temperature=0.7
        )
        return chat_completion.choices[0].message.content

This class, GroqGenerator, is responsible for producing the AI-powered English lessons. It communicates with the Groq AI model via API calls. The __init__() method initializes the generator with the mixtral-8x7b-32768 model, designed for conversational AI. The generate_lesson() method takes a topic and the retrieved content as input, formats them into a prompt, and submits it to the Groq API for lesson generation. The model returns a structured lesson with explanations and examples, which can then be stored or displayed.
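A quick sketch of how the generator might be called on its own; the topic and content strings are illustrative, and a valid GROQ_API_KEY must already be set in the environment:

generator = GroqGenerator()
lesson = generator.generate_lesson(
    "past tense verbs",
    ["The simple past describes completed actions, e.g. 'She walked to school.'"]
)
print(lesson)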


Step 8: Combining Vector Retrieval and AI Generation

class RAGEnglishTeacher:
    def __init__(self, vector_db, generator):
        self.vector_db = vector_db
        self.generator = generator

    @lru_cache(maxsize=32)
    def teach(self, topic):
        relevant_content = self.vector_db.retrieve(topic)
        lesson = self.generator.generate_lesson(topic, relevant_content)
        return lesson

The class above, RAGEnglishTeacher, brings the VectorDatabase and GroqGenerator components together into a Retrieval-Augmented Generation (RAG) pipeline. The teach() method retrieves the relevant content from the vector database and passes it to the GroqGenerator to produce a structured lesson. The lru_cache(maxsize=32) decorator caches up to 32 previously generated lessons, improving efficiency by avoiding repeated computation for the same topic.
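Putting it all together, a minimal end-to-end run might look like this, assuming the vector database has already been populated as in Step 6:

vector_db = VectorDatabase()
generator = GroqGenerator()
teacher = RAGEnglishTeacher(vector_db, generator)
print(teacher.teach("phrasal verbs"))  # generates a lesson; repeat calls for the same topic hit the cache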

In conclusion, we successfully built an AI-powered English tutor that combines a vector database (ChromaDB) with Groq's AI model for Retrieval-Augmented Generation (RAG). The system can extract text from PDFs, store the relevant knowledge systematically, retrieve contextually appropriate content, and generate detailed lessons. It delivers engaging, context-aware, and customized lessons by pairing efficient retrieval with relevant AI-generated responses. This approach ensures learners receive accurate, informative, and well-organized lessons without the effort of manually creating content. The system can be extended by adding further learning modules, improving database performance, or fine-tuning the AI responses to make the tutoring process even more interactive and intelligent.


Sana Hassan, a consulting intern at MarktechPost and a dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
