Generative AI

A Coding Implementation to Build a RAG-Powered Research Assistant with FAISS, LangChain, PyPDF, and TinyLlama-1.1B-Chat-v1.0

RAG-powered conversational research assistants address the limitations of traditional language models by pairing them with information retrieval systems. The system searches through a curated knowledge base, retrieves the most relevant passages, and grounds its answers in that retrieved material while pointing back to the sources it used. This approach reduces hallucinations, handles domain-specific knowledge, and keeps responses anchored in the retrieved text. In this tutorial, we will build such an assistant using the open-source TinyLlama-1.1B-Chat-v1.0 model from Hugging Face, FAISS for vector search, and the LangChain framework to answer questions about scientific papers.

First, let's install the required libraries:

!pip install langchain-community langchain pypdf sentence-transformers faiss-cpu transformers accelerate einops

Now, let's import the required libraries:

import os
import torch
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import pandas as pd 
from IPython.display import display, Markdown

As an optional step, we will mount Google Drive so we can store papers there:

from google.colab import drive
drive.mount('/content/drive')
print("Google Drive mounted")

For our knowledge base, we will use PDF documents of scientific papers. Let's write a function to load and process these documents:

def load_documents(pdf_folder_path):
    documents = []


    if not pdf_folder_path:
        print("Downloading a sample paper...")
        # Sample paper assumed to be "Attention Is All You Need" (arXiv:1706.03762)
        !wget -q https://arxiv.org/pdf/1706.03762 -O attention.pdf
        pdf_docs = ["attention.pdf"]
    else:
        pdf_docs = [os.path.join(pdf_folder_path, f) for f in os.listdir(pdf_folder_path)
                   if f.endswith('.pdf')]


    print(f"Found {len(pdf_docs)} PDF documents")


    for pdf_path in pdf_docs:
        try:
            loader = PyPDFLoader(pdf_path)
            documents.extend(loader.load())
            print(f"Loaded: {pdf_path}")
        except Exception as e:
            print(f"Error loading {pdf_path}: {e}")


    return documents




documents = load_documents("")

Next, we need to split these documents into smaller chunks for efficient retrieval:

def split_documents(documents):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len,
    )
    chunks = text_splitter.split_documents(documents)
    print(f"Split {len(documents)} documents into {len(chunks)} chunks")
    return chunks


chunks = split_documents(documents)
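
As a quick optional sanity check, we can inspect the first chunk and its metadata to confirm the split looks reasonable:

# Preview the first chunk and the page it came from
print(chunks[0].page_content[:300])
print(chunks[0].metadata)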

We will use Sentence Transformers to create vector embeddings of our document chunks:

def create_vector_store(chunks):
    print("Loading embedding model...")
    embedding_model = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={'device': 'cuda' if torch.cuda.is_available() else 'cpu'}
    )


    print("Creating vector store...")
    vector_store = FAISS.from_documents(chunks, embedding_model)
    print("Vector store created successfully!")
    return vector_store


vector_store = create_vector_store(chunks)
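
Before wiring the store into a chain, it can be helpful to run a raw similarity search and make sure relevant chunks come back; the query string here is just an example:

# Retrieve the two chunks most similar to a sample query
sample_hits = vector_store.similarity_search("What is self-attention?", k=2)
for hit in sample_hits:
    print(hit.page_content[:150], "\n---")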

Now, let's load an open-source language model to generate the answers. We will use TinyLlama, which is small enough to run on Colab yet capable enough for our task:

def load_language_model():
    print("Loading language model...")
    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"


    try:
        import subprocess
        print("Installing/updating bitsandbytes...")
        subprocess.check_call(["pip", "install", "-U", "bitsandbytes"])
        print("Successfully installed/updated bitsandbytes")
    except Exception:
        print("Could not update bitsandbytes, will proceed without 8-bit quantization")


    from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
    import torch


    tokenizer = AutoTokenizer.from_pretrained(model_id)


    if torch.cuda.is_available():
        try:
            quantization_config = BitsAndBytesConfig(
                load_in_8bit=True,
                llm_int8_threshold=6.0,
                llm_int8_has_fp16_weight=False
            )


            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto",
                quantization_config=quantization_config
            )
            print("Model loaded with 8-bit quantization")
        except Exception as e:
            print(f"Error with quantization: {e}")
            print("Falling back to standard model loading without quantization")
            model = AutoModelForCausalLM.from_pretrained(
                model_id,
                torch_dtype=torch.bfloat16,
                device_map="auto"
            )
    else:
        model = AutoModelForCausalLM.from_pretrained(
            model_id,
            torch_dtype=torch.float32,
            device_map="auto"
        )


    pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        max_length=2048,
        temperature=0.2,
        top_p=0.95,
        repetition_penalty=1.2,
        return_full_text=False
    )


    from langchain_community.llms import HuggingFacePipeline
    llm = HuggingFacePipeline(pipeline=pipe)
    print("Language model loaded successfully!")
    return llm


llm = load_language_model()
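
To confirm the model responds before connecting it to the retriever, you can send it a short standalone prompt. This is purely an optional smoke test; the prompt is just an example and `invoke` assumes a reasonably recent LangChain release:

# Quick smoke test of the wrapped text-generation pipeline
print(llm.invoke("Briefly explain what a transformer neural network is."))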

Now, let's create our research assistant by connecting the vector store to the language model. We will also define a small helper that formats each answer along with the source chunks it referenced:
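
Below is a minimal sketch of what `create_research_assistant` might look like, assuming a `ConversationalRetrievalChain` built on the FAISS retriever with source documents returned; the `k=3` retrieval setting and the chat-history handling are assumptions, not fixed requirements:

def create_research_assistant(vector_store, llm):
    # Retrieve the 3 most relevant chunks for each question (k is an assumed setting)
    retriever = vector_store.as_retriever(search_kwargs={"k": 3})

    qa_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        return_source_documents=True
    )

    chat_history = []

    def assistant(query, return_sources=False):
        # Run the retrieval + generation step and keep track of the conversation
        result = qa_chain.invoke({"question": query, "chat_history": chat_history})
        answer = result["answer"]
        chat_history.append((query, answer))
        if return_sources:
            return answer, result["source_documents"]
        return answer

    return assistant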

def format_research_assistant_output(query, response, sources):
    output = f"n{'=' * 50}n"
    output += f"USER QUERY: {query}n"
    output += f"{'-' * 50}nn"
    output += f"ASSISTANT RESPONSE:n{response}nn"
    output += f"{'-' * 50}n"
    output += f"SOURCES REFERENCED:nn"


    for i, doc in enumerate(sources):
        output += f"Source #{i+1}:n"
        content_preview = doc.page_content[:200] + "..." if len(doc.page_content) > 200 else doc.page_content
        wrapped_content = textwrap.fill(content_preview, width=80)
        output += f"{wrapped_content}nn"


    output += f"{'=' * 50}n"
    return output


import textwrap


research_assistant = create_research_assistant(vector_store, llm)


test_queries = [
    "What is the key idea behind the Transformer model?",
    "Explain self-attention mechanism in simple terms.",
    "Who are the authors of the paper?",
    "What are the main advantages of using attention mechanisms?"
]


for query in test_queries:
    response, sources = research_assistant(query, return_sources=True)
    formatted_output = format_research_assistant_output(query, response, sources)
    print(formatted_output)
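
As an optional extension, the FAISS index can be saved to the mounted Drive so it does not need to be rebuilt in every session; the path below is a hypothetical example:

# Persist the index; it can later be restored with FAISS.load_local(...) and the same embedding model
vector_store.save_local("/content/drive/MyDrive/research_papers_index")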

In this tutorial, we built a conversational research assistant using retrieval-augmented generation with open-source models. RAG strengthens language models by grounding their answers in retrieved documents, which reduces hallucinations and improves factual accuracy. The walkthrough covered setting up the environment, processing scientific PDFs, creating vector embeddings with Sentence Transformers and FAISS, and integrating an open-source language model, TinyLlama, through LangChain. The assistant retrieves the most relevant chunks from the indexed documents and generates answers grounded in them. This implementation lets users query their own knowledge base, making AI-assisted research more reliable and effective at answering domain-specific questions.




