
Creating Your Own AI Q&A Bot for Webpages Using Open-Source AI Models

In today's information-rich digital landscape, navigating web content can be difficult. Whether you are researching for a project, studying complex material, or trying to extract specific information from long articles, the process can be time-consuming and inefficient. That's where an AI-powered question-answering (Q&A) bot comes in handy.

This tutorial will guide you through building a working AI Q&A system that can analyze webpage content and answer specific questions. Instead of relying on expensive API services, we will use open-source models from Hugging Face to create a solution that:

  • Is completely free to use
  • Works on Google Colab (no local setup required)
  • Is customizable to your specific needs
  • Is built on cutting-edge NLP technology

By the end of this tutorial, you will have a working web Q&A system that can help you extract insights from online content efficiently.

What We Will Build

We will create a system that:

  1. Takes a URL as input
  2. Extracts and processes the content of the webpage
  3. Accepts natural-language questions about the content
  4. Provides accurate, context-based answers drawn from the webpage

Requirements

  • A Google account to access Google Colab
  • A basic understanding of Python
  • No prior machine learning knowledge required

Step 1: Setting Up the Environment

First, let's create a new Google Colab notebook. Go to Google Colab and create a new notebook.

Let's start by installing the required libraries:

# Install required packages

!pip install transformers torch beautifulsoup4 requests

This installs:

  • transformers: Hugging Face's library of pre-trained NLP models
  • torch: the PyTorch deep learning framework
  • beautifulsoup4: for parsing HTML and extracting web content
  • requests: for making HTTP requests to webpages
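Before moving on, it can be useful to confirm that the install cell actually succeeded. The following is a minimal sketch that uses only the standard library; the helper name `installed_versions` and the `REQUIRED_PACKAGES` list are illustrative assumptions and are not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names exactly as passed to pip in the install cell.
REQUIRED_PACKAGES = ["transformers", "torch", "beautifulsoup4", "requests"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


for name, found in installed_versions(REQUIRED_PACKAGES).items():
    print(f"{name}: {found or 'NOT INSTALLED'}")
```

If any line reports NOT INSTALLED, re-running the pip cell is usually the first thing to try.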

Step 2: Import Libraries and Define Helper Functions

Now let's import all the necessary libraries and define a helper function:

import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import requests
from bs4 import BeautifulSoup
import re
import textwrap

# Check whether a GPU is available

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

# Function to extract text from a webpage

def extract_text_from_url(url):
    try:
        # Send a browser-like User-Agent, since some sites reject the default one
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
        }
        # The timeout keeps the notebook from hanging on an unresponsive server
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')

        # Remove elements that do not carry the main page content
        for script_or_style in soup(['script', 'style', 'header', 'footer', 'nav']):
            script_or_style.decompose()

        text = soup.get_text()

        # Strip each line, split on double spaces, and drop empty pieces
        lines = (line.strip() for line in text.splitlines())
        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
        text = "\n".join(chunk for chunk in chunks if chunk)

        # Collapse all remaining runs of whitespace into single spaces
        text = re.sub(r'\s+', ' ', text).strip()

        return text

    except Exception as e:
        print(f"Error extracting text from URL: {e}")
        return None

This code:

  1. Imports all required libraries
  2. Sets our device (GPU if available, otherwise CPU)
  3. Defines a function that extracts the text content from a webpage URL
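The final cleanup step of the extraction function relies on the regular expression `\s+`, which matches any run of spaces, tabs, or newlines. The short sketch below isolates that step on a made-up sample string; `normalize_whitespace` is a hypothetical helper name used only for illustration.

```python
import re


def normalize_whitespace(raw_text):
    """Collapse every run of whitespace (spaces, tabs, newlines) into one space."""
    return re.sub(r"\s+", " ", raw_text).strip()


# A made-up sample resembling text pulled out of nested HTML elements.
sample = "  Artificial   intelligence\n\n\tis a field of\n  computer science.  "
print(normalize_whitespace(sample))
```

The result is a single line of text, which is convenient for the character-based chunking used later in the tutorial.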

Step 3: Load the Question-Answering Model

Now let's load a pre-trained question-answering model from Hugging Face:

# Load the pre-trained model and tokenizer

model_name = "deepset/roberta-base-squad2"


print(f"Loading model: {model_name}")
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name).to(device)
print("Model loaded successfully!")

We use deepset/roberta-base-squad2, which is:

  • Based on the RoBERTa architecture (a robustly optimized BERT pretraining approach)
  • Fine-tuned on SQuAD 2.0 (the Stanford Question Answering Dataset)
  • A good balance between accuracy and speed for our task
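SQuAD 2.0 also contains questions that cannot be answered from the given passage, so models fine-tuned on it can signal "no answer", conventionally by pointing both the start and the end of the span at the first (special) token. One common way to use this is to compare the score of that null span with the score of the best real span. The sketch below illustrates the idea on plain Python lists with toy numbers; the function name `has_answer` and the `threshold` parameter are assumptions for illustration, and the tutorial's own code does not perform this check.

```python
def has_answer(start_logits, end_logits, best_span_score, threshold=0.0):
    """Return True when the best real span beats the null (no-answer) span.

    Assumption: the null span starts and ends at index 0, the position of the
    first special token in the encoded input.
    """
    null_score = start_logits[0] + end_logits[0]
    return best_span_score - null_score > threshold


# Toy logits: index 0 is the special token, the rest are text tokens.
start = [1.0, 4.0, 0.5, 0.2]
end = [0.8, 0.3, 3.5, 0.1]
print(has_answer(start, end, best_span_score=4.0 + 3.5))        # True
print(has_answer([5.0, 1.0], [5.0, 1.0], best_span_score=2.0))  # False
```

Raising `threshold` makes the check more conservative, so the bot would decline to answer more often.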

Step 4: Implement the Question-Answering Function

Now, let's implement the core functionality: the ability to answer questions based on the extracted webpage content:

def answer_question(question, context, max_length=512):
    # Budget left for the context after the question and special tokens.
    # This budget is counted in tokens but applied to characters below,
    # so each chunk is conservatively small for the model window.
    max_chunk_size = max_length - len(tokenizer.encode(question)) - 5
    all_answers = []

    # Process the context one chunk at a time
    for i in range(0, len(context), max_chunk_size):
        chunk = context[i:i + max_chunk_size]

        inputs = tokenizer(
            question,
            chunk,
            add_special_tokens=True,
            return_tensors="pt",
            max_length=max_length,
            truncation=True
        ).to(device)

        with torch.no_grad():
            outputs = model(**inputs)

        # Most likely start and end token positions of the answer span
        answer_start = torch.argmax(outputs.start_logits).item()
        answer_end = torch.argmax(outputs.end_logits).item()

        # Confidence score: sum of the start and end logits
        start_score = outputs.start_logits[0][answer_start].item()
        end_score = outputs.end_logits[0][answer_end].item()
        score = start_score + end_score

        # Decode the predicted span; skip_special_tokens drops RoBERTa's
        # <s> and </s> markers (it does not use [CLS] and [SEP])
        input_ids = inputs.input_ids[0].tolist()
        answer = tokenizer.decode(
            input_ids[answer_start:answer_end + 1],
            skip_special_tokens=True
        ).strip()

        # Keep only non-trivial answers
        if answer and len(answer) > 2:
            all_answers.append((answer, score))

    # Return the answer with the highest score across all chunks
    if all_answers:
        all_answers.sort(key=lambda x: x[1], reverse=True)
        return all_answers[0][0]
    else:
        return "I couldn't find an answer in the provided content."

This function:

  1. Takes a question and the webpage content as input
  2. Handles long content by processing it in chunks
  3. Uses the model to predict the answer span (start and end positions)
  4. Processes multiple chunks and returns the answer with the highest confidence score
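Because the function takes the arg-max of the start and end logits independently, it can occasionally produce an end position that comes before the start, which yields an empty answer for that chunk. One common alternative is to search for the highest-scoring pair with start <= end and a bounded span length. The following is a minimal sketch on plain Python lists; `best_span` and `max_answer_len` are hypothetical names introduced here, not part of the tutorial's code.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end, score) for the best span with start <= end, or None."""
    best = None
    for start, start_score in enumerate(start_logits):
        # Only consider end positions at or after the start, within the length cap.
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if best is None or score > best[2]:
                best = (start, end, score)
    return best


# Independent arg-max would pick start=3 and end=1 here, which is not a valid span.
start_logits = [0.1, 0.2, 0.3, 2.0]
end_logits = [0.1, 2.5, 0.2, 1.0]
print(best_span(start_logits, end_logits))
```

In the notebook, the model's tensors could be passed in after conversion, for example `outputs.start_logits[0].tolist()`.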

Step 5: Testing and Examples

Let's test our system with some examples. Here is the complete code:

url = ""  # Set this to the URL of the webpage you want to query
webpage_text = extract_text_from_url(url)


print("Sample of extracted text:")
print(webpage_text[:500] + "...")


questions = [
   "When was the term artificial intelligence first used?",
   "What are the main goals of AI research?",
   "What ethical concerns are associated with AI?"
]


for question in questions:
   print(f"\nQuestion: {question}")
   answer = answer_question(question, webpage_text)
   print(f"Answer: {answer}")

This shows how the system works with real examples.

Output of the above code
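The notebook imports `textwrap` but prints answers unwrapped. If long answers are hard to read in the Colab output cell, a small formatter along the following lines could help. This is only a sketch: `format_qa` and its `width` parameter are hypothetical, and the answer text in the demo is a placeholder rather than real model output.

```python
import textwrap


def format_qa(question, answer, width=80):
    """Return a question/answer pair wrapped to at most `width` columns."""
    wrapped_question = textwrap.fill(f"Question: {question}", width=width)
    wrapped_answer = textwrap.fill(
        f"Answer: {answer}", width=width, subsequent_indent="        "
    )
    return f"{wrapped_question}\n{wrapped_answer}"


# Placeholder answer text, used only to demonstrate the wrapping.
placeholder = "this is a placeholder answer that is long enough to wrap " * 3
print(format_qa("What are the main goals of AI research?", placeholder, width=60))
```

Inside the test loop, `print(format_qa(question, answer))` would replace the two separate print calls.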

Limitations and Future Improvements

Our current implementation has some limitations:

  1. It may struggle with very long webpages due to context-length limitations
  2. The model may not understand complex or ambiguous questions
  3. It works best with factual content rather than opinions or subjective material
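Part of the first limitation comes from cutting the text into fixed, non-overlapping pieces: an answer that straddles a boundary is split across two chunks. A sliding window with overlap is one common mitigation. The sketch below works on whitespace-separated words as a rough proxy for tokens; `sliding_window_chunks`, `chunk_words`, and `overlap_words` are hypothetical names chosen for illustration.

```python
def sliding_window_chunks(text, chunk_words=200, overlap_words=50):
    """Split text into word-based chunks where neighbouring chunks share words."""
    if overlap_words >= chunk_words:
        raise ValueError("overlap_words must be smaller than chunk_words")
    words = text.split()
    step = chunk_words - overlap_words
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


demo = "one two three four five six seven eight nine ten"
for chunk in sliding_window_chunks(demo, chunk_words=4, overlap_words=2):
    print(chunk)
```

Hugging Face tokenizers also expose `stride` and `return_overflowing_tokens` arguments that provide a token-level version of the same idea, which may be preferable once the basic pipeline works.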

Future improvements could include:

  • Implementing semantic search to better handle long documents
  • Adding document summarization capabilities
  • Supporting multiple languages
  • Implementing memory of previous questions and answers
  • Fine-tuning the model on specific domains (e.g., medical, legal, technical)

Conclusion

You have now successfully built your own AI-powered Q&A system for webpages using open-source models. This tool can help you:

  • Extract specific information from long articles
  • Research more efficiently
  • Get quick answers from complex documents

By leveraging the powerful models from Hugging Face and the flexibility of Google Colab, you have created a practical application that demonstrates modern NLP capabilities. Feel free to customize and extend this project to meet your specific requirements.

Practical resources


Here is the Colab Notebook.



Asjad is an intern consultant at Marktechpost. He is pursuing a B.Tech in Mechanical Engineering at the Indian Institute of Technology, Kharagpur. Asjad is a machine learning and deep learning enthusiast who is always researching the applications of machine learning in healthcare.
