Machine Learning

Learn how to use transformers with Hugging Face and spaCy

Introduction

Transformers are the state of the art in NLP, and not only there. Modern models like ChatGPT, LLaMA, and Gemma are based on the transformer architecture introduced in 2017 in the "Attention Is All You Need" paper by Vaswani et al.

In previous articles, we saw how to use spaCy to accomplish several tasks, and you may have noticed that the capabilities it provides are largely rule-based.

spaCy also offers the possibility to plug transformer components into its NLP pipeline, either trained from scratch or taken off the shelf from the 🤗 Hugging Face Hub, the online platform that gives AI developers access to open models.

So let's learn how to use spaCy with Hugging Face models!

Why Transformers?

Before transformers, SOTA architectures created word embeddings using static word-vector strategies. A word vector is a dense representation of a word that we can use to perform mathematical operations.

For example, we expect two words with similar meanings to have similar vectors. The most famous strategies of this kind are GloVe and FastText.

These methods, however, have a serious problem: a word is always represented by the same vector. But a word does not always have the same meaning.

For example:

  • “He went to the bank to withdraw some money.”
  • “He sat on the bank of the river, watching the water flow.”

In these two sentences, the word “bank” takes on two different meanings, so it does not make sense to always represent it with the same vector.

With transformer-based architectures, we can build models that take the whole context into account when producing the vector representation of a word.
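To make this concrete, here is a minimal sketch (assuming the transformers and torch packages are installed) that extracts the vector of the word “bank” in the two sentences above and compares them; with a contextual model the two vectors come out clearly different.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

def word_vector(sentence, word):
    # Return the hidden state of the first subtoken that contains `word`
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # shape: (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    idx = next(i for i, t in enumerate(tokens) if word in t.lower())
    return hidden[idx]

v1 = word_vector("He went to the bank to withdraw some money.", "bank")
v2 = word_vector("He sat on the bank of the river, watching the water flow.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0).item())  # noticeably below 1.0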


The main building block of the network is the multi-head attention block. If you aren't familiar with it, I recently wrote an article about it:

The transformer is made of two parts. The left part is the encoder, which produces a representation of the input text, and the right part is the decoder, which is used to generate new text. For example, GPT is based on the decoder part, because it produces text as in a dialogue.

In this article, we are interested in the encoder, which is able to capture the semantics of the text we give as input.

BERT and RoBERTa

This will not be a course about these models, but let's recap some of the main points.

While ChatGPT is built on the decoder side of the transformer, BERT and RoBERTa are based on the encoder side.

BERT was introduced by Google in 2018, and you can learn more about it here:

BERT is a stack of encoder layers. There are two sizes of the model: BERT base contains 12 encoders, while BERT large contains 24 encoders.


BERT base produces embeddings of size 768, while BERT large produces vectors of size 1024. They both accept at most 512 tokens as input.

The tokenizer used by the BERT model is called WordPiece.
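As a quick check of the numbers above, the sketch below (assuming the transformers package is installed) reads the hidden size and maximum input length from the model config and shows WordPiece splitting words into subword pieces.

from transformers import AutoConfig, AutoTokenizer

config = AutoConfig.from_pretrained("bert-base-uncased")
print(config.hidden_size)               # 768 for BERT base
print(config.max_position_embeddings)   # 512 tokens

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# Words missing from the vocabulary are split into subword pieces prefixed with '##'
print(tokenizer.tokenize("Tokenization with WordPiece"))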

BERT is trained with two objectives:

  • Masked Language Modeling (MLM): predicts missing (masked) tokens inside a sentence.
  • Next Sentence Prediction (NSP): determines whether a given second sentence actually follows the first.

The RoBERTa model builds on BERT with a couple of important differences:

RoBERTa uses dynamic masking, so the masked tokens change across iterations during training, and it does not use the NSP objective at all.

Using RoBERTa with spaCy

spaCy's TextCategorizer is a pipeline component that predicts one or more labels for the whole document. It can be used in two ways:

  • exclusive_classes = true: one label per text (e.g. positive or negative)
  • exclusive_classes = false: multiple labels per text (e.g. spam, urgent)

spaCy can power this component with different embeddings:

  • Classic word vectors (tok2vec)
  • Transformer models like RoBERTa, which we use here

In this way we can leverage RoBERTa's understanding of English and combine it with a simple classification head that is ready to be trained.

If you have a dataset, you can fine-tune the RoBERTa model with spaCy so that it performs well on the specific downstream task you are trying to solve.
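As a minimal sketch of what this component looks like in code (the article itself uses the config-driven CLI approach below), we can add a TextCategorizer to a blank pipeline and register the TREC labels used later in this article:

import spacy

nlp = spacy.blank("en")
textcat = nlp.add_pipe("textcat")  # "textcat" = mutually exclusive classes
for label in ["ABBR", "DESC", "ENTY", "HUM", "LOC", "NUM"]:
    textcat.add_label(label)
print(nlp.pipe_names)  # ['textcat']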

Preparing the Dataset

In this article I will use the TREC dataset, which contains short questions. Each question is labeled with the type of answer it expects, as follows:

Label   Meaning
ABBR    Abbreviation
DESC    Description / definition
ENTY    Entity (thing, object)
HUM     Human (person, group)
LOC     Location (place)
NUM     Numeric (count, date, etc.)

Here is an example, where we expect a person's name as the answer:

Q (Text): “Who wrote the Iliad?”
A (Label): “HUM”

As usual, we start by installing the libraries.

!pip install datasets==3.6.0
!pip install -U spacy[transformers]

We now need to load the data and convert it into the right format.

With spacy.blank("en") we can build an empty pipeline for English. It does not include any components (such as a tagger or a parser). It is lightweight and ready to turn raw text into Doc objects without loading a full language model such as en_core_web_sm.

DocBin is spaCy's dedicated class for storing many Doc objects efficiently in a binary format. This is the format spaCy expects training data to be saved in.

Once converted and saved as .spacy files, the data can be passed directly to spacy train, which is faster than using JSON or plain text files.

With that in mind, the code that prepares the train and dev datasets should be easy to understand.

from datasets import load_dataset
import spacy
from spacy.tokens import DocBin

# Load TREC dataset
dataset = load_dataset("trec")

# Get label names (e.g., ["DESC", "ENTY", "ABBR", ...])
label_names = dataset["train"].features["coarse_label"].names

# Create a blank English pipeline (no components yet)
nlp = spacy.blank("en")

# Convert Hugging Face examples into spaCy Docs and save as .spacy file
def convert_to_spacy(split, filename):
    doc_bin = DocBin()
    for example in split:
        text = example["text"]
        label = label_names[example["coarse_label"]]
        cats = {name: 0.0 for name in label_names}
        cats[label] = 1.0
        doc = nlp.make_doc(text)
        doc.cats = cats
        doc_bin.add(doc)
    doc_bin.to_disk(filename)

convert_to_spacy(dataset["train"], "train.spacy")
convert_to_spacy(dataset["test"], "dev.spacy")

We will now fine-tune RoBERTa on this data using the spaCy CLI. The command expects a config.cfg file describing the type of training, the model we use, the number of epochs, and so on.
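If you don't want to write the config from scratch, spaCy can generate a reasonable starting point (a sketch of the command; the generated file can then be adjusted to match the one below):

python -m spacy init config config.cfg --lang en --pipeline textcat --optimize accuracy --gpu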

Here is the configuration file I used to control my training run.

[paths]
train = ./train.spacy
dev = ./dev.spacy
vectors = null
init_tok2vec = null

[system]
gpu_allocator = "pytorch"
seed = 42

[nlp]
lang = "en"
pipeline = ["transformer", "textcat"]
batch_size = 32

[components]

[components.transformer]
factory = "transformer"

[components.transformer.model]
@architectures = "spacy-transformers.TransformerModel.v3"
name = "roberta-base"
tokenizer_config = {"use_fast": true}
transformer_config = {}
mixed_precision = false
grad_scaler_config = {}

[components.transformer.model.get_spans]
@span_getters = "spacy-transformers.strided_spans.v1"
window = 128
stride = 96

[components.textcat]
factory = "textcat"
scorer = {"@scorers": "spacy.textcat_scorer.v2"}
threshold = 0.5

[components.textcat.model]
@architectures = "spacy.TextCatEnsemble.v2"
nO = null

[components.textcat.model.linear_model]
@architectures = "spacy.TextCatBOW.v3"
ngram_size = 1
no_output_layer = true
exclusive_classes = true
length = 262144

[components.textcat.model.tok2vec]
@architectures = "spacy-transformers.TransformerListener.v1"
upstream = "transformer"
pooling = {"@layers": "reduce_mean.v1"}
grad_factor = 1.0

[corpora]

[corpora.train]
@readers = "spacy.Corpus.v1"
path = ${paths.train}

[corpora.dev]
@readers = "spacy.Corpus.v1"
path = ${paths.dev}

[training]
train_corpus = "corpora.train"
dev_corpus = "corpora.dev"
seed = ${system.seed}
gpu_allocator = ${system.gpu_allocator}
dropout = 0.1
accumulate_gradient = 1
patience = 1600
max_epochs = 10
max_steps = 2000
eval_frequency = 100
frozen_components = []
annotating_components = []

[training.optimizer]
@optimizers = "Adam.v1"
learn_rate = 0.00005
L2 = 0.01
grad_clip = 1.0
use_averages = false
eps = 1e-08
beta1 = 0.9
beta2 = 0.999
L2_is_weight_decay = true

[training.batcher]
@batchers = "spacy.batch_by_words.v1"
discard_oversize = false
tolerance = 0.2

[training.batcher.size]
@schedules = "compounding.v1"
start = 256
stop = 2048
compound = 1.001

[training.logger]
@loggers = "spacy.ConsoleLogger.v1"
progress_bar = true

[training.score_weights]
cats_score = 1.0

[initialize]
vectors = ${paths.vectors}
init_tok2vec = ${paths.init_tok2vec}
vocab_data = null
lookups = null

[initialize.components]
[initialize.tokenizer]

Make sure you have a GPU available and launch the CLI command!

python -m spacy train config.cfg --output ./output --gpu-id 0

You will see the training start, and you can monitor the loss of the text categorizer.

Just to clarify: here we are training the TextCategorizer component, a small head network that receives the document representation and learns to predict the correct label.

But RoBERTa is also fine-tuned during this training. That means RoBERTa's weights are updated on the TREC dataset, so it learns to represent the questions in the way that is most useful for the classification task.
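Once training finishes, you can also score the best model on the dev set with spaCy's evaluate command (a sketch, using the paths created above):

python -m spacy evaluate ./output/model-best dev.spacy --gpu-id 0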

As soon as the model is trained and saved, we can use it for inference!

import spacy

nlp = spacy.load("output/model-best")

doc = nlp("What is the capital of Italy?")
print(doc.cats)

The output should look something like the following:

{'LOC': 0.98, 'HUM': 0.01, 'NUM': 0.0, …}
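Since doc.cats is just a dictionary of scores, picking the predicted class is a one-liner (a small convenience on top of the output above):

best_label = max(doc.cats, key=doc.cats.get)
print(best_label, doc.cats[best_label])  # e.g. LOC 0.98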

Final Thoughts

To recap, in this post you saw how to use a Hugging Face dataset with spaCy:

  • Convert a Hugging Face dataset to the .spacy format
  • Prepare a full pipeline using RoBERTa and textcat
  • Train and test your model using the spaCy CLI

This method applies to any short-text classification task: emails, support tickets, product reviews, FAQs, or even chatbot intents.
