Fine-Tuning Llama-2 7B Chat: Using QLoRA, SFTTrainer, and Gradient Checkpointing on the Alpaca-14k Dataset

In this tutorial, we demonstrate how to fine-tune the Llama-2 7B Chat model for Python code generation using advanced techniques such as QLoRA, gradient checkpointing, and supervised fine-tuning with SFTTrainer. Leveraging the Alpaca-14k dataset, we walk through setting up the environment, configuring LoRA parameters, and applying memory-optimization strategies to train the model efficiently on high-quality Python code. This step-by-step guide is meant for practitioners who want to harness the power of LLMs with minimal computational overhead.
!pip install -q accelerate
!pip install -q peft
!pip install -q transformers
!pip install -q trl
First, we install the required libraries for our project: accelerate, peft, transformers, and trl from the Python Package Index. The -q flag (quiet mode) keeps the installation output to a minimum.
import os
from datasets import load_dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
HfArgumentParser,
TrainingArguments,
pipeline,
logging,
)
from peft import LoraConfig, PeftModel
from trl import SFTTrainer
Next, we import the essential modules for the training setup: dataset loading, the model and tokenizer classes, training arguments, pipelines, logging, the LoRA configuration, and SFTTrainer.
# The model to train from the Hugging Face hub
model_name = "NousResearch/llama-2-7b-chat-hf"
# The instruction dataset to use
dataset_name = "user/minipython-Alpaca-14k"
# Fine-tuned model name
new_model = "/kaggle/working/llama-2-7b-codeAlpaca"
We specify the base model from the Hugging Face hub, the instruction dataset, and a name for the fine-tuned model.
# QLoRA parameters
# LoRA attention dimension
lora_r = 64
# Alpha parameter for LoRA scaling
lora_alpha = 16
# Dropout probability for LoRA layers
lora_dropout = 0.1
Here we define the LoRA parameters for our model. `lora_r` sets the LoRA attention dimension (rank), `lora_alpha` scales the LoRA updates, and `lora_dropout` controls the dropout probability for the LoRA layers.
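To see why a small rank makes LoRA so cheap, the sketch below (plain Python, with hypothetical helper names; 4096 is the hidden size of Llama-2 7B's attention projections) compares the trainable parameter count of a full linear layer against its low-rank LoRA factors:

```python
# Full fine-tuning updates the whole d_out x d_in weight matrix;
# LoRA instead trains two low-rank factors B (d_out x r) and A (r x d_in),
# with the update scaled by lora_alpha / r at runtime.

def full_trainable_params(d_in, d_out):
    """Trainable parameters when fine-tuning the full weight matrix."""
    return d_in * d_out

def lora_trainable_params(d_in, d_out, r):
    """Trainable parameters for a LoRA adapter of rank r on the same layer."""
    return r * (d_in + d_out)

d = 4096  # hidden size of Llama-2 7B's attention projections
full = full_trainable_params(d, d)
lora = lora_trainable_params(d, d, r=64)
print(f"full: {full:,}  lora: {lora:,}  fraction: {lora / full:.2%}")
# → full: 16,777,216  lora: 524,288  fraction: 3.12%
```

At rank 64, the adapter trains only about 3% of the layer's parameters, which is what makes fine-tuning a 7B model feasible on a single GPU.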
# TrainingArguments parameters
# Output directory where the model predictions and checkpoints will be stored
output_dir = "/kaggle/working/llama-2-7b-codeAlpaca"
# Number of training epochs
num_train_epochs = 1
# Enable fp16 training (set to True for mixed precision training)
fp16 = True
# Batch size per GPU for training
per_device_train_batch_size = 8
# Batch size per GPU for evaluation
per_device_eval_batch_size = 8
# Number of update steps to accumulate the gradients for
gradient_accumulation_steps = 2
# Enable gradient checkpointing
gradient_checkpointing = True
# Maximum gradient norm (gradient clipping)
max_grad_norm = 0.3
# Initial learning rate (AdamW optimizer)
learning_rate = 2e-4
# Weight decay to apply to all layers except bias/LayerNorm weights
weight_decay = 0.001
# Optimizer to use
optim = "adamw_torch"
# Learning rate schedule
lr_scheduler_type = "constant"
# Group sequences into batches with the same length
# Saves memory and speeds up training considerably
group_by_length = True
# Ratio of steps for a linear warmup
warmup_ratio = 0.03
# Save checkpoint every X updates steps
save_steps = 100
# Log every X updates steps
logging_steps = 10
These parameters configure the training process: the output directory, the number of epochs, precision (fp16), batch sizes, gradient accumulation, and checkpointing. Additional settings such as the learning rate, optimizer, and scheduler shape the optimization behavior, while the warmup ratio and logging steps govern how training starts and how we monitor progress.
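Note that the batch size the optimizer actually sees is the per-device batch size multiplied by the accumulation steps (and the number of GPUs). A minimal sketch with a hypothetical helper:

```python
def effective_batch_size(per_device_batch, accumulation_steps, num_gpus=1):
    """Samples processed per optimizer update."""
    return per_device_batch * accumulation_steps * num_gpus

# With the settings above: 8 samples per device, gradients accumulated
# over 2 steps, on a single GPU.
print(effective_batch_size(8, 2))  # → 16
```

Raising `gradient_accumulation_steps` is therefore a way to simulate a larger batch without using more VRAM per step.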
import torch
print("PyTorch Version:", torch.__version__)
print("CUDA Version:", torch.version.cuda)
We import PyTorch and print both the PyTorch version and the CUDA version it was built against.
Running `!nvidia-smi` displays GPU information, including the driver version, the CUDA version, and the current GPU memory usage.
# SFT parameters
# Maximum sequence length to use
max_seq_length = None
# Pack multiple short examples in the same input sequence to increase efficiency
packing = False
# Load the entire model on the GPU 0
device_map = {"": 0}
We define the SFT parameters: the maximum sequence length, whether to pack multiple short examples into one sequence, and a device map that loads the entire model on GPU 0.
# Load dataset
dataset = load_dataset(dataset_name, split="train")
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
# Load base model in half precision
model = AutoModelForCausalLM.from_pretrained(
model_name,
torch_dtype=torch.float16,
device_map="auto",
)
# Prepare model for training
model.gradient_checkpointing_enable()
model.enable_input_require_grads()
We set a few SFT parameters, then load our dataset and tokenizer. We configure the tokenizer's padding token and padding side, and load the base model in float16. Finally, we enable gradient checkpointing and make sure the model's inputs require gradients for training.
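Under the hood, gradient checkpointing trades compute for memory: activations are discarded during the forward pass and recomputed during backward. A toy sketch using `torch.utils.checkpoint` directly (tiny layers chosen only for illustration):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

layers = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))
x = torch.randn(4, 16, requires_grad=True)

out_plain = layers(x)                                   # stores all activations
out_ckpt = checkpoint(layers, x, use_reentrant=False)   # recomputes them in backward

# The outputs are identical; only the memory/compute trade-off differs.
print(torch.allclose(out_plain, out_ckpt))  # → True
```

`model.gradient_checkpointing_enable()` applies the same idea to every transformer block, which is why it pairs well with large models on limited VRAM.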
from peft import get_peft_model
We import the `get_peft_model` function, which applies parameter-efficient fine-tuning (PEFT) to our base model.
# Load LoRA configuration
peft_config = LoraConfig(
lora_alpha=lora_alpha,
lora_dropout=lora_dropout,
r=lora_r,
bias="none",
task_type="CAUSAL_LM",
)
# Apply LoRA to the model
model = get_peft_model(model, peft_config)
# Set training parameters
training_arguments = TrainingArguments(
output_dir=output_dir,
num_train_epochs=num_train_epochs,
per_device_train_batch_size=per_device_train_batch_size,
gradient_accumulation_steps=gradient_accumulation_steps,
optim=optim,
save_steps=save_steps,
logging_steps=logging_steps,
learning_rate=learning_rate,
weight_decay=weight_decay,
fp16=fp16,
max_grad_norm=max_grad_norm,
warmup_ratio=warmup_ratio,
group_by_length=True,
lr_scheduler_type=lr_scheduler_type,
)
# Set supervised fine-tuning parameters
trainer = SFTTrainer(
model=model,
train_dataset=dataset,
dataset_text_field="text",
max_seq_length=max_seq_length,
tokenizer=tokenizer,
args=training_arguments,
packing=packing,
)
We configure and apply LoRA to our model using `LoraConfig` and `get_peft_model`. We then define the `TrainingArguments` for the run, specifying the epoch count, batch sizes, and optimization settings. Finally, we instantiate the `SFTTrainer`, passing it the model, dataset, tokenizer, and training arguments.
# Train model
trainer.train()
# Save trained model
trainer.model.save_pretrained(new_model)
We start the fine-tuning process with `trainer.train()` and save the trained LoRA model to the specified directory.
# Run text generation pipeline with the fine-tuned model
prompt = "How can I write a Python program that calculates the mean, standard deviation, and coefficient of variation of a dataset from a CSV file?"
pipe = pipeline(task="text-generation", model=trainer.model, tokenizer=tokenizer, max_length=400)
result = pipe(f"[INST] {prompt} [/INST]")
print(result[0]['generated_text'])
We create a text-generation pipeline with our fine-tuned model and tokenizer. We then provide a prompt, generate text with the pipeline, and print the result.
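The `[INST] ... [/INST]` wrapper in the prompt follows Llama-2's chat template. A small hypothetical helper sketching that format (the optional system prompt goes inside `<<SYS>>` tags):

```python
def format_llama2_prompt(user_message, system_prompt=None):
    """Wrap a user message in Llama-2's [INST] chat format (sketch)."""
    if system_prompt is not None:
        return (
            f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
            f"{user_message} [/INST]"
        )
    return f"[INST] {user_message} [/INST]"

print(format_llama2_prompt("Write a Python hello world."))
# → [INST] Write a Python hello world. [/INST]
```

Matching the template the model was trained on matters: an unwrapped prompt tends to produce noticeably worse completions.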
from kaggle_secrets import UserSecretsClient
user_secrets = UserSecretsClient()
secret_value_0 = user_secrets.get_secret("HF_TOKEN")
We access Kaggle secrets to retrieve a stored Hugging Face token (`HF_TOKEN`). The token is used to authenticate with the Hugging Face hub.
# Empty VRAM
# del model
# del pipe
# del trainer
# del dataset
del tokenizer
import gc
gc.collect()
gc.collect()
torch.cuda.empty_cache()
The snippet above shows how to free GPU memory by deleting objects and clearing caches. We delete the tokenizer, run garbage collection, and empty the CUDA cache to reduce VRAM usage.
import torch
# Check the number of GPUs available
num_gpus = torch.cuda.device_count()
print(f"Number of GPUs available: {num_gpus}")
# Check if CUDA device 1 is available
if num_gpus > 1:
print("cuda:1 is available.")
else:
print("cuda:1 is not available.")
We import PyTorch and check how many GPUs are available. We then print the count and report whether the GPU with ID 1 is available.
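Rather than hard-coding `cuda:1`, you may prefer a fallback when the second GPU is absent. A hypothetical helper, a sketch only:

```python
import torch

def pick_device(preferred_id=1):
    """Use the preferred GPU if present, else GPU 0, else the CPU."""
    if torch.cuda.is_available() and torch.cuda.device_count() > preferred_id:
        return f"cuda:{preferred_id}"
    if torch.cuda.is_available():
        return "cuda:0"
    return "cpu"

print(pick_device(1))
```

This keeps the notebook runnable on single-GPU and CPU-only machines without edits.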
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Specify the device ID for your desired GPU (e.g., 0 for the first GPU, 1 for the second GPU)
device_id = 1 # Change this based on your available GPUs
device = f"cuda:{device_id}"
# Load the base model on the specified GPU
base_model = AutoModelForCausalLM.from_pretrained(
model_name,
low_cpu_mem_usage=True,
return_dict=True,
torch_dtype=torch.float16,
device_map="auto", # Use auto to load on the available device
)
# Load the LoRA weights
lora_model = PeftModel.from_pretrained(base_model, new_model)
# Move LoRA model to the specified GPU
lora_model.to(device)
# Merge the LoRA weights with the base model weights
model = lora_model.merge_and_unload()
# Ensure the merged model is on the correct device
model.to(device)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
We select a GPU device (`device_id = 1`) and load the base model in half precision with reduced CPU memory usage. We then load the LoRA weights and merge them into the base model, making sure the merged model ends up on the specified GPU. Finally, we load the tokenizer and configure it with the appropriate padding settings.
In conclusion, by following this tutorial you have fine-tuned the Llama-2 7B Chat model for Python code generation. Combining QLoRA, gradient checkpointing, and SFTTrainer demonstrates a practical way to work within resource constraints while achieving strong performance.



