Machine Learning

Use PyTorch to easily access your GPU

Suppose you are lucky enough to have access to a system with an Nvidia Graphical Processing Unit (GPU). Did you know that there is an absurdly easy way to use your GPU's capabilities through a Python library intended for, and mostly used in, machine learning (ML) applications?

Don't worry if you are not up to speed on the ins and outs of ML, because we won't use it in this article. Instead, I will show you how to use the PyTorch library to access and use the power of your GPU. We will compare the run times of Python programs that use the popular numerical library NumPy, running on the CPU, with equivalent code that uses PyTorch on the GPU.

Before continuing, let's quickly recap what a GPU is and what PyTorch is.

What is a GPU?

A GPU is a specialized electronic chip originally designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. Its usefulness as a fast image-manipulation device came from its ability to perform many calculations at the same time, and it is still used for that purpose.

However, GPUs have recently become invaluable in machine learning and in the training and development of large language models. Their inherent ability to perform highly parallel computations makes them ideal workhorses in these fields, which rely on complex mathematical models and simulations.

What is PyTorch?

PyTorch is an open-source machine learning library developed by Facebook's AI Research lab (FAIR). It is widely used for natural language processing and computer vision applications. Two of the main reasons PyTorch can be used for GPU operations are:

  • One of PyTorch's core data structures is the tensor. Tensors are similar to arrays and matrices in other programming languages, but are designed to run on a GPU.
  • PyTorch has CUDA support. PyTorch integrates seamlessly with CUDA, a parallel computing platform and programming model developed by NVIDIA for general-purpose computing on its GPUs. This allows PyTorch to access the GPU hardware directly to accelerate numerical computations, and lets developers use PyTorch to write software that takes full advantage of GPU acceleration.

In short, PyTorch's support for GPU operations through CUDA, together with its efficient tensor manipulation, makes it an excellent tool for speeding up computationally demanding tasks.
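To make the two points above concrete, here is a minimal sketch of creating a tensor and placing it on the GPU. It assumes PyTorch is already installed (the setup is covered later in this article); the function name tensor_on_best_device is made up for illustration, and the code falls back to the CPU when no CUDA device is visible.

```python
try:
    import torch
except ImportError:  # PyTorch is only installed later in this article
    torch = None


def tensor_on_best_device():
    """Build a small tensor, place it on the GPU if one is usable, and add 1."""
    if torch is None:
        return None
    device = "cuda" if torch.cuda.is_available() else "cpu"
    t = torch.ones(3, dtype=torch.float64)  # lives in CPU memory by default
    t = t.to(device)                        # copy to the chosen device
    total = (t + 1).sum().item()            # .item() brings the scalar back to Python
    return total, t.device.type


print(tensor_on_best_device())
```

The arithmetic looks just like array code; the only GPU-specific step is choosing where the tensor lives.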

As we will show later, you can use PyTorch without developing machine learning models or training large language models.

In the rest of the article, we will set up our development environment and run through a few examples that compare computationally heavy PyTorch code with the equivalent NumPy code to see what performance difference, if any, we get.

Prerequisites

You need an Nvidia GPU in your system. To check your GPU, run the following command at your system prompt. I use Windows Subsystem for Linux (WSL).

$ nvidia-smi

>>
(base) PS C:\Users\thoma> nvidia-smi
Fri Mar 22 11:41:34 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 551.61                 Driver Version: 551.61         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                     TCC/WDDM  | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4070 Ti   WDDM  |   00000000:01:00.0  On |                  N/A |
| 32%   24C    P8              9W /  285W |     843MiB /  12282MiB |      1%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1268    C+G   ...tilityHPSystemEventUtilityHost.exe      N/A      |
|    0   N/A  N/A      2204    C+G   ...ekyb3d8bbwePhoneExperienceHost.exe      N/A      |
|    0   N/A  N/A      3904    C+G   ...calMicrosoftOneDriveOneDrive.exe      N/A      |
|    0   N/A  N/A      7068    C+G   ...CBS_cw5n
etc ..

If that command is not recognized and you are sure you have a GPU, it probably means you are missing an NVIDIA driver. Just follow the rest of the instructions in this article, and it should be installed as part of the process.
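If you would rather script this check, the following is a rough sketch using only the Python standard library. The helper name nvidia_smi_report is hypothetical; it simply looks for nvidia-smi on your PATH and returns whatever the tool prints, or None if it cannot be run.

```python
import shutil
import subprocess


def nvidia_smi_report():
    """Return the text printed by nvidia-smi, or None if it cannot be run."""
    exe = shutil.which("nvidia-smi")  # look the tool up on PATH
    if exe is None:
        return None  # tool not found: the driver is probably missing
    try:
        result = subprocess.run([exe], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None


report = nvidia_smi_report()
print(report if report else "nvidia-smi not available; check the NVIDIA driver")
```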

While PyTorch installation packages can include the CUDA libraries, your system must still have the appropriate NVIDIA GPU drivers installed. Those drivers are required for your operating system to communicate with the graphics processing unit (GPU) hardware. The CUDA toolkit includes drivers, but when using PyTorch's bundled CUDA, you only need to ensure that your GPU drivers are up to date.

Go to the NVIDIA website and install the latest drivers compatible with your system and GPU specifications.

Setting up our development environment

As a best practice, we should set up a separate development environment for each project. I use conda, but use whatever method suits you.

If you want to go down the conda route and don't already have it, you must install Miniconda (recommended) or Anaconda first.

Please note that, at the time of writing, PyTorch officially supports only Python versions 3.8 to 3.11.
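As a quick sanity check, a sketch like the one below reports whether the interpreter you are running falls inside that range. The bounds are just the figures quoted above, and the helper name python_version_ok is made up for illustration; newer PyTorch releases may support a different range.

```python
import sys

# Supported range quoted in the article at the time of writing;
# newer PyTorch releases may support a different range.
MIN_SUPPORTED = (3, 8)
MAX_SUPPORTED = (3, 11)


def python_version_ok(version=None):
    """Return True if (major, minor) falls inside the quoted range."""
    major_minor = tuple(version or sys.version_info[:2])[:2]
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


print(sys.version_info[:2], "supported:", python_version_ok())
```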

#create our test environment
(base) $ conda create -n pytorch_test python=3.11 -y

Now activate your new environment.

(base) $ conda activate pytorch_test

We now need to find the correct PyTorch installation command. This will depend on your operating system, chosen programming language, preferred package manager, and compute platform (CUDA version).

Fortunately, PyTorch provides a useful web interface that makes this easy to set up. So, to get started, head over to the PyTorch website at …

Click on the Get Started link near the top of the screen. From there, scroll down a little until you see this:

Image from the PyTorch website

Click on each box that matches your system and specs. As you do, you will see the command in the Run this Command output field change. When you have finished making your choices, copy the final command text and type it into your command prompt.

For me, this was:

(pytorch_test) $ conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia -y

We will also install Jupyter, Pandas, and Matplotlib to help us run our Python code in a notebook.

(pytorch_test) $ conda install pandas matplotlib jupyter -y

Now type jupyter notebook at your command prompt. You should see a Jupyter notebook open in your browser. If that doesn't happen automatically, you will likely see a screenful of information after the jupyter notebook command.

Near the bottom, there will be a URL to copy and paste into your browser to start the Jupyter notebook.

Your URL will be different from mine, but it will typically be a localhost address that includes an access token.

Testing our setup

The first thing we will do is test our setup. Please enter the following in a Jupyter cell and run it.

import torch
x = torch.rand(5, 3)
print(x)

You should see output similar to the following.

tensor([[0.3715, 0.5503, 0.5783],
        [0.8638, 0.5206, 0.8439],
        [0.4664, 0.0557, 0.6280],
        [0.5704, 0.0322, 0.6053],
        [0.3416, 0.4090, 0.6366]])

Additionally, to check that your GPU driver and CUDA are enabled and accessible to PyTorch, run the following:

import torch
torch.cuda.is_available()

This should output True if all is okay.

If everything is okay, we can continue to our examples. If not, go back and check your installation steps.
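If the check fails, it can help to see a little more detail than a bare True or False. The sketch below gathers a few facts into a dictionary; the helper name gpu_setup_summary and the dictionary keys are my own choices for illustration, while torch.__version__, torch.cuda.is_available() and torch.cuda.get_device_name() are standard PyTorch calls.

```python
import importlib.util


def gpu_setup_summary():
    """Collect a few facts about the PyTorch/CUDA setup into a dict."""
    summary = {"torch_installed": False, "cuda_available": False, "gpu_name": None}
    if importlib.util.find_spec("torch") is None:
        return summary  # PyTorch is not installed in this environment
    import torch

    summary["torch_installed"] = True
    summary["torch_version"] = torch.__version__
    summary["cuda_available"] = torch.cuda.is_available()
    if summary["cuda_available"]:
        summary["gpu_name"] = torch.cuda.get_device_name(0)  # first visible GPU
    return summary


for key, value in gpu_setup_summary().items():
    print(f"{key}: {value}")
```

If torch_installed is True but cuda_available is False, the usual suspects are a missing or outdated driver or a CPU-only PyTorch build.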

NB: For the timings below, I ran each of the NumPy and PyTorch processes several times in succession and took the best time for each. This favors PyTorch somewhat, as there is a small overhead on the very first PyTorch run, but overall I think it makes for a fair comparison.
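One way to automate that "run several times, keep the best" approach is sketched below. This is not the exact procedure used for the timings in this article, just a small helper under my own naming (best_time, repeats, sync). The optional sync argument is there because GPU work is queued asynchronously, so you would pass something like torch.cuda.synchronize when timing PyTorch code on the GPU.

```python
from timeit import default_timer as timer


def best_time(fn, repeats=5, sync=None):
    """Run fn() `repeats` times and return the shortest wall-clock time in seconds.

    sync is an optional callable run before stopping the clock, for example
    torch.cuda.synchronize when fn launches GPU work.
    """
    best = float("inf")
    for _ in range(repeats):
        start = timer()
        fn()
        if sync is not None:
            sync()  # wait for queued GPU work to finish before reading the clock
        best = min(best, timer() - start)
    return best


# Usage with a plain CPU workload; the workload itself is only a placeholder.
print("best of 5:", best_time(lambda: sum(range(100_000))))
```

Taking the minimum rather than the mean discards one-off costs such as the first-call overhead mentioned above.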

Example 1 – A simple array math operation.

In this example, we set up two large, identical one-dimensional arrays and perform a simple addition on each element.

import numpy as np
import torch as pt

from timeit import default_timer as timer   
  

#func1 will run on the CPU   
def func1(a):                                 
    a+= 1  
    
#func2 will run on the GPU
def func2(a):                                 
    a+= 2

if __name__=="__main__": 
    n1 = 300000000                          
    a1 = np.ones(n1, dtype = np.float64) 

    # same size as the NumPy array so the comparison is like for like
    n2 = 300000000                     
    a2 = pt.ones(n2,dtype=pt.float64)

    start = timer() 
    func1(a1) 
    print("Timing with CPU:numpy", timer()-start)     
      
    start = timer() 
    func2(a2) 
    #wait for all calcs on the GPU to complete
    pt.cuda.synchronize()
    print("Timing with GPU:pytorch", timer()-start) 
    print()

    print("a1 = ",a1)
    print("a2 = ",a2)

Timing with CPU:numpy 0.1334826999955112
Timing with GPU:pytorch 0.10177790001034737

a1 =  [2. 2. 2. ... 2. 2. 2.]
a2 =  tensor([3., 3., 3.,  ..., 3., 3., 3.], dtype=torch.float64)

We see a slight improvement when using PyTorch over NumPy, but we missed one important point. We haven't used the GPU at all, because our PyTorch tensor data is still in CPU memory.

To move the data into GPU memory, we need to add the device='cuda' argument when creating the tensor. Let's do that and see whether it makes a difference.

# Same code as above except 
# to get the array data onto the GPU memory
# we changed

a2 = pt.ones(n2,dtype=pt.float64)

# to

a2 = pt.ones(n2,dtype=pt.float64,device='cuda')

After re-running with that change, we get:

Timing with CPU:numpy 0.12852740001108032
Timing with GPU:pytorch 0.011292399998637848

a1 =  [2. 2. 2. ... 2. 2. 2.]
a2 =  tensor([3., 3., 3.,  ..., 3., 3., 3.], device='cuda:0', dtype=torch.float64)

That's more like it: a speed-up of more than 10x.

Example 2 – A slightly more complex array operation.

For this example, we will multiply two-dimensional matrices using the built-in matmul operation available in both PyTorch and NumPy. Each matrix will be 10000 x 10000 and contain random floating-point numbers between 1 and 100.

# NUMPY first
import numpy as np
from timeit import default_timer as timer

# Set the seed for reproducibility
np.random.seed(0)
# Generate two 10000x10000 arrays of random floating point numbers between 1 and 100
A = np.random.uniform(low=1.0, high=100.0, size=(10000, 10000)).astype(np.float32)
B = np.random.uniform(low=1.0, high=100.0, size=(10000, 10000)).astype(np.float32)
# Perform matrix multiplication
start = timer() 
C = np.matmul(A, B)

# Due to the large size of the matrices, it's not practical to print them entirely.
# Instead, we print a small portion to verify.
print("A small portion of the result matrix:\n", C[:5, :5])
print("Without GPU:", timer()-start)

A small portion of the result matrix:
 [[25461280. 25168352. 25212526. 25303304. 25277884.]
 [25114760. 25197558. 25340074. 25341850. 25373122.]
 [25381820. 25326522. 25438612. 25596932. 25538602.]
 [25317282. 25223540. 25272242. 25551428. 25467986.]
 [25327290. 25527838. 25499606. 25657218. 25527856.]]

Without GPU: 1.4450852000009036

Now for the PyTorch version.

import torch
from timeit import default_timer as timer

# Set the seed for reproducibility
torch.manual_seed(0)

# Use the GPU
device = 'cuda'

# Generate two 10000x10000 tensors of random floating point 
# numbers between 1 and 100 and move them to the GPU
#
A = torch.FloatTensor(10000, 10000).uniform_(1, 100).to(device)
B = torch.FloatTensor(10000, 10000).uniform_(1, 100).to(device)

# Perform matrix multiplication
start = timer()
C = torch.matmul(A, B)

# Wait for all current GPU operations to complete (synchronize)
torch.cuda.synchronize() 

# Due to the large size of the matrices, it's not practical to print them entirely.
# Instead, we print a small portion to verify.
print("A small portion of the result matrix:\n", C[:5, :5])
print("With GPU:", timer() - start)

A small portion of the result matrix:
 [[25145748. 25495480. 25376196. 25446946. 25646938.]
 [25357524. 25678558. 25675806. 25459324. 25619908.]
 [25533988. 25632858. 25657696. 25616978. 25901294.]
 [25159630. 25230138. 25450480. 25221246. 25589418.]
 [24800246. 25145700. 25103040. 25012414. 25465890.]]

With GPU: 0.07081239999388345

The PyTorch run was about 20 times faster than the NumPy run this time. Good stuff.
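For anyone who wants to check the arithmetic, the speed-up figures for the first two examples can be worked out directly from the timings printed above. The numbers in this sketch are copied from those outputs; the dictionary labels and the speedup helper are only illustrative names.

```python
# Timings copied from the outputs printed in the article (seconds).
timings = {
    "Example 1 (element-wise add)": (0.12852740001108032, 0.011292399998637848),
    "Example 2 (matrix multiply)": (1.4450852000009036, 0.07081239999388345),
}


def speedup(cpu_seconds, gpu_seconds):
    """Return how many times faster the GPU run was than the CPU run."""
    return cpu_seconds / gpu_seconds


for name, (cpu_s, gpu_s) in timings.items():
    print(f"{name}: {speedup(cpu_s, gpu_s):.1f}x")
```

That works out to roughly 11x for the element-wise addition and roughly 20x for the matrix multiplication on my hardware; your own ratios will depend on your CPU and GPU.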

Example 3 – Combining CPU and GPU code.

Sometimes, not all of your processing can be done on the GPU. An everyday use case for this is plotting data. Certainly, you can manipulate your data using a GPU, but often the next step is to see what your final data looks like in a plot.

You cannot plot data while it resides in GPU memory, so you have to move it back to CPU memory before calling your plotting functions. Is it worth the overhead of moving large chunks of data from the GPU to the CPU? Let's find out.

In this example, we will solve a polar equation for values of θ between 0 and 2π, convert the results to (x, y) coordinates, and plot the result. The equation, as it appears in the code below, is r = 1 + 3/4 sin(3θ).

Don't get too hung up on the math. It's just an equation that, when converted to the (x, y) coordinate system and solved, looks good when plotted.
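If it helps to see the conversion for a single angle before doing it a hundred million times, here is a small worked sketch in plain Python. It uses the same formula as the code in this example, r = 1 + 3/4 sin(3θ), followed by x = r cos θ and y = r sin θ; the function name polar_point is just for illustration.

```python
import math


def polar_point(theta):
    """Return (x, y) for r = 1 + 3/4 * sin(3 * theta) at a single angle."""
    r = 1 + 3 / 4 * math.sin(3 * theta)
    return r * math.cos(theta), r * math.sin(theta)


# At theta = pi/2, sin(3*pi/2) = -1, so r = 0.25 and the point sits on the y axis.
x, y = polar_point(math.pi / 2)
print(round(x, 6), round(y, 6))
```

The NumPy and PyTorch versions do exactly this, only vectorized over the whole array of angles at once.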

Even for a few million x and y values, NumPy can solve this in milliseconds, so to make it more interesting, we will use 100 million (x, y) coordinates.

Here is the NumPy code first.

%%time
import numpy as np
import matplotlib.pyplot as plt
from time import time as timer

start = timer()

# create an array of 100M thetas between 0 and 2pi
theta = np.linspace(0, 2*np.pi, 100000000)

# our original polar formula
r = 1 + 3/4 * np.sin(3*theta)

# calculate the equivalent x and y's coordinates 
# for each theta
x = r * np.cos(theta)
y = r * np.sin(theta)

# see how long the calc part took
print("Finished with calcs ", timer()-start)

# Now plot out the data
start = timer()
plt.plot(x,y)

# see how long the plotting part took
print("Finished with plot ", timer()-start)

Here is the output. Did you guess beforehand that it would look like this? I sure didn't!

Now, let's see what the equivalent PyTorch code looks like and how much of a speed-up we get.

%%time
import torch as pt
import matplotlib.pyplot as plt
from time import time as timer

# Make sure PyTorch is using the GPU
device = 'cuda'

# Start the timer
start = timer()

# Creating the theta tensor on the GPU
theta = pt.linspace(0, 2 * pt.pi, 100000000, device=device)

# Calculating r, x, and y using PyTorch operations on the GPU
r = 1 + 3/4 * pt.sin(3 * theta)
x = r * pt.cos(theta)
y = r * pt.sin(theta)

# Moving the result back to CPU for plotting
x_cpu = x.cpu().numpy()
y_cpu = y.cpu().numpy()

pt.cuda.synchronize()
print("Finished with calcs", timer() - start)

# Plotting
start = timer()
plt.plot(x_cpu, y_cpu)
plt.show()

print("Finished with plot", timer() - start)

And our output again.

The calculation part was over 10 times faster. Plotting the data took about the same time in both the PyTorch and NumPy versions, which was expected since the data was in CPU memory by then and the GPU played no further part in the process.

Overall, however, we shaved about 40% off the total run time, which is excellent.
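Those two rounded figures (calculations roughly 10 times faster, about 40% off the total) also tell us something about where the time went. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes that only the calculation part sped up and that plotting took the same time in both runs, and the function name implied_fraction is my own.

```python
def implied_fraction(total_saving, part_speedup):
    """Fraction of the original run time spent in the accelerated part.

    Assumes only that part got faster: new_total = (1 - p) + p / s,
    so saving = p * (1 - 1 / s) and p = saving / (1 - 1 / s).
    """
    return total_saving / (1 - 1 / part_speedup)


# Rounded figures quoted in the text: about 40% saved overall, calcs about 10x faster.
p = implied_fraction(total_saving=0.40, part_speedup=10)
print(f"calculations were roughly {p:.0%} of the original run time")
```

Under those assumptions, a little under half of the original run time was calculation, which is why even a large GPU speed-up cannot cut the total by more than that share.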

Summary

This article has shown how to use an NVIDIA GPU through PyTorch, a library normally associated with machine learning, to accelerate ordinary numerical Python code, by comparing CPU-based NumPy programs with GPU-based PyTorch equivalents.

You do not need to be doing machine learning to benefit from PyTorch. If you have access to an NVIDIA GPU, PyTorch provides a simple and effective way to speed up computationally intensive numerical work, even in regular Python code.
