A Coding Guide to Building a Self-Supervised Learning Pipeline with Lightly AI for Smart Data Curation and Active Learning

In this tutorial, we explore how to guide data curation with the Lightly AI framework. We start by building a SimCLR model to learn meaningful image representations without labels, and then visualize the resulting embeddings with UMAP and t-SNE. We then move on to coreset selection to curate data intelligently, mimicking an active learning workflow, and finally validate the benefits of that curation with a linear probe. Throughout the guide, we work step by step in Google Colab, training, visualizing, and comparing coreset-based and random sampling to understand how smart data selection improves downstream performance. Check out the Full Codes here.
!pip uninstall -y numpy
!pip install numpy==1.26.4
!pip install -q lightly torch torchvision matplotlib scikit-learn umap-learn
import torch
import torch.nn as nn
import torchvision
from torch.utils.data import DataLoader, Subset
from torchvision import transforms
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors
import umap
from lightly.loss import NTXentLoss
from lightly.models.modules import SimCLRProjectionHead
from lightly.transforms import SimCLRTransform
from lightly.data import LightlyDataset
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
We start by setting up the environment, pinning a compatible NumPy version and installing the key libraries such as Lightly, PyTorch, and UMAP. We then import all the modules needed to build, train, and visualize our self-supervised learning model, confirming that PyTorch and CUDA are ready for GPU acceleration. Check out the Full Codes here.
class SimCLRModel(nn.Module):
    """SimCLR model with ResNet backbone"""
    def __init__(self, backbone, hidden_dim=512, out_dim=128):
        super().__init__()
        self.backbone = backbone
        self.backbone.fc = nn.Identity()
        self.projection_head = SimCLRProjectionHead(
            input_dim=512, hidden_dim=hidden_dim, output_dim=out_dim
        )

    def forward(self, x):
        features = self.backbone(x).flatten(start_dim=1)
        z = self.projection_head(features)
        return z

    def extract_features(self, x):
        """Extract backbone features without projection"""
        with torch.no_grad():
            return self.backbone(x).flatten(start_dim=1)
This defines our SimCLRModel, which uses a ResNet backbone to learn visual representations without labels. We remove the classification head and add a projection head that maps backbone features into a contrastive embedding space. The extract_features method lets us pull raw features directly from the backbone for downstream analysis. Check out the Full Codes here.
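A quick shape check helps confirm the architecture before training. This is a minimal sketch, assuming the imports above; the dummy batch of four 32x32 images mimics CIFAR-10, and pretrained=False matches the tutorial's backbone setup.

backbone = torchvision.models.resnet18(pretrained=False)
model = SimCLRModel(backbone)
dummy = torch.randn(4, 3, 32, 32)       # four fake CIFAR-10 images
z = model(dummy)                        # projected embeddings used by the contrastive loss
feats = model.extract_features(dummy)   # raw backbone features used for analysis
print(z.shape, feats.shape)             # torch.Size([4, 128]) torch.Size([4, 512])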
def load_dataset(train=True):
    """Load CIFAR-10 dataset"""
    ssl_transform = SimCLRTransform(input_size=32, cj_prob=0.8)
    eval_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010))
    ])
    base_dataset = torchvision.datasets.CIFAR10(
        root="./data", train=train, download=True
    )

    class SSLDataset(torch.utils.data.Dataset):
        def __init__(self, dataset, transform):
            self.dataset = dataset
            self.transform = transform

        def __len__(self):
            return len(self.dataset)

        def __getitem__(self, idx):
            img, label = self.dataset[idx]
            return self.transform(img), label

    ssl_dataset = SSLDataset(base_dataset, ssl_transform)
    eval_dataset = torchvision.datasets.CIFAR10(
        root="./data", train=train, download=True, transform=eval_transform
    )
    return ssl_dataset, eval_dataset
In this step, we load the CIFAR-10 dataset and apply separate transforms for self-supervised training and evaluation. We create a small SSLDataset class that produces two augmented views of each image for contrastive learning, while the evaluation dataset uses plain normalized images for downstream tasks. This setup helps the model learn representations that are robust to visual changes. Check out the Full Codes here.
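Before training, it helps to peek at what the SSL dataset actually yields. A small sketch, assuming SimCLRTransform's convention of returning a list of augmented views per image (the sample index 0 is arbitrary):

ssl_dataset, eval_dataset = load_dataset(train=True)
views, label = ssl_dataset[0]          # two random augmentations of the same image
print(len(views), views[0].shape)      # expect: 2 views, each of shape [3, 32, 32]
print("label (unused during SSL):", label)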
def train_ssl_model(model, dataloader, epochs=5, device="cuda"):
    """Train SimCLR model"""
    model.to(device)
    criterion = NTXentLoss(temperature=0.5)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.06, momentum=0.9, weight_decay=5e-4)
    print("\n=== Self-Supervised Training ===")
    for epoch in range(epochs):
        model.train()
        total_loss = 0
        for batch_idx, batch in enumerate(dataloader):
            views = batch[0]
            view1, view2 = views[0].to(device), views[1].to(device)
            z1 = model(view1)
            z2 = model(view2)
            loss = criterion(z1, z2)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
            if batch_idx % 50 == 0:
                print(f"Epoch {epoch+1}/{epochs} | Batch {batch_idx} | Loss: {loss.item():.4f}")
        avg_loss = total_loss / len(dataloader)
        print(f"Epoch {epoch+1} Complete | Avg Loss: {avg_loss:.4f}")
    return model
Here, we train our SimCLR model in a self-supervised fashion using the NT-Xent loss, which pulls together the embeddings of two augmented views of the same image. We optimize the model with stochastic gradient descent (SGD) and track the loss across epochs to monitor learning progress. This stage teaches the model to extract meaningful features without relying on labeled data. Check out the Full Codes here.
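To see what NTXentLoss is doing under the hood, here is a hand-rolled version of the standard NT-Xent objective for a single batch. This is a simplified sketch of the common SimCLR formulation, not necessarily Lightly's exact internals; the batch size of 8 and embedding dimension of 128 are arbitrary.

import torch.nn.functional as F

def ntxent_manual(z1, z2, temperature=0.5):
    # Stack both views and L2-normalize so dot products become cosine similarities
    z = F.normalize(torch.cat([z1, z2]), dim=1)                          # (2N, D)
    sim = z @ z.t() / temperature                                        # (2N, 2N) similarity logits
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf")) # drop self-similarity
    # Row i's positive is the other view of the same image: index i+N or i-N
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

z1, z2 = torch.randn(8, 128), torch.randn(8, 128)
print(ntxent_manual(z1, z2))

Each row is effectively a (2N-1)-way classification problem whose correct "class" is the paired view, so minimizing the loss pulls matched views together and pushes all other images apart.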
def generate_embeddings(model, dataset, device="cuda", batch_size=256):
    """Generate embeddings for the entire dataset"""
    model.eval()
    model.to(device)
    dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False, num_workers=2)
    embeddings = []
    labels = []
    print("\n=== Generating Embeddings ===")
    with torch.no_grad():
        for images, targets in dataloader:
            images = images.to(device)
            features = model.extract_features(images)
            embeddings.append(features.cpu().numpy())
            labels.append(targets.numpy())
    embeddings = np.vstack(embeddings)
    labels = np.concatenate(labels)
    print(f"Generated {embeddings.shape[0]} embeddings with dimension {embeddings.shape[1]}")
    return embeddings, labels
def visualize_embeddings(embeddings, labels, method='umap', n_samples=5000):
    """Visualize embeddings using UMAP or t-SNE"""
    print(f"\n=== Visualizing Embeddings with {method.upper()} ===")
    if len(embeddings) > n_samples:
        indices = np.random.choice(len(embeddings), n_samples, replace=False)
        embeddings = embeddings[indices]
        labels = labels[indices]
    if method == 'umap':
        reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, metric="cosine")
    else:
        reducer = TSNE(n_components=2, perplexity=30, metric="cosine")
    embeddings_2d = reducer.fit_transform(embeddings)
    plt.figure(figsize=(12, 10))
    scatter = plt.scatter(embeddings_2d[:, 0], embeddings_2d[:, 1],
                          c=labels, cmap='tab10', s=5, alpha=0.6)
    plt.colorbar(scatter)
    plt.title(f'CIFAR-10 Embeddings ({method.upper()})')
    plt.xlabel('Component 1')
    plt.ylabel('Component 2')
    plt.tight_layout()
    plt.savefig(f'embeddings_{method}.png', dpi=150)
    print(f"Saved visualization to embeddings_{method}.png")
    plt.show()
def select_coreset(embeddings, labels, budget=1000, method='diversity'):
    """
    Select a coreset using different strategies:
    - diversity: Maximum diversity using k-center greedy
    - balanced: Class-balanced selection
    """
    print(f"\n=== Coreset Selection ({method}) ===")
    if method == 'balanced':
        selected_indices = []
        n_classes = len(np.unique(labels))
        per_class = budget // n_classes
        for cls in range(n_classes):
            cls_indices = np.where(labels == cls)[0]
            selected = np.random.choice(cls_indices, min(per_class, len(cls_indices)), replace=False)
            selected_indices.extend(selected)
        return np.array(selected_indices)
    elif method == 'diversity':
        selected_indices = []
        remaining_indices = set(range(len(embeddings)))
        first_idx = np.random.randint(len(embeddings))
        selected_indices.append(first_idx)
        remaining_indices.remove(first_idx)
        for _ in range(budget - 1):
            if not remaining_indices:
                break
            remaining = list(remaining_indices)
            selected_emb = embeddings[selected_indices]
            remaining_emb = embeddings[remaining]
            distances = np.min(
                np.linalg.norm(remaining_emb[:, None] - selected_emb, axis=2), axis=1
            )
            max_dist_idx = np.argmax(distances)
            selected_idx = remaining[max_dist_idx]
            selected_indices.append(selected_idx)
            remaining_indices.remove(selected_idx)
        print(f"Selected {len(selected_indices)} samples")
        return np.array(selected_indices)
We extract high-dimensional features from our trained backbone along with their labels, then project them to 2D with UMAP or t-SNE to see how the classes cluster. Next, we curate the data with a coreset selector, either class-balanced or diversity-driven (k-center greedy), to prioritize the most informative samples instead of training on everything. This pipeline helps us see what the model has learned and pick the samples that matter most. Check out the Full Codes here.
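To make the difference between the two strategies concrete, here is a small comparison sketch. It assumes embeddings and labels come from generate_embeddings above; the budget of 1000 over CIFAR-10's 10 classes is illustrative.

# Compare class coverage of the two selection strategies (illustrative sketch)
div_idx = select_coreset(embeddings, labels, budget=1000, method='diversity')
bal_idx = select_coreset(embeddings, labels, budget=1000, method='balanced')
print("diversity class counts:", np.bincount(labels[div_idx], minlength=10))
print("balanced  class counts:", np.bincount(labels[bal_idx], minlength=10))

The balanced strategy yields exactly 100 samples per class, while k-center greedy lets the class counts float according to feature-space geometry.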
def evaluate_linear_probe(model, train_subset, test_dataset, device="cuda"):
    """Train linear classifier on frozen features"""
    model.eval()
    train_loader = DataLoader(train_subset, batch_size=128, shuffle=True, num_workers=2)
    test_loader = DataLoader(test_dataset, batch_size=256, shuffle=False, num_workers=2)
    classifier = nn.Linear(512, 10).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(classifier.parameters(), lr=0.001)
    for epoch in range(10):
        classifier.train()
        for images, targets in train_loader:
            images, targets = images.to(device), targets.to(device)
            with torch.no_grad():
                features = model.extract_features(images)
            outputs = classifier(features)
            loss = criterion(outputs, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    classifier.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for images, targets in test_loader:
            images, targets = images.to(device), targets.to(device)
            features = model.extract_features(images)
            outputs = classifier(features)
            _, predicted = outputs.max(1)
            total += targets.size(0)
            correct += predicted.eq(targets).sum().item()
    accuracy = 100. * correct / total
    return accuracy
def main():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    print(f"Using device: {device}")
    ssl_dataset, eval_dataset = load_dataset(train=True)
    _, test_dataset = load_dataset(train=False)
    ssl_subset = Subset(ssl_dataset, range(10000))
    ssl_loader = DataLoader(ssl_subset, batch_size=128, shuffle=True, num_workers=2, drop_last=True)
    backbone = torchvision.models.resnet18(pretrained=False)
    model = SimCLRModel(backbone)
    model = train_ssl_model(model, ssl_loader, epochs=5, device=device)
    eval_subset = Subset(eval_dataset, range(10000))
    embeddings, labels = generate_embeddings(model, eval_subset, device=device)
    visualize_embeddings(embeddings, labels, method='umap')
    coreset_indices = select_coreset(embeddings, labels, budget=1000, method='diversity')
    coreset_subset = Subset(eval_dataset, coreset_indices)
    print("\n=== Active Learning Evaluation ===")
    coreset_acc = evaluate_linear_probe(model, coreset_subset, test_dataset, device=device)
    print(f"Coreset Accuracy (1000 samples): {coreset_acc:.2f}%")
    random_indices = np.random.choice(len(eval_subset), 1000, replace=False)
    random_subset = Subset(eval_dataset, random_indices)
    random_acc = evaluate_linear_probe(model, random_subset, test_dataset, device=device)
    print(f"Random Accuracy (1000 samples): {random_acc:.2f}%")
    print(f"\nCoreset improvement: +{coreset_acc - random_acc:.2f}%")
    print("\n=== Tutorial Complete! ===")
    print("Key takeaways:")
    print("1. Self-supervised learning creates meaningful representations without labels")
    print("2. Embeddings capture semantic similarity between images")
    print("3. Smart data selection (coreset) outperforms random sampling")
    print("4. Active learning reduces labeling costs while maintaining accuracy")

if __name__ == "__main__":
    main()
We freeze the backbone and train a lightweight linear probe on top of the learned features, then report accuracy on the held-out test set. The main pipeline ties it all together: we pretrain with SimCLR, generate and visualize embeddings, and compare coreset-based against random sampling, thereby measuring the value of smart data curation.
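As an optional extension beyond the tutorial, a k-nearest-neighbor probe on the frozen features is a common complementary SSL evaluation. This sketch assumes it runs inside main() after the coreset evaluation, so model, coreset_subset, test_dataset, and device are in scope; k=20 with cosine distance is a conventional but arbitrary choice.

from sklearn.neighbors import KNeighborsClassifier

# Hypothetical extension: embed the coreset and test set with the frozen
# backbone, then classify each test image by its 20 nearest coreset neighbors
train_emb, train_lbl = generate_embeddings(model, coreset_subset, device=device)
test_emb, test_lbl = generate_embeddings(model, test_dataset, device=device)
knn = KNeighborsClassifier(n_neighbors=20, metric="cosine")
knn.fit(train_emb, train_lbl)
print(f"kNN probe accuracy: {100 * knn.score(test_emb, test_lbl):.2f}%")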
In conclusion, we see how self-supervised pretraining enables representation learning without hand-labeled data, and how coreset-based curation improves a linear model trained on only a handful of samples. By training a SimCLR model, generating and visualizing embeddings, curating the data, and evaluating with an active-learning-style comparison, we walk through a complete workflow for data-efficient machine learning. Combining intelligent data selection with self-supervised representations lets us build accurate models efficiently, laying a solid foundation for scalable machine learning applications.