A Coding Implementation of a Transformer-Based Regression Language Model to Predict Continuous Values from Text

In this tutorial, we build a Regression Language Model (RLM), a model that predicts continuous numerical values directly from text sequences. Instead of classifying or generating text, we focus on training a transformer-based architecture that learns the quantitative relationships hidden within natural-language descriptions. We start by generating synthetic text-to-number data, tokenize it, and train a transformer encoder that maps language to real values. By the end, we not only understand how RLMs can be implemented from scratch, but we can also visualize their training progress and evaluate their performance on unseen examples.
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
import matplotlib.pyplot as plt
from collections import Counter
import re
torch.manual_seed(42)
np.random.seed(42)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("🚀 Regression Language Model (RLM) Tutorial")
print("=" * 60)
We begin by importing the essential libraries, including PyTorch, NumPy, and Matplotlib, to build and visualize our regression language model. We set random seeds to ensure reproducibility, so results stay consistent across runs, and we select the compute device up front.
def generate_synthetic_data(n_samples=2000):
    """Generate synthetic text-to-number regression data"""
    templates = [
        ("The temperature is {} degrees", lambda x: x),
        ("I rate this {} out of ten", lambda x: x),
        ("The price is {} dollars", lambda x: x),
        ("Confidence level: {}", lambda x: x / 100),
        ("Speed of {} kilometers per hour", lambda x: x / 10),
        ("{} percent complete", lambda x: x / 100),
        ("Scored {} points in the game", lambda x: x / 10),
        ("The distance is {} meters", lambda x: x),
    ]
    data = []
    for _ in range(n_samples):
        template, transform = templates[np.random.randint(len(templates))]
        value = np.random.uniform(0, 100)
        text = template.format(round(value, 1))
        target = transform(value)
        data.append((text, target))
    return data
We create a synthetic dataset of natural-language phrases paired with numeric targets. By using a variety of templates, such as temperatures, ratings, and percentages, we ensure the model sees diverse phrasings of quantities. This controlled setup lets us simulate text-to-number regression without depending on external data.
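As a quick sanity check (our own addition, not part of the original walkthrough), we can print a few generated pairs to confirm the templates and their scaling transforms behave as expected:
# Preview a few synthetic (text, target) pairs.
for text, target in generate_synthetic_data(n_samples=3):
    print(f"{text!r} -> target {target:.3f}")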
class SimpleTokenizer:
    def __init__(self):
        self.word2idx = {"<PAD>": 0, "<UNK>": 1}
        self.idx2word = {0: "<PAD>", 1: "<UNK>"}
        self.vocab_size = 2
    def fit(self, texts):
        """Build vocabulary from texts"""
        words = []
        for text in texts:
            words.extend(re.findall(r'\w+|[^\w\s]', text.lower()))
        word_counts = Counter(words)
        for word, _ in word_counts.most_common():
            if word not in self.word2idx:
                self.word2idx[word] = self.vocab_size
                self.idx2word[self.vocab_size] = word
                self.vocab_size += 1
    def encode(self, text, max_len=20):
        """Convert text to token indices"""
        words = re.findall(r'\w+|[^\w\s]', text.lower())
        indices = [self.word2idx.get(w, 1) for w in words]
        if len(indices) < max_len:
            indices += [0] * (max_len - len(indices))
        else:
            indices = indices[:max_len]
        return indices
We design a simple tokenizer to convert raw text into sequences the model can process. It builds a vocabulary from all unique words, maps each to an index, and handles unknown words and padding automatically. This step ensures our text inputs become fixed-length numeric sequences.
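To see the tokenizer in action, here is a small illustrative round trip (the variable names are ours, not from the original tutorial):
demo_tokenizer = SimpleTokenizer()
demo_tokenizer.fit(["The temperature is 25.5 degrees", "75.0 percent complete"])
# Known words map to learned indices; short sequences are zero-padded to max_len.
print(demo_tokenizer.encode("The temperature is 25.5 degrees", max_len=10))
# Words never seen during fit() fall back to the unknown-token index 1.
print(demo_tokenizer.encode("completely new words", max_len=10))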
class RLMDataset(Dataset):
    def __init__(self, data, tokenizer, max_len=20):
        self.data = data
        self.tokenizer = tokenizer
        self.max_len = max_len
    def __len__(self):
        return len(self.data)
    def __getitem__(self, idx):
        text, target = self.data[idx]
        tokens = self.tokenizer.encode(text, self.max_len)
        return torch.tensor(tokens), torch.tensor([target], dtype=torch.float32)

class RegressionLanguageModel(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, num_heads=4, num_layers=2,
                 dropout=0.1, max_len=20):
        super().__init__()
        self.token_embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.position_embedding = nn.Embedding(max_len, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=num_heads,
            dim_feedforward=embed_dim * 4,
            dropout=dropout,
            batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        self.fc1 = nn.Linear(embed_dim, 64)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(dropout)
        self.fc2 = nn.Linear(64, 1)
        self.max_len = max_len
    def forward(self, x):
        batch_size, seq_len = x.shape
        positions = torch.arange(0, seq_len, device=x.device).unsqueeze(0).expand(batch_size, -1)
        token_embed = self.token_embedding(x)
        pos_embed = self.position_embedding(positions)
        embeddings = token_embed + pos_embed
        padding_mask = (x == 0)
        encoded = self.transformer(embeddings, src_key_padding_mask=padding_mask)
        mask_expanded = (~padding_mask).unsqueeze(-1).float()
        summed = (encoded * mask_expanded).sum(dim=1)
        pooled = summed / mask_expanded.sum(dim=1)
        x = self.fc1(pooled)
        x = self.relu(x)
        x = self.dropout(x)
        output = self.fc2(x)
        return output
We wrap our text-number pairs in a PyTorch Dataset, tokenizing each sentence and returning tensors ready for batching. We then build the transformer-based RLM: token and positional embeddings flow through a stacked transformer encoder, padding tokens are masked out, and masked mean pooling feeds the result to a small MLP head. In effect, the encoder learns numeric cues from language while the regression head outputs a single continuous value.
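Before training, a minimal shape check (our own addition, with arbitrary sizes) confirms the forward pass returns one scalar per sequence:
dummy_model = RegressionLanguageModel(vocab_size=50)
dummy_tokens = torch.randint(1, 50, (4, 20))  # batch of 4 sequences, no padding
print(dummy_model(dummy_tokens).shape)  # expected: torch.Size([4, 1])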
def train_rlm(model, train_loader, val_loader, epochs=15, lr=0.001):
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=lr)
    train_losses, val_losses = [], []
    print(f"\n📊 Training on {device}")
    print("-" * 60)
    for epoch in range(epochs):
        model.train()
        train_loss = 0
        for tokens, targets in train_loader:
            tokens, targets = tokens.to(device), targets.to(device)
            optimizer.zero_grad()
            outputs = model(tokens)
            loss = criterion(outputs, targets)
            loss.backward()
            optimizer.step()
            train_loss += loss.item()
        train_loss /= len(train_loader)
        train_losses.append(train_loss)
        model.eval()
        val_loss = 0
        with torch.no_grad():
            for tokens, targets in val_loader:
                tokens, targets = tokens.to(device), targets.to(device)
                outputs = model(tokens)
                loss = criterion(outputs, targets)
                val_loss += loss.item()
        val_loss /= len(val_loader)
        val_losses.append(val_loss)
        print(f"Epoch {epoch+1:2d}/{epochs} | Train Loss: {train_loss:.4f} | Val Loss: {val_loss:.4f}")
    return train_losses, val_losses
We train the model using the Adam optimizer and MSE loss on the GPU, if available, iterating over mini-batches to backpropagate and update the weights. At the end of each epoch we switch to evaluation mode, track both training and validation loss, and print them so we can monitor how well the model is learning.
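As an optional extension not present in the original script, we can also report mean absolute error, which is easier to interpret than MSE; this sketch assumes the same device and loader conventions used above:
def evaluate_mae(model, loader):
    """Optional helper: mean absolute error over a DataLoader."""
    model.eval()
    total_err, total_n = 0.0, 0
    with torch.no_grad():
        for tokens, targets in loader:
            tokens, targets = tokens.to(device), targets.to(device)
            preds = model(tokens)
            total_err += (preds - targets).abs().sum().item()
            total_n += targets.numel()
    return total_err / total_n
# After training: print(f"Val MAE: {evaluate_mae(model, val_loader):.4f}")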
print("n📝 Generating synthetic data...")
data = generate_synthetic_data(2000)
split_idx = int(0.8 * len(data))
train_data, val_data = data[:split_idx], data[split_idx:]
print(f"Train samples: {len(train_data)}, Val samples: {len(val_data)}")
print("n🔤 Building tokenizer...")
tokenizer = SimpleTokenizer()
tokenizer.fit([text for text, _ in train_data])
print(f"Vocabulary size: {tokenizer.vocab_size}")
train_dataset = RLMDataset(train_data, tokenizer)
val_dataset = RLMDataset(val_data, tokenizer)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)
print("n🏗️ Building Regression Language Model...")
model = RegressionLanguageModel(vocab_size=tokenizer.vocab_size)
print(f"Model parameters: {sum(p.numel() for p in model.parameters()):,}")
train_losses, val_losses = train_rlm(model, train_loader, val_loader)
plt.figure(figsize=(10, 4))
plt.plot(train_losses, label="Train Loss", linewidth=2)
plt.plot(val_losses, label="Val Loss", linewidth=2)
plt.xlabel('Epoch')
plt.ylabel('Loss (MSE)')
plt.title('RLM Training Progress')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
print("n🎯 Testing Predictions:")
print("-" * 60)
test_examples = [
"The temperature is 25.5 degrees",
"I rate this 8.0 out of ten",
"The price is 45.0 dollars",
"75.0 percent complete"
]
with torch.no_grad():
for text in test_examples:
tokens = torch.tensor([tokenizer.encode(text)]).to(device)
prediction = model(tokens).item()
print(f"Input: {text}")
print(f"Predicted value: {prediction:.4f}n")
print("✅ RLM Tutorial Complete!")
We generate and split the synthetic data, fit our tokenizer, wrap everything in PyTorch datasets and loaders, and build the transformer-based model. We train it, plot the loss curves to confirm learning, and then run a few natural-language sentences through it to inspect the predicted values, completing the RLM pipeline end to end.
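If we want to reuse the trained model later, a minimal save-and-reload sketch looks like the following (the filename is hypothetical, not from the original tutorial):
torch.save(model.state_dict(), "rlm_model.pt")  # persist trained weights
reloaded = RegressionLanguageModel(vocab_size=tokenizer.vocab_size).to(device)
reloaded.load_state_dict(torch.load("rlm_model.pt", map_location=device))
reloaded.eval()
with torch.no_grad():
    tokens = torch.tensor([tokenizer.encode("The distance is 12.0 meters")]).to(device)
    print(f"Reloaded model prediction: {reloaded(tokens).item():.4f}")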
In conclusion, we successfully built, trained, and evaluated a Regression Language Model that predicts continuous values directly from text. We see how combining token embeddings, a transformer encoder, and a simple regression head gives a language model the power to map semantics to quantities. By generating synthetic data, visualizing training progress, and testing on unseen examples, we demonstrate how RLMs bridge the gap between language understanding and numerical reasoning.


