A Coding Implementation of an OpenAI-Assisted Privacy-Preserving Federated Fraud Detection System from Scratch Using a Lightweight PyTorch Simulation

In this tutorial, we show how to simulate a privacy-preserving fraud detection system using federated learning, without relying on heavy frameworks or complex infrastructure. We build a clean, CPU-friendly setup that simulates ten independent banks, each training a local fraud detection model on its own highly heterogeneous transaction data. We coordinate these local updates with a simple FedAvg aggregation loop, which lets us train a global model while ensuring that no raw transaction data ever leaves a client. Alongside this, we integrate OpenAI for post-training analysis and risk-focused reporting, showing how federated learning results can be translated into decision-ready insights.

!pip -q install torch scikit-learn numpy openai


import time, random, json, os, getpass
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import roc_auc_score, average_precision_score, accuracy_score
from openai import OpenAI


SEED = 7
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)


DEVICE = torch.device("cpu")
print("Device:", DEVICE)

We set up the workspace and import all the libraries needed for data generation, modeling, evaluation, and reporting. We also fix the random seeds and device configuration so that our federated simulation is deterministic and reproducible on the CPU.
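
As a quick, optional sanity check (an illustrative addition, not part of the original tutorial), we can confirm that reseeding reproduces identical random draws on the CPU:

# Illustrative determinism check: reseeding must reproduce the same draws
torch.manual_seed(SEED)
a = torch.rand(3)
torch.manual_seed(SEED)
b = torch.rand(3)
assert torch.equal(a, b), "CPU runs should be bit-for-bit reproducible"
print("Determinism check passed:", a.tolist())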

X, y = make_classification(
   n_samples=60000,
   n_features=30,
   n_informative=18,
   n_redundant=8,
   weights=[0.985, 0.015],
   class_sep=1.5,
   flip_y=0.01,
   random_state=SEED
)


X = X.astype(np.float32)
y = y.astype(np.int64)


X_train_full, X_test, y_train_full, y_test = train_test_split(
   X, y, test_size=0.2, stratify=y, random_state=SEED
)


server_scaler = StandardScaler()
X_train_full_s = server_scaler.fit_transform(X_train_full).astype(np.float32)
X_test_s = server_scaler.transform(X_test).astype(np.float32)


test_loader = DataLoader(
   TensorDataset(torch.from_numpy(X_test_s), torch.from_numpy(y_test)),
   batch_size=1024,
   shuffle=False
)

We generate a highly imbalanced, credit-card-style fraud dataset and split it into training and test sets. We fit a server-side scaler and configure a global test loader, which lets us evaluate the aggregated model after every federated round.
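
As an optional inspection (illustrative, not part of the original code), we can verify the roughly 1.5% fraud rate and confirm that the stratified split preserves it:

# Illustrative check: the stratified split should preserve the ~1.5% fraud rate
print(f"Train fraud rate: {y_train_full.mean():.4f}")
print(f"Test fraud rate:  {y_test.mean():.4f}")
print(f"Train/test sizes: {len(y_train_full)} / {len(y_test)}")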

def dirichlet_partition(y, n_clients=10, alpha=0.35):
   classes = np.unique(y)
   idx_by_class = [np.where(y == c)[0] for c in classes]
   client_idxs = [[] for _ in range(n_clients)]
   for idxs in idx_by_class:
       np.random.shuffle(idxs)
       props = np.random.dirichlet(alpha * np.ones(n_clients))
       cuts = (np.cumsum(props) * len(idxs)).astype(int)
       prev = 0
       for cid, cut in enumerate(cuts):
           client_idxs[cid].extend(idxs[prev:cut].tolist())
           prev = cut
   return [np.array(ci, dtype=np.int64) for ci in client_idxs]


NUM_CLIENTS = 10
client_idxs = dirichlet_partition(y_train_full, NUM_CLIENTS, 0.35)


def make_client_split(X, y, idxs):
   Xi, yi = X[idxs], y[idxs]
   if len(np.unique(yi)) < 2:
       other = np.where(y == (1 - yi[0]))[0]
       add = np.random.choice(other, size=min(10, len(other)), replace=False)
       Xi = np.concatenate([Xi, X[add]])
       yi = np.concatenate([yi, y[add]])
   return train_test_split(Xi, yi, test_size=0.15, stratify=yi, random_state=SEED)


client_data = [make_client_split(X_train_full, y_train_full, client_idxs[c]) for c in range(NUM_CLIENTS)]


def make_client_loaders(Xtr, ytr, Xva, yva):
   sc = StandardScaler()
   Xtr_s = sc.fit_transform(Xtr).astype(np.float32)
   Xva_s = sc.transform(Xva).astype(np.float32)
   tr = DataLoader(TensorDataset(torch.from_numpy(Xtr_s), torch.from_numpy(ytr)), batch_size=512, shuffle=True)
   va = DataLoader(TensorDataset(torch.from_numpy(Xva_s), torch.from_numpy(yva)), batch_size=512)
   return tr, va


client_loaders = [make_client_loaders(*cd) for cd in client_data]

We simulate realistic non-IID behavior by partitioning the training data across ten clients using a Dirichlet distribution. We then create independent client-level loaders for training and validation, ensuring that each simulated bank works with its own locally scaled data.
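
To see the skew the Dirichlet partition induces, the optional snippet below (illustrative, not part of the original code) prints each client's training size and local fraud rate:

# Illustrative inspection of non-IID skew across the ten simulated banks
for cid, (Xtr, Xva, ytr, yva) in enumerate(client_data):
    print(f"Client {cid}: n_train={len(ytr)}, fraud_rate={ytr.mean():.4f}")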

class FraudNet(nn.Module):
   def __init__(self, in_dim):
       super().__init__()
       self.net = nn.Sequential(
           nn.Linear(in_dim, 64),
           nn.ReLU(),
           nn.Dropout(0.1),
           nn.Linear(64, 32),
           nn.ReLU(),
           nn.Dropout(0.1),
           nn.Linear(32, 1)
       )
   def forward(self, x):
       return self.net(x).squeeze(-1)


def get_weights(model):
   return [p.detach().cpu().numpy() for p in model.state_dict().values()]


def set_weights(model, weights):
   keys = list(model.state_dict().keys())
   model.load_state_dict({k: torch.tensor(w) for k, w in zip(keys, weights)}, strict=True)


@torch.no_grad()
def evaluate(model, loader):
   model.eval()
   bce = nn.BCEWithLogitsLoss()
   ys, ps, losses = [], [], []
   for xb, yb in loader:
       logits = model(xb)
       losses.append(bce(logits, yb.float()).item())
       ys.append(yb.numpy())
       ps.append(torch.sigmoid(logits).numpy())
   y_true = np.concatenate(ys)
   y_prob = np.concatenate(ps)
   return {
       "loss": float(np.mean(losses)),
       "auc": roc_auc_score(y_true, y_prob),
       "ap": average_precision_score(y_true, y_prob),
       "acc": accuracy_score(y_true, (y_prob >= 0.5).astype(int))
   }


def train_local(model, loader, lr):
   opt = torch.optim.Adam(model.parameters(), lr=lr)
   bce = nn.BCEWithLogitsLoss()
   model.train()
   for xb, yb in loader:
       opt.zero_grad()
       loss = bce(model(xb), yb.float())
       loss.backward()
       opt.step()

We define the neural network used for fraud detection, along with utility functions for training, evaluation, and weight exchange. We keep local training and metric computation lightweight so that client-side updates stay efficient and easy to reason about.
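
As an optional sanity check (illustrative, not part of the original code), we can verify that a get_weights/set_weights round trip leaves the model's predictions unchanged, which is exactly what the FedAvg loop below relies on:

# Illustrative round-trip check: serializing and restoring weights must be lossless
probe = FraudNet(X_train_full.shape[1])
probe.eval()  # disable dropout so the forward pass is deterministic
x = torch.randn(4, X_train_full.shape[1])
with torch.no_grad():
    before = probe(x)
    set_weights(probe, get_weights(probe))
    after = probe(x)
assert torch.allclose(before, after), "weight round-trip must be lossless"
print("Weight round-trip check passed")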

def fedavg(weights, sizes):
   total = sum(sizes)
   return [
       sum(w[i] * (s / total) for w, s in zip(weights, sizes))
       for i in range(len(weights[0]))
   ]


ROUNDS = 10
LR = 5e-4


global_model = FraudNet(X_train_full.shape[1])
global_weights = get_weights(global_model)


for r in range(1, ROUNDS + 1):
   client_weights, client_sizes = [], []
   for cid in range(NUM_CLIENTS):
       local = FraudNet(X_train_full.shape[1])
       set_weights(local, global_weights)
       train_local(local, client_loaders[cid][0], LR)
       client_weights.append(get_weights(local))
       client_sizes.append(len(client_loaders[cid][0].dataset))
   global_weights = fedavg(client_weights, client_sizes)
   set_weights(global_model, global_weights)
   metrics = evaluate(global_model, test_loader)
   print(f"Round {r}: {metrics}")

We orchestrate the federated learning process by iteratively training local client models and averaging their parameters with FedAvg. We evaluate the global model after each round to monitor convergence and see how collaborative learning improves fraud detection performance.
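
To make the size-weighted averaging concrete, here is a tiny hand-checkable example (illustrative, not part of the original code) of exactly what fedavg computes:

# Illustrative FedAvg check: clients with sizes 100 and 300 contribute with
# weights 0.25 and 0.75, so each averaged parameter is 0.25*w1 + 0.75*w2
w1 = [np.array([1.0, 1.0])]
w2 = [np.array([3.0, 3.0])]
print(fedavg([w1, w2], [100, 300])[0])  # expected: [2.5 2.5]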

OPENAI_API_KEY = getpass.getpass("Enter OPENAI_API_KEY (input hidden): ").strip()


if OPENAI_API_KEY:
   os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
   client = OpenAI()


   summary = {
       "rounds": ROUNDS,
       "num_clients": NUM_CLIENTS,
       "final_metrics": metrics,
       "client_sizes": [len(client_loaders[c][0].dataset) for c in range(NUM_CLIENTS)],
       "client_fraud_rates": [float(client_data[c][1].mean()) for c in range(NUM_CLIENTS)]
   }


    prompt = (
        "Write a concise internal fraud-risk report.\n"
        "Include executive summary, metric interpretation, risks, and next steps.\n\n"
        + json.dumps(summary, indent=2)
    )


   resp = client.responses.create(model="gpt-5.2", input=prompt)
   print(resp.output_text)

We convert the technical results into a short analytical report using a large language model. We accept an API key securely via hidden input and generate decision-oriented insights that summarize performance, risks, and recommended next steps.
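
If no API key is provided, an optional fallback (illustrative, not part of the original code) can still close the run with a readable summary:

# Illustrative fallback: print the raw summary when no API key was entered
if not OPENAI_API_KEY:
    print("No OPENAI_API_KEY provided; printing raw results instead:")
    print(json.dumps({"rounds": ROUNDS, "num_clients": NUM_CLIENTS,
                      "final_metrics": metrics}, indent=2))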

In conclusion, we demonstrate how to implement federated learning from scratch in a Colab notebook while keeping it lightweight, interpretable, and realistic. We observe how extreme data heterogeneity across clients influences convergence, and why careful aggregation and evaluation are critical in fraud detection settings. We also extend the workflow by generating an automated report for the risk team, showing how analytical results can be translated into decision-ready insights. Finally, we present a practical blueprint for evaluating federated fraud models that emphasizes privacy awareness, simplicity, and real-world relevance.




