
Building a Multi-Agent System for Integrated Transcriptomic, Proteomic, and Metabolomic Data Interpretation with Pathway Reasoning

In this tutorial, we build an advanced multi-agent pipeline that interprets integrated omics data, including transcriptomics, proteomics, and metabolomics, to uncover key biological insights. We start by generating coherent synthetic datasets that mimic realistic biological trends, then move step by step through agents designed for statistical analysis, network inference, pathway enrichment, and drug repurposing. Each component contributes to a cumulative interpretation process that lets us identify key genes, infer causal links, and generate biologically sound hypotheses supported by patterns in the data.

import numpy as np
import pandas as pd
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, List, Tuple
import warnings
warnings.filterwarnings('ignore')


PATHWAY_DB = {
   'Glycolysis': {'genes': ['HK2', 'PFKM', 'PKM', 'LDHA', 'GAPDH', 'ENO1'],
                  'metabolites': ['Glucose', 'G6P', 'F16BP', 'Pyruvate', 'Lactate'], 'score': 0},
   'TCA_Cycle': {'genes': ['CS', 'IDH1', 'IDH2', 'OGDH', 'SDHA', 'MDH2'],
                 'metabolites': ['Citrate', 'Isocitrate', 'α-KG', 'Succinate', 'Malate'], 'score': 0},
   'Oxidative_Phosphorylation': {'genes': ['NDUFA1', 'NDUFB5', 'COX5A', 'COX7A1', 'ATP5A1', 'ATP5B'],
                                  'metabolites': ['ATP', 'ADP', 'NAD+', 'NADH'], 'score': 0},
   'Fatty_Acid_Synthesis': {'genes': ['ACACA', 'FASN', 'SCD1', 'ACLY'],
                            'metabolites': ['Malonyl-CoA', 'Palmitate', 'Oleate'], 'score': 0},
   'Fatty_Acid_Oxidation': {'genes': ['CPT1A', 'ACOX1', 'HADHA', 'ACADM'],
                            'metabolites': ['Acyl-CoA', 'Acetyl-CoA'], 'score': 0},
   'Amino_Acid_Metabolism': {'genes': ['GOT1', 'GOT2', 'GLUD1', 'BCAT1', 'BCAT2'],
                             'metabolites': ['Glutamate', 'Glutamine', 'Alanine', 'Aspartate'], 'score': 0},
   'Pentose_Phosphate': {'genes': ['G6PD', 'PGD', 'TKTL1'],
                         'metabolites': ['R5P', 'NADPH'], 'score': 0},
   'Cell_Cycle_G1S': {'genes': ['CCND1', 'CDK4', 'CDK6', 'RB1', 'E2F1'], 'metabolites': [], 'score': 0},
   'Cell_Cycle_G2M': {'genes': ['CCNB1', 'CDK1', 'CDC25C', 'WEE1'], 'metabolites': [], 'score': 0},
   'Apoptosis': {'genes': ['BCL2', 'BAX', 'BID', 'CASP3', 'CASP8', 'CASP9'], 'metabolites': [], 'score': 0},
   'mTOR_Signaling': {'genes': ['MTOR', 'RPTOR', 'RICTOR', 'AKT1', 'TSC1', 'TSC2'],
                      'metabolites': ['Leucine', 'ATP'], 'score': 0},
   'HIF1_Signaling': {'genes': ['HIF1A', 'EPAS1', 'VEGFA', 'SLC2A1'], 'metabolites': ['Lactate'], 'score': 0},
   'p53_Signaling': {'genes': ['TP53', 'MDM2', 'CDKN1A', 'BAX'], 'metabolites': [], 'score': 0},
   'PI3K_AKT': {'genes': ['PIK3CA', 'AKT1', 'AKT2', 'PTEN', 'PDK1'], 'metabolites': [], 'score': 0},
}


GENE_INTERACTIONS = {
   'HK2': ['PFKM', 'HIF1A', 'MTOR'], 'PFKM': ['PKM', 'HK2'], 'PKM': ['LDHA', 'HIF1A'],
   'MTOR': ['AKT1', 'HIF1A', 'TSC2'], 'HIF1A': ['VEGFA', 'SLC2A1', 'PKM', 'LDHA'],
   'TP53': ['MDM2', 'CDKN1A', 'BAX', 'CASP3'], 'AKT1': ['MTOR', 'TSC2', 'MDM2'],
   'CCND1': ['CDK4', 'RB1'], 'CDK4': ['RB1'], 'RB1': ['E2F1'],
}


DRUG_TARGETS = {
   'Metformin': ['NDUFA1'], 'Rapamycin': ['MTOR'], '2-DG': ['HK2'],
   'Bevacizumab': ['VEGFA'], 'Palbociclib': ['CDK4', 'CDK6'], 'Nutlin-3': ['MDM2']
}


@dataclass
class OmicsProfile:
   transcriptomics: pd.DataFrame
   proteomics: pd.DataFrame
   metabolomics: pd.DataFrame
   metadata: Dict

We lay the biological foundations of our system. We define pathway knowledge, gene-gene interactions, and drug-target mappings that serve as the reference network for all downstream analyses. We also import the essential libraries and create a data class to store multi-omics data in a structured format.
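One way to see how these reference tables can be queried is to invert the pathway map so that each gene points to the pathways that contain it. The sketch below is illustrative only: `build_gene_index` and the two-pathway `toy_pathways` dictionary are hypothetical names that are not part of the tutorial's code, and the layout simply mirrors `PATHWAY_DB`.

```python
from collections import defaultdict
from typing import Dict, List

# Toy subset that mirrors the PATHWAY_DB layout (illustrative values only).
toy_pathways = {
    'Apoptosis': {'genes': ['BCL2', 'BAX', 'CASP3'], 'metabolites': []},
    'p53_Signaling': {'genes': ['TP53', 'MDM2', 'BAX'], 'metabolites': []},
}

def build_gene_index(pathway_db: Dict[str, dict]) -> Dict[str, List[str]]:
    """Map each gene to the sorted list of pathways that contain it."""
    index = defaultdict(list)
    for pathway, content in pathway_db.items():
        for gene in content['genes']:
            index[gene].append(pathway)
    return {gene: sorted(paths) for gene, paths in index.items()}

gene_index = build_gene_index(toy_pathways)
print(gene_index['BAX'])
```

Genes such as BAX and AKT1 appear in more than one pathway in the database above, so an index like this makes shared membership explicit before any scoring is done.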

class AdvancedOmicsGenerator:
   @staticmethod
   def generate_coherent_omics(n_samples=30, n_timepoints=4, noise=0.2):
       genes = list(set(g for p in PATHWAY_DB.values() for g in p['genes']))
       metabolites = list(set(m for p in PATHWAY_DB.values() for m in p['metabolites'] if m))
       proteins = [f"P_{g}" for g in genes]
       n_control = n_samples // 2
       n_disease = n_samples - n_control
       # Split only the disease samples across timepoints (at least one sample each)
       samples_per_tp = max(n_disease // n_timepoints, 1)
       trans = np.random.randn(len(genes), n_samples) * noise + 10
       metab = np.random.randn(len(metabolites), n_samples) * noise + 5
       for tp in range(n_timepoints):
           start_idx = n_control + tp * samples_per_tp
           # The last timepoint absorbs any leftover disease samples
           end_idx = n_samples if tp == n_timepoints - 1 else start_idx + samples_per_tp
           progression = (tp + 1) / n_timepoints
           for i, gene in enumerate(genes):
               if gene in PATHWAY_DB['Glycolysis']['genes']:
                   trans[i, start_idx:end_idx] += np.random.uniform(1.5, 3.5) * progression
               elif gene in PATHWAY_DB['Oxidative_Phosphorylation']['genes']:
                   trans[i, start_idx:end_idx] -= np.random.uniform(1, 2.5) * progression
               elif gene in PATHWAY_DB['Cell_Cycle_G1S']['genes'] + PATHWAY_DB['Cell_Cycle_G2M']['genes']:
                   trans[i, start_idx:end_idx] += np.random.uniform(1, 2) * progression
               elif gene in PATHWAY_DB['HIF1_Signaling']['genes']:
                   trans[i, start_idx:end_idx] += np.random.uniform(2, 4) * progression
               elif gene in PATHWAY_DB['p53_Signaling']['genes']:
                   trans[i, start_idx:end_idx] -= np.random.uniform(0.5, 1.5) * progression
           for i, met in enumerate(metabolites):
               if met in ['Lactate', 'Pyruvate', 'G6P']:
                   metab[i, start_idx:end_idx] += np.random.uniform(1.5, 3) * progression
               elif met in ['ATP', 'Citrate', 'Malate']:
                   metab[i, start_idx:end_idx] -= np.random.uniform(1, 2) * progression
               elif met in ['NADPH']:
                   metab[i, start_idx:end_idx] += np.random.uniform(1, 2) * progression
       # Protein abundance tracks transcript abundance with extra measurement noise
       prot = trans * 0.8 + np.random.randn(*trans.shape) * (noise * 2)
       # Label each disease sample with its timepoint, capped at the last timepoint
       conditions = ['Control'] * n_control + [f'Disease_T{min(i // samples_per_tp, n_timepoints - 1)}' for i in range(n_disease)]
       trans_df = pd.DataFrame(trans, index=genes, columns=[f"S{i}_{c}" for i, c in enumerate(conditions)])
       prot_df = pd.DataFrame(prot, index=proteins, columns=trans_df.columns)
       metab_df = pd.DataFrame(metab, index=metabolites, columns=trans_df.columns)
       metadata = {'conditions': conditions, 'n_timepoints': n_timepoints}
       return OmicsProfile(trans_df, prot_df, metab_df, metadata)


from scipy import stats  # assumes SciPy is installed; used for Welch's t-test below


class StatisticalAgent:
   @staticmethod
   def differential_analysis(data_df, control_samples, disease_samples):
       control = data_df[control_samples]
       disease = data_df[disease_samples]
       # Values are simulated on a log scale, so the difference of means is the log fold change
       fc = disease.mean(axis=1) - control.mean(axis=1)
       # Welch's t-test per feature (rows are features, columns are samples)
       t_stat, p_values = stats.ttest_ind(disease.to_numpy(), control.to_numpy(), axis=1, equal_var=False)
       # Benjamini-Hochberg adjustment with monotonicity enforced
       n = len(p_values)
       order = np.argsort(p_values)
       scaled = p_values[order] * n / np.arange(1, n + 1)
       fdr = np.empty(n)
       fdr[order] = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
       return pd.DataFrame({'log2FC': fc, 't_stat': t_stat, 'p_value': p_values,
           'FDR': fdr, 'significant': (np.abs(fc) > 1.0) & (fdr < 0.05)}).sort_values('log2FC', ascending=False)


   @staticmethod
   def temporal_analysis(data_df, metadata):
       # Locate each disease timepoint's columns from the condition labels
       # rather than assuming a fixed sample layout
       conditions = np.array(metadata['conditions'])
       tp_columns = [np.where(conditions == f'Disease_T{tp}')[0] for tp in range(metadata['n_timepoints'])]
       tp_columns = [cols for cols in tp_columns if len(cols) > 0]
       trends = {}
       for gene in data_df.index:
           means = [data_df.iloc[:, cols].loc[gene].mean() for cols in tp_columns]
           if len(means) > 1:
               x = np.arange(len(means))
               # A degree-1 fit makes the leading coefficient the linear slope
               slope = np.polyfit(x, means, deg=1)[0]
               trends[gene] = {'slope': slope, 'trajectory': means}
       return trends

We focus on generating synthetic yet biologically coherent multi-omics data and performing the initial statistical analysis. We simulate disease progression across timepoints and compute fold changes, p-values, and FDR-corrected significance levels for genes, proteins, and metabolites. We also examine temporal trends to capture how expression values change over time.
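The FDR step can be illustrated on its own. Below is a minimal sketch of the Benjamini-Hochberg adjustment applied to a handful of made-up p-values; `benjamini_hochberg` is a hypothetical helper written for this illustration, not a function from the tutorial.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    n = len(p)
    order = np.argsort(p)                        # indices that sort p ascending
    scaled = p[order] * n / np.arange(1, n + 1)  # p * n / rank
    # Enforce monotonicity: each value is the minimum of itself and all later ones.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

raw_p = [0.01, 0.04, 0.03, 0.005]   # illustrative values only
adj_p = benjamini_hochberg(raw_p)
print(adj_p)
```

With four tests, the smallest p-value (0.005) is multiplied by 4/1 and the second smallest (0.01) by 4/2, so both adjust to 0.02; the two largest both adjust to 0.04.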

class NetworkAnalysisAgent:
   def __init__(self, interactions):
       self.graph = interactions
   def find_master_regulators(self, diff_genes):
       sig_genes = diff_genes[diff_genes['significant']].index.tolist()
       impact_scores = {}
       for gene in sig_genes:
           if gene in self.graph:
               downstream = self._bfs_downstream(gene, max_depth=2)
               sig_downstream = [g for g in downstream if g in sig_genes]
               impact_scores[gene] = {
                   'downstream_count': len(downstream),
                   'sig_downstream': len(sig_downstream),
                   'score': len(sig_downstream) / (len(downstream) + 1),
                   'fc': diff_genes.loc[gene, 'log2FC']
               }
       return sorted(impact_scores.items(), key=lambda x: x[1]['score'], reverse=True)
   def _bfs_downstream(self, start, max_depth=2):
       # Breadth-first search over nodes at depth < max_depth; each reachable
       # gene is returned once and the start gene itself is excluded.
       visited, queue = set(), deque([(start, 0)])
       downstream = []
       while queue:
           node, depth = queue.popleft()
           if depth >= max_depth or node in visited:
               continue
           visited.add(node)
           for neighbor in self.graph.get(node, []):
               if neighbor != start and neighbor not in visited and neighbor not in downstream:
                   downstream.append(neighbor)
                   queue.append((neighbor, depth + 1))
       return downstream
   def causal_inference(self, diff_trans, diff_prot, diff_metab):
       causal_links = []
       for gene in diff_trans[diff_trans['significant']].index:
           gene_fc = diff_trans.loc[gene, 'log2FC']
           protein = f"P_{gene}"
           if protein in diff_prot.index:
               prot_fc = diff_prot.loc[protein, 'log2FC']
               correlation = np.sign(gene_fc) == np.sign(prot_fc)
               if correlation and abs(prot_fc) > 0.5:
                   causal_links.append(('transcription', gene, protein, gene_fc, prot_fc))
           for pathway, content in PATHWAY_DB.items():
               if gene in content['genes']:
                   for metab in content['metabolites']:
                       if metab in diff_metab.index and diff_metab.loc[metab, 'significant']:
                           metab_fc = diff_metab.loc[metab, 'log2FC']
                           causal_links.append(('enzymatic', gene, metab, gene_fc, metab_fc))
       return causal_links

We implement a network analysis agent that identifies master regulators and infers causal relationships. We use graph traversal to assess how strongly each gene influences others and to find links between the transcriptomic, proteomic, and metabolomic layers. This helps us understand which nodes have the greatest downstream impact on the biological system.
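The graph traversal described above can be sketched on a toy network. The example below is a standalone illustration, assuming a small directed graph trimmed from `GENE_INTERACTIONS`; `downstream_within` and `toy_graph` are hypothetical names and not the agent's own method.

```python
from collections import deque

# Toy directed interaction graph (trimmed from the tutorial's GENE_INTERACTIONS).
toy_graph = {
    'HK2': ['PFKM', 'HIF1A'],
    'PFKM': ['PKM', 'HK2'],
    'HIF1A': ['VEGFA', 'PKM'],
}

def downstream_within(graph, start, max_depth=2):
    """Return unique nodes reachable from `start` in at most `max_depth` hops."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                order.append(neighbor)
                queue.append((neighbor, depth + 1))
    return order

print(downstream_within(toy_graph, 'HK2', max_depth=1))
print(downstream_within(toy_graph, 'HK2', max_depth=2))
```

Tracking a `seen` set keeps a gene such as PKM, which is reachable through both PFKM and HIF1A, from being counted twice, and it stops the HK2/PFKM cycle from looping back to the start.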

class PathwayEnrichmentAgent:
   def __init__(self, pathway_db, interactions):
       self.pathway_db = pathway_db
       self.interactions = interactions
   def topology_weighted_enrichment(self, diff_genes, diff_metab, network_agent):
       enriched = {}
       for pathway, content in self.pathway_db.items():
           sig_genes = [g for g in content['genes'] if g in diff_genes.index and diff_genes.loc[g, 'significant']]
           weighted_score = 0
           for gene in sig_genes:
               base_score = abs(diff_genes.loc[gene, 'log2FC'])
               downstream = network_agent._bfs_downstream(gene, max_depth=1)
               centrality = len(downstream) / 10
               weighted_score += base_score * (1 + centrality)
           sig_metabs = [m for m in content['metabolites'] if m in diff_metab.index and diff_metab.loc[m, 'significant']]
           metab_score = sum(abs(diff_metab.loc[m, 'log2FC']) for m in sig_metabs)
           total_score = (weighted_score + metab_score * 2) / max(len(content['genes']) + len(content['metabolites']), 1)
           if total_score > 0.5:
               enriched[pathway] = {'score': total_score, 'genes': sig_genes, 'metabolites': sig_metabs,
                   'gene_fc': {g: diff_genes.loc[g, 'log2FC'] for g in sig_genes},
                   'metab_fc': {m: diff_metab.loc[m, 'log2FC'] for m in sig_metabs},
                   'coherence': self._pathway_coherence(sig_genes, diff_genes)}
       return enriched
   def _pathway_coherence(self, genes, diff_genes):
       if len(genes) < 2:
           return 0
       fcs = [diff_genes.loc[g, 'log2FC'] for g in genes]
       same_direction = sum(1 for fc in fcs if np.sign(fc) == np.sign(fcs[0]))
       return same_direction / len(genes)

We add pathway-level reasoning by implementing a topology-weighted enrichment analysis. We evaluate which biological pathways show the strongest activation or suppression and weight them by network centrality to reflect their broader influence. The agent also assesses pathway coherence, indicating whether the genes in a pathway move in a consistent direction.
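To make the scoring concrete, here is a small worked sketch of a topology-weighted score and a coherence value for a single pathway. The fold changes, out-degrees, and pathway size are made up, the helper names are hypothetical, and the metabolite contribution is left out for brevity.

```python
import numpy as np

# Made-up log fold changes and out-degrees for three significant genes.
gene_fc = {'HK2': 2.0, 'PKM': 1.5, 'LDHA': -1.2}
out_degree = {'HK2': 3, 'PKM': 2, 'LDHA': 0}
pathway_size = 6  # total genes plus metabolites annotated to the pathway

def topology_weighted_score(fcs, degrees, size):
    """Sum |FC| * (1 + degree / 10) over genes, normalised by pathway size."""
    total = sum(abs(fc) * (1 + degrees.get(gene, 0) / 10) for gene, fc in fcs.items())
    return total / max(size, 1)

def coherence(fcs):
    """Fraction of genes whose fold change has the same sign as the first gene's."""
    values = list(fcs.values())
    if len(values) < 2:
        return 0.0
    same = sum(1 for v in values if np.sign(v) == np.sign(values[0]))
    return same / len(values)

score = topology_weighted_score(gene_fc, out_degree, pathway_size)
coh = coherence(gene_fc)
print(round(score, 3), round(coh, 3))
```

Here the weighted sum is 2.0 × 1.3 + 1.5 × 1.2 + 1.2 × 1.0 = 5.6, which gives a score of about 0.933 after dividing by the pathway size, and two of the three genes share a direction, so coherence is about 0.667.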

class DrugRepurposingAgent:
   def __init__(self, drug_db):
       self.drug_db = drug_db


   def predict_drug_response(self, diff_genes, master_regulators):
       predictions = []
       for drug, targets in self.drug_db.items():
           score = 0
           affected_targets = []
           for target in targets:
               if target in diff_genes.index:
                   fc = diff_genes.loc[target, 'log2FC']
                   is_sig = diff_genes.loc[target, 'significant']
                   if is_sig:
                       # Inhibiting an upregulated target counts as benefit; downregulated targets add none
                       drug_benefit = fc if fc > 0 else 0
                       score += drug_benefit
                       affected_targets.append((target, fc))
                   if target in [mr[0] for mr in master_regulators[:5]]:
                       score += 2
           if score > 0:
               predictions.append({
                   'drug': drug,
                   'score': score,
                   'targets': affected_targets,
                   'mechanism': 'Inhibition of upregulated pathway'
               })
       return sorted(predictions, key=lambda x: x['score'], reverse=True)


class AIHypothesisEngine:
   def generate_comprehensive_report(self, omics_data, analysis_results):
       report = ["="*80, "ADVANCED MULTI-OMICS INTERPRETATION REPORT", "="*80, ""]
       trends = analysis_results['temporal']
       top_trends = sorted(trends.items(), key=lambda x: abs(x[1]['slope']), reverse=True)[:5]
       report.append("  TEMPORAL DYNAMICS ANALYSIS:")
       for gene, data in top_trends:
           direction = "↑ Increasing" if data['slope'] > 0 else "↓ Decreasing"
           report.append(f"  {gene}: {direction} (slope: {data['slope']:.3f})")
       report.append("\n🕸  MASTER REGULATORS (Top 5):")
       for gene, data in analysis_results['master_regs'][:5]:
           report.append(f"  • {gene}: controls {data['sig_downstream']} dysregulated genes (FC: {data['fc']:+.2f}, impact: {data['score']:.3f})")
       report.append("\n🧬 ENRICHED PATHWAYS:")
       for pathway, data in sorted(analysis_results['pathways'].items(), key=lambda x: x[1]['score'], reverse=True):
           report.append(f"\n  ► {pathway} (score: {data['score']:.3f}, coherence: {data['coherence']:.2f})")
           report.append(f"    Genes: {', '.join(data['genes'][:6])}")
           if data['metabolites']:
               report.append(f"    Metabolites: {', '.join(data['metabolites'][:4])}")
       report.append("\n🔗 CAUSAL RELATIONSHIPS (Top 10):")
       for link_type, source, target, fc1, fc2 in analysis_results['causal'][:10]:
           report.append(f"  {source} →[{link_type}]→ {target} (FC: {fc1:+.2f} → {fc2:+.2f})")
       report.append("\n💊 DRUG REPURPOSING PREDICTIONS:")
       for pred in analysis_results['drugs'][:5]:
           targets = ', '.join(f"{t[0]} ({t[1]:+.1f})" for t in pred['targets'])
           report.append(f"  • {pred['drug']} (score: {pred['score']:.2f})")
           report.append(f"    Targets: {targets}")
       report.append("\n🤖 AI-GENERATED BIOLOGICAL HYPOTHESES:\n")
       for i, hyp in enumerate(self._generate_advanced_hypotheses(analysis_results), 1):
           report.append(f"{i}. {hyp}")
       return "\n".join(report)

   def _generate_advanced_hypotheses(self, results):
       hypotheses = []
       pathways = results['pathways']
       if 'Glycolysis' in pathways and 'Oxidative_Phosphorylation' in pathways:
           glyc = pathways['Glycolysis']['score']
           oxphos = pathways['Oxidative_Phosphorylation']['score']
           if glyc > oxphos * 1.5:
               hypotheses.append("WARBURG EFFECT: Aerobic glycolysis with suppression of oxidative phosphorylation, driven by HIF1A.")
       # Condition reconstructed from the hypothesis text (assumption)
       if 'Cell_Cycle_G1S' in pathways and 'mTOR_Signaling' in pathways:
           hypotheses.append("Cell-cycle activation with mTOR signaling indicates anabolic reprogramming; dual CDK4/6 and mTOR inhibition may be effective.")
       if results['master_regs']:
           top_mr = results['master_regs'][0]
           hypotheses.append(f"UPSTREAM REGULATOR: {top_mr[0]} controls {top_mr[1]['sig_downstream']} dysregulated genes; targeting this node could propagate network-wide corrections.")
       trends = results['temporal']
       progressive = [g for g, d in trends.items() if abs(d['slope']) > 0.5]
       if len(progressive) > 5:
           hypotheses.append(f"PROGRESSIVE DYSREGULATION: {len(progressive)} genes show strong temporal shifts.")
       # Condition reconstructed from the hypothesis text (assumption)
       if 'HIF1_Signaling' in pathways:
           hypotheses.append("HYPOXIA RESPONSE: HIF1 signaling suggests a hypoxic microenvironment.")
       return hypotheses if hypotheses else ["Complex multi-factorial dysregulation detected."]

We introduce the drug repurposing and hypothesis generation agents. We rank candidate drugs based on the dysregulation of their targets and the network importance of the affected genes, then compile interpretive hypotheses that link pathway activity to possible interventions. The report generation engine summarizes these findings in an organized, readable format.
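As a rough illustration of the ranking idea, the sketch below scores inhibitors by how strongly their significant targets are upregulated and adds a fixed bonus when a target is a top regulator. All inputs are made up, and `rank_drugs` with its `regulator_bonus` parameter is a hypothetical helper, so treat it as one plausible scoring rule and not necessarily the agent's exact behavior.

```python
# Hypothetical inputs: log fold changes, significance flags, and top regulators.
target_fc = {'HK2': 2.4, 'VEGFA': 3.1, 'NDUFA1': -1.8, 'MTOR': 0.1}
significant = {'HK2', 'VEGFA', 'NDUFA1'}
top_regulators = {'HK2'}
drug_targets = {'2-DG': ['HK2'], 'Bevacizumab': ['VEGFA'],
                'Metformin': ['NDUFA1'], 'Rapamycin': ['MTOR']}

def rank_drugs(drugs, fcs, sig, regulators, regulator_bonus=2.0):
    """Score inhibitors by how strongly their significant targets are upregulated."""
    ranked = []
    for drug, targets in drugs.items():
        score = 0.0
        for target in targets:
            fc = fcs.get(target, 0.0)
            if target in sig and fc > 0:
                score += fc                  # credit for inhibiting an upregulated target
            if target in regulators:
                score += regulator_bonus     # extra weight for network hubs
        if score > 0:
            ranked.append((drug, round(score, 2)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

ranking = rank_drugs(drug_targets, target_fc, significant, top_regulators)
print(ranking)
```

In this toy input, Metformin drops out because its target is downregulated, and Rapamycin drops out because MTOR is not significantly changed.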

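def run_advanced_omics_interpretation():
   print("🧬 Initializing Advanced Multi-Agent Omics System...\n")
   omics = AdvancedOmicsGenerator.generate_coherent_omics()
   print("📊 Generated multi-omics dataset")
   stat_agent = StatisticalAgent()
   control_samples = [c for c in omics.transcriptomics.columns if 'Control' in c]
   disease_samples = [c for c in omics.transcriptomics.columns if 'Disease' in c]
   # Differential analysis for each omics layer
   diff_trans = stat_agent.differential_analysis(omics.transcriptomics, control_samples, disease_samples)
   diff_prot = stat_agent.differential_analysis(omics.proteomics, control_samples, disease_samples)
   diff_metab = stat_agent.differential_analysis(omics.metabolomics, control_samples, disease_samples)
   temporal = stat_agent.temporal_analysis(omics.transcriptomics, omics.metadata)
   # Network, pathway, and drug agents
   network_agent = NetworkAnalysisAgent(GENE_INTERACTIONS)
   master_regs = network_agent.find_master_regulators(diff_trans)
   causal_links = network_agent.causal_inference(diff_trans, diff_prot, diff_metab)
   pathway_agent = PathwayEnrichmentAgent(PATHWAY_DB, GENE_INTERACTIONS)
   enriched = pathway_agent.topology_weighted_enrichment(diff_trans, diff_metab, network_agent)
   drug_agent = DrugRepurposingAgent(DRUG_TARGETS)
   drug_predictions = drug_agent.predict_drug_response(diff_trans, master_regs)
   # Keys match those read by AIHypothesisEngine
   results = {'temporal': temporal, 'master_regs': master_regs, 'causal': causal_links,
              'pathways': enriched, 'drugs': drug_predictions}
   hypothesis_engine = AIHypothesisEngine()
   report = hypothesis_engine.generate_comprehensive_report(omics, results)
   print(report)
   return omics, results


if __name__ == "__main__":
   run_advanced_omics_interpretation()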

We orchestrate the entire workflow, running all agents in sequence and combining their results into a comprehensive report. We execute the pipeline end to end, from data generation to insight generation, ensuring that each component contributes to the overall interpretation. The final output provides an integrated view of the multi-omics findings.

In conclusion, this tutorial has shown how a structured workflow can connect different layers of omics data into a single interpretation system. By combining statistical reasoning, network topology, and biological context, we produced a comprehensive summary that highlights potential regulatory mechanisms and candidate therapeutic directions. The approach remains transparent, data-driven, and adaptable to a wide variety of datasets.



The post Building a Multi-Agent System for Integrated Transcriptomic, Proteomic, and Metabolomic Data Interpretation with Pathway Reasoning appeared first on MarkTechPost.
