Building an Autonomous Wet-Lab Protocol Planner and Validator Using Salesforce CodeGen for Agentic Experiment Design and Safety Optimization

In this tutorial, we build a wet-lab protocol planner and validator that acts as an intelligent agent for experiment design and execution. We implement the system in Python and integrate the Salesforce CodeGen-350M-mono model for natural-language reasoning. We organize the pipeline into modular components: a ProtocolParser that extracts structured data, such as steps, durations, and temperatures, from free-text protocols; an InventoryManager that checks reagent availability and expiry; a SchedulePlanner that builds timelines and identifies steps that can run in parallel; and a SafetyValidator that flags biosafety and chemical hazards. The LLM then generates optimization suggestions, closing the loop between parsing, planning, validation, and refinement.
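To make the parsing idea concrete, here is a minimal, stand-alone sketch of the kind of duration heuristic the ProtocolParser relies on. The function name parse_duration_minutes is hypothetical; the 720-minute value for "overnight" and the 30-minute default mirror the tutorial's code, while checking ranges such as "10-15 minutes" before single values (and taking their integer midpoint) is our own ordering choice.

```python
import re

# Assumption carried over from the tutorial: "overnight" is treated as 12 h.
OVERNIGHT_MIN = 720

def parse_duration_minutes(text: str, default: int = 30) -> int:
    """Estimate a duration in minutes from one free-text protocol line.

    Hypothetical helper. Ranges are checked before single values so that
    "10-15 minutes" yields the midpoint (12) instead of the upper bound.
    """
    t = text.lower()
    if "overnight" in t:
        return OVERNIGHT_MIN
    # Minute ranges, e.g. "10-15 minutes" -> 12
    m = re.search(r"(\d+)\s*-\s*(\d+)\s*(?:min|minute)s?\b", t)
    if m:
        return (int(m.group(1)) + int(m.group(2))) // 2
    # Hour ranges, e.g. "12-16 hours" -> 840
    m = re.search(r"(\d+)\s*-\s*(\d+)\s*(?:hour|hr|h)s?\b", t)
    if m:
        return (int(m.group(1)) + int(m.group(2))) * 60 // 2
    # Single hour value, e.g. "1 hour" -> 60
    m = re.search(r"(\d+)\s*(?:hour|hr|h)s?\b", t)
    if m:
        return int(m.group(1)) * 60
    # Single minute value, e.g. "30 minutes" -> 30
    m = re.search(r"(\d+)\s*(?:min|minute)s?\b", t)
    if m:
        return int(m.group(1))
    return default
```

For example, parse_duration_minutes("Incubate 2 hours at room temperature") returns 120, and a line with no time information, such as a wash step, falls back to the 30-minute default.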
import re, json, pandas as pd
from datetime import datetime, timedelta
from collections import defaultdict
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_NAME = "Salesforce/codegen-350M-mono"
print("Loading CodeGen model (30 seconds)...")
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)
print("✓ Model loaded!")
We start by importing the key libraries and loading the Salesforce CodeGen-350M-mono model locally for lightweight, API-free inference. We initialize the tokenizer and the model with float16 precision and automatic device mapping to reduce memory use and speed up inference on Colab GPUs.
class ProtocolParser:
    def read_protocol(self, text):
        steps = []
        lines = text.split('\n')
        for i, line in enumerate(lines, 1):
            # Numbered step headers look like "1. Coating (...)"
            step_match = re.search(r'^(\d+)\.\s+(.+)', line.strip())
            if step_match:
                num, name = step_match.groups()
                # i is 1-based, so lines[i:i+4] are the (up to) 4 lines after the header
                context = "\n".join(lines[i:min(i+4, len(lines))])
                duration = self._extract_duration(context)
                temp = self._extract_temp(context)
                safety = self._check_safety(context)
                steps.append({
                    'step': int(num), 'name': name, 'duration_min': duration,
                    'temp': temp, 'safety': safety, 'line': i, 'details': context[:200]
                })
        return steps

    def _extract_duration(self, text):
        text = text.lower()
        if 'overnight' in text: return 720
        match = re.search(r'(\d+)\s*(?:hour|hr|h)(?:s)?(?!\w)', text)
        if match: return int(match.group(1)) * 60
        # Check ranges such as "10-15 minutes" before single values;
        # otherwise the single-value pattern grabs the upper bound
        match = re.search(r'(\d+)-(\d+)\s*(?:min|minute)', text)
        if match: return (int(match.group(1)) + int(match.group(2))) // 2
        match = re.search(r'(\d+)\s*(?:min|minute)(?:s)?', text)
        if match: return int(match.group(1))
        return 30  # default when no duration is found

    def _extract_temp(self, text):
        text = text.lower()
        if '4°c' in text or '4 °c' in text or '4°' in text: return '4C'
        if '37°c' in text or '37 °c' in text: return '37C'
        if '-20°c' in text or '-80°c' in text: return 'FREEZER'
        if 'room temp' in text or 'rt' in text or 'ambient' in text: return 'RT'
        return 'RT'

    def _check_safety(self, text):
        # Keyword-based hazard flags
        flags = []
        text_lower = text.lower()
        if re.search(r'bsl-[23]|biosafety', text_lower): flags.append('BSL-2/3')
        if re.search(r'caution|corrosive|hazard|toxic', text_lower): flags.append('HAZARD')
        if 'sharp' in text_lower or 'needle' in text_lower: flags.append('SHARPS')
        if 'dark' in text_lower or 'light-sensitive' in text_lower: flags.append('LIGHT-SENSITIVE')
        if 'flammable' in text_lower: flags.append('FLAMMABLE')
        return flags
class InventoryManager:
    def __init__(self, csv_text):
        from io import StringIO
        self.df = pd.read_csv(StringIO(csv_text))
        self.df['expiry'] = pd.to_datetime(self.df['expiry'])

    def check_availability(self, reagent_list):
        issues = []
        for reagent in reagent_list:
            reagent_clean = reagent.lower().replace('_', ' ').replace('-', ' ')
            # Fuzzy match: any of the first two words of the reagent name
            matches = self.df[self.df['reagent'].str.lower().str.contains(
                '|'.join(reagent_clean.split()[:2]), na=False, regex=True
            )]
            if matches.empty:
                issues.append(f"❌ {reagent}: NOT IN INVENTORY")
            else:
                row = matches.iloc[0]
                if row['expiry'] < datetime.now():
                    issues.append(f"⚠️ {reagent}: EXPIRED on {row['expiry'].date()} (lot {row['lot']})")
                elif (row['expiry'] - datetime.now()).days < 30:
                    issues.append(f"⚠️ {reagent}: Expires soon ({row['expiry'].date()}, lot {row['lot']})")
                if row['quantity'] < 10:
                    issues.append(f"⚠️ {reagent}: LOW STOCK ({row['quantity']} {row['unit']} remaining)")
        return issues

    def extract_reagents(self, protocol_text):
        reagents = set()
        # (pattern, flags): the acronym pattern must stay case-sensitive,
        # otherwise re.IGNORECASE turns it into "any word of 2+ letters"
        patterns = [
            (r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\s+(?:antibody|buffer|solution)', re.IGNORECASE),
            (r'\b([A-Z]{2,}(?:-[A-Z0-9]+)?)\b', 0),
            (r'(?:add|use|prepare|dilute)\s+([a-z-]+\s*(?:antibody|buffer|substrate|solution))', re.IGNORECASE),
        ]
        for pattern, flags in patterns:
            matches = re.findall(pattern, protocol_text, flags)
            reagents.update(m.strip() for m in matches if len(m) > 2)
        return sorted(reagents)[:15]  # sorted for a deterministic pick-list
We define the ProtocolParser and InventoryManager classes to extract structured information from the protocol and to verify reagent inventory. We parse each protocol step for duration, temperature, and safety flags, while the inventory manager checks stock levels, expiry dates, and reagent availability using fuzzy name matching.
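The inventory check boils down to two rules per reagent: an expiry test against today's date and a low-stock threshold. The sketch below isolates those rules with the standard library only. The function names inventory_alerts and find_reagent are hypothetical; the 30-day and 10-unit thresholds mirror the tutorial's code, and passing today in explicitly (rather than calling datetime.now()) is our own choice so the result is reproducible. Note that find_reagent swaps in a simple shared-word lookup instead of the tutorial's regex str.contains match.

```python
from __future__ import annotations

from datetime import date

# Thresholds mirroring the tutorial's InventoryManager.
EXPIRY_WARNING_DAYS = 30
LOW_STOCK_THRESHOLD = 10

def inventory_alerts(name: str, quantity: float, unit: str,
                     expiry: date, today: date) -> list[str]:
    """Return human-readable alerts for one inventory row (hypothetical helper)."""
    alerts = []
    days_left = (expiry - today).days
    if days_left < 0:
        alerts.append(f"{name}: EXPIRED on {expiry.isoformat()}")
    elif days_left < EXPIRY_WARNING_DAYS:
        alerts.append(f"{name}: expires in {days_left} days ({expiry.isoformat()})")
    if quantity < LOW_STOCK_THRESHOLD:
        alerts.append(f"{name}: LOW STOCK ({quantity} {unit} remaining)")
    return alerts

def find_reagent(query: str, names: list[str]) -> str | None:
    """Naive fuzzy lookup: the first inventory name sharing a word with the query."""
    words = set(query.lower().replace("-", " ").replace("_", " ").split())
    for candidate in names:
        if words & set(candidate.lower().split()):
            return candidate
    return None
```

With today set to 2025-10-01, the sample row "detection antibody, 8 μg, expiry 2025-10-15" triggers both an expiry warning (14 days left) and a low-stock warning. Because find_reagent returns the first overlapping name, generic words such as "antibody" can match the wrong row; that is the trade-off of any word-level fuzzy match.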
class SchedulePlanner:
    def make_schedule(self, steps, start_time="09:00"):
        schedule = []
        current = datetime.strptime(f"2025-01-01 {start_time}", "%Y-%m-%d %H:%M")
        day = 1
        for step in steps:
            end = current + timedelta(minutes=step['duration_min'])
            if step['duration_min'] > 480:
                # Steps longer than 8 h (e.g. overnight) push work to 09:00 the next day
                day += 1
                current = datetime(2025, 1, 1, 9, 0) + timedelta(days=day - 1)
                end = current
            schedule.append({
                'step': step['step'], 'name': step['name'][:40],
                'start': current.strftime("%H:%M"), 'end': end.strftime("%H:%M"),
                'duration': step['duration_min'], 'temp': step['temp'],
                'day': day, 'can_parallelize': step['duration_min'] > 60,
                'safety': ', '.join(step['safety']) if step['safety'] else 'None'
            })
            if step['duration_min'] <= 480:
                current = end
        return schedule

    def optimize_parallelization(self, schedule):
        # Heuristic: a long step can overlap the next step if both share a temperature
        parallel_groups = []
        idle_time = 0
        for i, step in enumerate(schedule):
            if step['can_parallelize'] and i + 1 < len(schedule):
                next_step = schedule[i+1]
                if step['temp'] == next_step['temp']:
                    saved = min(step['duration'], next_step['duration'])
                    parallel_groups.append(
                        f"✨ Steps {step['step']} & {next_step['step']} can overlap → Save {saved} min"
                    )
                    idle_time += saved
        return parallel_groups, idle_time

class SafetyValidator:
    RULES = {
        'ph_range': (5.0, 11.0),
        'temp_limits': {'4C': (2, 8), '37C': (35, 39), 'RT': (20, 25)},
        'max_concurrent_instruments': 3,
    }

    def validate(self, steps):
        risks = []
        for step in steps:
            # Look for a pH value such as "pH 9.6" in the step details
            ph_match = re.search(r'ph\s*(\d+\.?\d*)', step['details'].lower())
            if ph_match:
                ph = float(ph_match.group(1))
                if not (self.RULES['ph_range'][0] <= ph <= self.RULES['ph_range'][1]):
                    risks.append(f"⚠️ Step {step['step']}: pH {ph} OUT OF SAFE RANGE")
            if 'BSL-2/3' in step['safety']:
                risks.append(f"🛡️ Step {step['step']}: BSL-2 cabinet REQUIRED")
            if 'HAZARD' in step['safety']:
                risks.append(f"🧤 Step {step['step']}: Full PPE + chemical hood REQUIRED")
            if 'SHARPS' in step['safety']:
                risks.append(f"💉 Step {step['step']}: Sharps container + needle safety")
            if 'LIGHT-SENSITIVE' in step['safety']:
                risks.append(f"🌑 Step {step['step']}: Work in dark/amber tubes")
        return risks
We use the SchedulePlanner and SafetyValidator to design an efficient experiment timeline and enforce lab safety standards. We generate day-by-day schedules, identify steps that can run in parallel, and flag potential risks such as unsafe pH levels, hazardous chemicals, or biosafety requirements.
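The scheduling arithmetic is easy to check by hand. The sketch below reproduces, with the standard library only, the two ideas behind SchedulePlanner: laying steps end to end from a start time, and the overlap heuristic in which a step longer than 60 minutes may run alongside its same-temperature successor, saving the shorter of the two durations. The function names are hypothetical, and the single-day layout ignores the tutorial's overnight roll-over.

```python
from datetime import datetime, timedelta

def sequential_schedule(durations, start="09:00"):
    """Lay steps end to end on one day; return (start, end) 'HH:MM' pairs."""
    current = datetime.strptime(start, "%H:%M")
    slots = []
    for minutes in durations:
        end = current + timedelta(minutes=minutes)
        slots.append((current.strftime("%H:%M"), end.strftime("%H:%M")))
        current = end
    return slots

def overlap_savings(steps, min_duration=60):
    """Minutes saved by the tutorial-style overlap heuristic.

    `steps` is a list of (duration_min, temperature) tuples. A step longer
    than `min_duration` can overlap the next step when both share the same
    temperature; the saving is the shorter of the two durations.
    """
    saved = 0
    for (dur_a, temp_a), (dur_b, temp_b) in zip(steps, steps[1:]):
        if dur_a > min_duration and temp_a == temp_b:
            saved += min(dur_a, dur_b)
    return saved
```

For the room-temperature part of the sample ELISA (60, 120, 60, 30, and 12 minutes), the steps run from 09:00 to 13:42, a total of 282 minutes; only the 120-minute sample incubation qualifies for overlap, so the heuristic saves 60 minutes and reports 222 minutes. This is an optimistic estimate: it assumes the overlapping steps do not compete for the same plate or hands.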
def llm_call(prompt, max_tokens=200):
    try:
        inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512).to(model.device)
        outputs = model.generate(
            **inputs, max_new_tokens=max_tokens, do_sample=True,
            temperature=0.7, top_p=0.9, pad_token_id=tokenizer.eos_token_id
        )
        # Decode only the newly generated tokens, not the echoed prompt
        new_tokens = outputs[0][inputs['input_ids'].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
    except Exception:
        # Fall back to a generic tip if generation fails (e.g. out of memory)
        return "Batch similar temperature steps together. Pre-warm instruments."

def agent_loop(protocol_text, inventory_csv, start_time="09:00"):
    print("\n🔬 AGENT STARTING PROTOCOL ANALYSIS...\n")
    # 1. Parse the free-text protocol into structured steps
    parser = ProtocolParser()
    steps = parser.read_protocol(protocol_text)
    print(f"📄 Parsed {len(steps)} protocol steps")
    # 2. Identify reagents and check them against the inventory
    inventory = InventoryManager(inventory_csv)
    reagents = inventory.extract_reagents(protocol_text)
    print(f"🧪 Identified {len(reagents)} reagents: {', '.join(reagents[:5])}...")
    inv_issues = inventory.check_availability(reagents)
    # 3. Validate safety constraints
    validator = SafetyValidator()
    safety_risks = validator.validate(steps)
    # 4. Build the schedule and estimate parallelization savings
    planner = SchedulePlanner()
    schedule = planner.make_schedule(steps, start_time)
    parallel_opts, time_saved = planner.optimize_parallelization(schedule)
    total_time = sum(s['duration'] for s in schedule)
    optimized_time = total_time - time_saved
    # 5. Ask the LLM for an optimization suggestion
    opt_prompt = f"Protocol has {len(steps)} steps, {total_time} min total. Key bottleneck optimization:"
    optimization = llm_call(opt_prompt, max_tokens=80)
    return {
        'steps': steps, 'schedule': schedule, 'inventory_issues': inv_issues,
        'safety_risks': safety_risks, 'parallelization': parallel_opts,
        'time_saved': time_saved, 'total_time': total_time,
        'optimized_time': optimized_time, 'ai_optimization': optimization,
        'reagents': reagents
    }
We create the agent loop, which combines parsing, planning, validation, and optimization into a single, unified flow. We use CodeGen to reason about the protocol's bottlenecks and propose improvements for efficiency and parallel execution.
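One design choice worth isolating is how the agent talks to the model. In the sketch below, the LLM is passed in as a plain callable that maps a prompt string to text, so the optimization step can be exercised without loading CodeGen. The names suggest_optimization and build_optimization_prompt are hypothetical; the prompt template and fallback tip are copied from the tutorial's agent_loop and llm_call.

```python
from __future__ import annotations

from typing import Callable

# Fallback tip used by the tutorial's llm_call when generation fails.
FALLBACK_TIP = "Batch similar temperature steps together. Pre-warm instruments."

def build_optimization_prompt(n_steps: int, total_min: int) -> str:
    # Same prompt template as the tutorial's agent_loop.
    return f"Protocol has {n_steps} steps, {total_min} min total. Key bottleneck optimization:"

def suggest_optimization(durations: list[int], llm: Callable[[str], str]) -> dict:
    """One agent step: summarize the plan, query the LLM, fall back on failure.

    `llm` is any callable mapping a prompt to generated text (hypothetical
    interface); injecting it keeps this step testable offline.
    """
    total = sum(durations)
    prompt = build_optimization_prompt(len(durations), total)
    try:
        tip = llm(prompt).strip() or FALLBACK_TIP  # empty output also falls back
    except Exception:
        tip = FALLBACK_TIP
    return {"total_time": total, "prompt": prompt, "ai_optimization": tip}
```

In the tutorial's setting, the callable would wrap llm_call, for example lambda p: llm_call(p, max_tokens=80); in a unit test, a stub that returns fixed text or raises an error is enough.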
def generate_checklist(results):
    md = "# 🔬 WET-LAB PROTOCOL CHECKLIST\n\n"
    md += f"**Total Steps:** {len(results['schedule'])}\n"
    md += f"**Estimated Time:** {results['total_time']} min ({results['total_time']//60}h {results['total_time']%60}m)\n"
    md += f"**Optimized Time:** {results['optimized_time']} min (save {results['time_saved']} min)\n\n"
    md += "## ⏱️ TIMELINE\n"
    current_day = 1
    for item in results['schedule']:
        # Start a new sub-heading whenever the schedule rolls over to another day
        if item['day'] > current_day:
            md += f"\n### Day {item['day']}\n"
            current_day = item['day']
        parallel = " 🔄" if item['can_parallelize'] else ""
        md += f"- [ ] **{item['start']}-{item['end']}** | Step {item['step']}: {item['name']} ({item['temp']}){parallel}\n"
    md += "\n## 🧪 REAGENT PICK-LIST\n"
    for reagent in results['reagents']:
        md += f"- [ ] {reagent}\n"
    md += "\n## ⚠️ SAFETY & INVENTORY ALERTS\n"
    all_issues = results['safety_risks'] + results['inventory_issues']
    if all_issues:
        for risk in all_issues:
            md += f"- {risk}\n"
    else:
        md += "- ✅ No critical issues detected\n"
    md += "\n## ✨ OPTIMIZATION TIPS\n"
    for tip in results['parallelization']:
        md += f"- {tip}\n"
    md += f"- 💡 AI Suggestion: {results['ai_optimization']}\n"
    return md

def generate_gantt_csv(schedule):
    df = pd.DataFrame(schedule)
    return df.to_csv(index=False)
We create output generators that convert the results into a human-readable Markdown checklist and a Gantt-compatible CSV file. Every run produces a clear summary of reagents, time savings, and safety or inventory alerts for the scheduled lab operations.
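If pandas is not available, the same outputs can be produced with the standard library. The sketch below uses csv.DictWriter to serialize schedule rows and builds one Markdown task-list line in the same shape as generate_checklist. The function names schedule_to_csv and checklist_line are hypothetical, and the sketch assumes every schedule row is a dict with the same keys.

```python
from __future__ import annotations

import csv
import io

def schedule_to_csv(schedule: list[dict]) -> str:
    """Serialize schedule rows (dicts sharing the same keys) to CSV text."""
    if not schedule:
        return ""
    buf = io.StringIO()
    # Column order follows the keys of the first row
    writer = csv.DictWriter(buf, fieldnames=list(schedule[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(schedule)
    return buf.getvalue()

def checklist_line(item: dict) -> str:
    """One Markdown task-list line, formatted like generate_checklist's timeline."""
    marker = " 🔄" if item.get("can_parallelize") else ""
    return (f"- [ ] **{item['start']}-{item['end']}** | Step {item['step']}: "
            f"{item['name']} ({item['temp']}){marker}")
```

The CSV text can be opened in a spreadsheet or Gantt tool directly, while the Markdown lines render as checkable boxes in most Markdown viewers.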
SAMPLE_PROTOCOL = """ELISA Protocol for Cytokine Detection
1. Coating (Day 1, 4°C overnight)
- Dilute capture antibody to 2 μg/mL in coating buffer (pH 9.6)
- Add 100 μL per well to 96-well plate
- Incubate at 4°C overnight (12-16 hours)
- BSL-2 cabinet required
2. Blocking (Day 2)
- Wash plate 3× with PBS-T (200 μL/well)
- Add 200 μL blocking buffer (1% BSA in PBS)
- Incubate 1 hour at room temperature
3. Sample Incubation
- Wash 3× with PBS-T
- Add 100 μL diluted samples/standards
- Incubate 2 hours at room temperature
4. Detection Antibody
- Wash 5× with PBS-T
- Add 100 μL biotinylated detection antibody (0.5 μg/mL)
- Incubate 1 hour at room temperature
5. Streptavidin-HRP
- Wash 5× with PBS-T
- Add 100 μL streptavidin-HRP (1:1000 dilution)
- Incubate 30 minutes at room temperature
- Work in dark
6. Development
- Wash 7× with PBS-T
- Add 100 μL TMB substrate
- Incubate 10-15 minutes (monitor color development)
- Add 50 μL stop solution (2M H2SO4) - CAUTION: corrosive
"""
SAMPLE_INVENTORY = """reagent,quantity,unit,expiry,lot
capture antibody,500,μg,2025-12-31,AB123
blocking buffer,500,mL,2025-11-30,BB456
PBS-T,1000,mL,2026-01-15,PT789
detection antibody,8,μg,2025-10-15,DA321
streptavidin HRP,10,mL,2025-12-01,SH654
TMB substrate,100,mL,2025-11-20,TM987
stop solution,250,mL,2026-03-01,SS147
BSA,100,g,2024-09-30,BS741"""
results = agent_loop(SAMPLE_PROTOCOL, SAMPLE_INVENTORY, start_time="09:00")
print("\n" + "="*70)
print(generate_checklist(results))
print("\n" + "="*70)
print("\n📊 GANTT CSV (first 400 chars):\n")
print(generate_gantt_csv(results['schedule'])[:400])
print("\n🎯 Time Savings:", f"{results['time_saved']} minutes via parallelization")
We run a complete test using the sample ELISA protocol and reagent inventory dataset. We print the agent's outputs, including the optimized schedule, the time saved through parallelization, and the AI-suggested improvements, showing how our planner works as an autonomous lab assistant.
Finally, we demonstrated how agentic AI principles can improve efficiency and safety in wet-lab operations. By converting free-form protocol text into structured, actionable steps, we automate validation, reagent management, and temporal optimization in a single pipeline. The CodeGen integration enables local, API-free reasoning about bottlenecks and safety conditions. We conclude with a fully functional planner that produces Gantt-compatible schedules, Markdown checklists, and AI-generated optimization tips, establishing a solid foundation for autonomous lab planning systems.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and developer, Asif is committed to harnessing the potential of artificial intelligence for social good. His most recent endeavor is the launch of an artificial intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has more than two million monthly views, illustrating its popularity among audiences.



