A Complete Workflow for Automated Prompt Optimization Using Gemini Flash, Few-Shot Example Selection, and Instruction Search

In this tutorial, we move from handwritten prompts to a more structured, programmable approach by treating prompts as tunable parameters rather than static text. Instead of guessing which instruction or example set works best, we build an optimization loop around Gemini 2.0 Flash that generates, evaluates, and automatically selects the most robust prompt configuration. As the implementation unfolds, we watch our prompt evolve step by step, showing how powerful prompt engineering becomes when we drive it through data-driven search instead of intuition.
import google.generativeai as genai
import json
import random
from typing import List, Dict, Tuple, Optional
from dataclasses import dataclass
import numpy as np
from collections import Counter
def setup_gemini(api_key: str = None):
    if api_key is None:
        api_key = input("Enter your Gemini API key: ").strip()
    genai.configure(api_key=api_key)
    model = genai.GenerativeModel('gemini-2.0-flash-exp')
    print("✓ Gemini 2.0 Flash configured")
    return model
@dataclass
class Example:
    text: str
    sentiment: str

    def to_dict(self):
        return {"text": self.text, "sentiment": self.sentiment}

@dataclass
class Prediction:
    sentiment: str
    reasoning: str = ""
    confidence: float = 1.0
We import all required libraries and define the setup_gemini helper to configure Gemini 2.0 Flash. We also create the Example and Prediction dataclasses to represent dataset inputs and model outputs in a clean, structured way.
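If we prefer not to type the key interactively, here is a minimal non-interactive setup sketch. It assumes the key is exported under an environment variable; GEMINI_API_KEY is our own hypothetical name, not something the script requires.

import os

# Hypothetical variable name; if unset, setup_gemini falls back to the prompt.
model = setup_gemini(api_key=os.environ.get("GEMINI_API_KEY"))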
def create_dataset() -> Tuple[List[Example], List[Example]]:
    train_data = [
        Example("This movie was absolutely fantastic! Best film of the year.", "positive"),
        Example("Terrible experience, waste of time and money.", "negative"),
        Example("The product works as expected, nothing special.", "neutral"),
        Example("I'm blown away by the quality and attention to detail!", "positive"),
        Example("Disappointing and overpriced. Would not recommend.", "negative"),
        Example("It's okay, does the job but could be better.", "neutral"),
        Example("Incredible customer service and amazing results!", "positive"),
        Example("Complete garbage, broke after one use.", "negative"),
        Example("Average product, met my basic expectations.", "neutral"),
        Example("Revolutionary! This changed everything for me.", "positive"),
        Example("Frustrating bugs and poor design choices.", "negative"),
        Example("Decent quality for the price point.", "neutral"),
        Example("Exceeded all my expectations, truly remarkable!", "positive"),
        Example("Worst purchase I've ever made, avoid at all costs.", "negative"),
        Example("It's fine, nothing to complain about really.", "neutral"),
        Example("Absolutely stellar performance, 5 stars!", "positive"),
        Example("Broken and unusable, total disaster.", "negative"),
        Example("Meets requirements, standard quality.", "neutral"),
    ]
    val_data = [
        Example("Absolutely love it, couldn't be happier!", "positive"),
        Example("Broken on arrival, very upset.", "negative"),
        Example("Works fine, no major issues.", "neutral"),
        Example("Outstanding performance and great value!", "positive"),
        Example("Regret buying this, total letdown.", "negative"),
        Example("Adequate for basic use.", "neutral"),
    ]
    return train_data, val_data
class PromptTemplate:
    def __init__(self, instruction: str = "", examples: List[Example] = None):
        self.instruction = instruction
        self.examples = examples or []

    def format(self, text: str) -> str:
        prompt_parts = []
        if self.instruction:
            prompt_parts.append(self.instruction)
        if self.examples:
            prompt_parts.append("\nExamples:")
            for ex in self.examples:
                prompt_parts.append(f"\nText: {ex.text}")
                prompt_parts.append(f"Sentiment: {ex.sentiment}")
        prompt_parts.append(f"\nText: {text}")
        prompt_parts.append("Sentiment:")
        return "\n".join(prompt_parts)

    def clone(self):
        return PromptTemplate(self.instruction, self.examples.copy())
We create a small but diverse sentiment dataset for training and validation with the create_dataset function. We then define PromptTemplate, which combines an instruction, a set of few-shot examples, and the input text into a single prompt string. Because the template is editable and cloneable, we can swap instructions and examples freely during optimization.
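To make the template concrete, here is a small sketch (the example text is our own) of what format produces:

demo = PromptTemplate(
    instruction="Classify the sentiment: positive, negative, or neutral.",
    examples=[Example("Loved it!", "positive")]
)
print(demo.format("Not great."))
# Prints:
# Classify the sentiment: positive, negative, or neutral.
#
# Examples:
#
# Text: Loved it!
# Sentiment: positive
#
# Text: Not great.
# Sentiment: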
class SentimentModel:
    def __init__(self, model, prompt_template: PromptTemplate):
        self.model = model
        self.prompt_template = prompt_template

    def predict(self, text: str) -> Prediction:
        prompt = self.prompt_template.format(text)
        try:
            response = self.model.generate_content(prompt)
            result = response.text.strip().lower()
            for sentiment in ['positive', 'negative', 'neutral']:
                if sentiment in result:
                    return Prediction(sentiment=sentiment, reasoning=result)
            return Prediction(sentiment="neutral", reasoning=result)
        except Exception as e:
            return Prediction(sentiment="neutral", reasoning=str(e))

    def evaluate(self, dataset: List[Example]) -> float:
        correct = 0
        for example in dataset:
            pred = self.predict(example.text)
            if pred.sentiment == example.sentiment:
                correct += 1
        return (correct / len(dataset)) * 100
We wrap Gemini in the SentimentModel class so we can use it as a sentiment classifier. We format the input with the template, call generate_content, and post-process the response text to extract one of the three sentiment labels, falling back to neutral when no label is found. We also add an evaluate method so we can measure accuracy on any dataset with a single call.
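As a quick usage sketch (assuming `model` comes from setup_gemini above; the sample sentence is our own):

clf = SentimentModel(model, PromptTemplate(instruction="Classify sentiment as positive, negative, or neutral."))
pred = clf.predict("I really enjoyed this!")
print(pred.sentiment)  # expected: positive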
class PromptOptimizer:
    def __init__(self, model):
        self.model = model
        self.instruction_candidates = [
            "Analyze the sentiment of the following text. Classify as positive, negative, or neutral.",
            "Classify the sentiment: positive, negative, or neutral.",
            "Determine if this text expresses positive, negative, or neutral sentiment.",
            "What is the emotional tone? Answer: positive, negative, or neutral.",
            "Sentiment classification (positive/negative/neutral):",
            "Evaluate sentiment and respond with exactly one word: positive, negative, or neutral.",
        ]

    def select_best_examples(self, train_data: List[Example], val_data: List[Example], n_examples: int = 3) -> List[Example]:
        best_examples = None
        best_score = 0
        for _ in range(10):
            examples_by_sentiment = {
                'positive': [e for e in train_data if e.sentiment == 'positive'],
                'negative': [e for e in train_data if e.sentiment == 'negative'],
                'neutral': [e for e in train_data if e.sentiment == 'neutral']
            }
            selected = []
            for sentiment in ['positive', 'negative', 'neutral']:
                if examples_by_sentiment[sentiment]:
                    selected.append(random.choice(examples_by_sentiment[sentiment]))
            remaining = [e for e in train_data if e not in selected]
            while len(selected) < n_examples and remaining:
                selected.append(random.choice(remaining))
                remaining.remove(selected[-1])
            template = PromptTemplate(instruction=self.instruction_candidates[0], examples=selected)
            test_model = SentimentModel(self.model, template)
            score = test_model.evaluate(val_data[:3])
            if score > best_score:
                best_score = score
                best_examples = selected
        return best_examples

    def optimize_instruction(self, examples: List[Example], val_data: List[Example]) -> str:
        best_instruction = self.instruction_candidates[0]
        best_score = 0
        for instruction in self.instruction_candidates:
            template = PromptTemplate(instruction=instruction, examples=examples)
            test_model = SentimentModel(self.model, template)
            score = test_model.evaluate(val_data)
            if score > best_score:
                best_score = score
                best_instruction = instruction
        return best_instruction
We introduce the PromptOptimizer class and define a pool of candidate instructions to test. We use select_best_examples to search for a small, sentiment-balanced set of few-shot examples and optimize_instruction to score each instruction variant on the validation data. In effect, we turn prompt design into a lightweight search problem over examples and instructions; a rough cost estimate for that search follows below.
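Because every candidate is scored with live generate_content calls, it is worth budgeting the search before running it. A back-of-the-envelope sketch, read directly from the loop sizes in the code above:

# Approximate generate_content calls for one optimization pass:
selection_calls = 10 * 3     # 10 example-selection trials x first 3 validation texts
instruction_calls = 6 * 6    # 6 candidate instructions x all 6 validation texts
print(selection_calls + instruction_calls)  # 66 calls, before the final evaluation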
    def compile(self, train_data: List[Example], val_data: List[Example], n_examples: int = 3) -> PromptTemplate:
        best_examples = self.select_best_examples(train_data, val_data, n_examples)
        best_instruction = self.optimize_instruction(best_examples, val_data)
        optimized_template = PromptTemplate(instruction=best_instruction, examples=best_examples)
        return optimized_template
def main():
    print("="*70)
    print("Prompt Optimization Tutorial")
    print("Stop Writing Prompts, Start Programming Them!")
    print("="*70)
    model = setup_gemini()
    train_data, val_data = create_dataset()
    print(f"✓ {len(train_data)} training examples, {len(val_data)} validation examples")
    baseline_template = PromptTemplate(
        instruction="Classify sentiment as positive, negative, or neutral.",
        examples=[]
    )
    baseline_model = SentimentModel(model, baseline_template)
    baseline_score = baseline_model.evaluate(val_data)
    manual_examples = train_data[:3]
    manual_template = PromptTemplate(
        instruction="Classify sentiment as positive, negative, or neutral.",
        examples=manual_examples
    )
    manual_model = SentimentModel(model, manual_template)
    manual_score = manual_model.evaluate(val_data)
    optimizer = PromptOptimizer(model)
    optimized_template = optimizer.compile(train_data, val_data, n_examples=4)
We add a compile method that combines the best examples and the best instruction into the final optimized PromptTemplate. In main, we configure Gemini, build the dataset, and evaluate both a zero-shot baseline and a manually chosen few-shot setup. We then call the optimizer to compile our prompt, tuned end to end for sentiment analysis.
    optimized_model = SentimentModel(model, optimized_template)
    optimized_score = optimized_model.evaluate(val_data)
    print(f"Baseline (zero-shot): {baseline_score:.1f}%")
    print(f"Manual few-shot: {manual_score:.1f}%")
    print(f"Optimized (compiled): {optimized_score:.1f}%")
    print(f"\nInstruction: {optimized_template.instruction}")
    print(f"\nSelected Examples ({len(optimized_template.examples)}):")
    for i, ex in enumerate(optimized_template.examples, 1):
        print(f"\n{i}. Text: {ex.text}")
        print(f"   Sentiment: {ex.sentiment}")
    test_cases = [
        "This is absolutely amazing, I love it!",
        "Completely broken and unusable.",
        "It works as advertised, no complaints."
    ]
    for test_text in test_cases:
        print(f"\nInput: {test_text}")
        pred = optimized_model.predict(test_text)
        print(f"Predicted: {pred.sentiment}")
    print("✓ Tutorial Complete!")

if __name__ == "__main__":
    main()
We evaluate the optimized model and compare its accuracy against the zero-shot baseline and the manual few-shot setup. We print the winning instruction and the selected examples so we can inspect what the optimizer found, then run a few live test sentences to see the predictions in action. This closes the loop and reinforces the idea that prompts can be tuned programmatically rather than written by hand.
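The json import and the Example.to_dict method defined earlier go unused in main; one natural use is persisting the compiled template so the search does not have to be re-run. A minimal sketch one could append at the end of main (the file name is our own choice):

    # Persist the compiled prompt; reload it later instead of re-searching.
    with open("optimized_prompt.json", "w") as f:
        json.dump({
            "instruction": optimized_template.instruction,
            "examples": [ex.to_dict() for ex in optimized_template.examples],
        }, f, indent=2)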
In conclusion, this prompt-optimization approach gives us an iterative, evidence-driven workflow for designing high-performing prompts. We started with a fragile zero-shot baseline, then systematically tested instructions, selected diverse examples, and compiled an optimized template that surpassed the manual setups. The process shows that we no longer depend on trial and error; instead, we run a controlled optimization cycle. We can extend this pipeline to new tasks, richer datasets, and more sophisticated scoring methods, letting us build prompts with accuracy, confidence, and scalability.



