How to Build an Advanced AI Agent with Summarized Short-Term and Vector-Based Long-Term Memory

In this lesson, we build an improved AI agent that can not only talk but also remember. We start from scratch and show how to combine a lightweight LLM, a FAISS vector store, and a summarization mechanism to create both short-term and long-term memory. By distilling and storing durable facts, and compacting the conversation as it grows, we keep the prompt small while ensuring the interaction stays smooth, efficient, and context-aware.
!pip -q install transformers accelerate bitsandbytes sentence-transformers faiss-cpu
import os, json, time, uuid, math, re
from datetime import datetime
import torch, faiss
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig
from sentence_transformers import SentenceTransformer
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
We begin by installing the required libraries and importing all the modules our agent needs. We also detect whether a GPU is available and set the device accordingly, so the rest of the code runs properly on either GPU or CPU.
def load_llm(model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
    try:
        if DEVICE == "cuda":
            bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_quant_type="nf4")
            tok = AutoTokenizer.from_pretrained(model_name, use_fast=True)
            mdl = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb, device_map="auto")
        else:
            tok = AutoTokenizer.from_pretrained(model_name, use_fast=True)
            mdl = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32, low_cpu_mem_usage=True)
        return pipeline("text-generation", model=mdl, tokenizer=tok, device=0 if DEVICE == "cuda" else -1, do_sample=True)
    except Exception as e:
        raise RuntimeError(f"Failed to load LLM: {e}")
We define the function that loads our language model. When a GPU is available, we load the model with 4-bit NF4 quantization so it fits in memory and runs efficiently; otherwise, we fall back to the CPU with memory-friendly settings. This ensures we can generate text reliably regardless of the hardware we are running on.
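As an optional sanity check, we can call the returned pipeline directly. This is a minimal sketch; the prompt text and token budget below are purely illustrative.
# Optional sanity check: generate a short completion with the loaded pipeline
llm = load_llm()
out = llm("Q: Name one benefit of giving an agent long-term memory.\nA:", max_new_tokens=32, temperature=0.5)
print(out[0]["generated_text"])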
class VectorMemory:
    def __init__(self, path="/content/agent_memory.json", dim=384):
        self.path = path; self.dim = dim; self.items = []
        self.embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2", device=DEVICE)
        # Inner-product index over normalized embeddings = cosine similarity
        self.index = faiss.IndexFlatIP(dim)
        if os.path.exists(path):
            data = json.load(open(path))
            self.items = data.get("items", [])
            if self.items:
                X = torch.tensor([x["emb"] for x in self.items], dtype=torch.float32).numpy()
                self.index.add(X)
    def _emb(self, text):
        v = self.embedder.encode([text], normalize_embeddings=True)[0]
        return v.tolist()
    def add(self, text, meta=None):
        # Embed the memory, index it, and persist the record to disk
        e = self._emb(text); self.index.add(torch.tensor([e]).numpy())
        rec = {"id": str(uuid.uuid4()), "text": text, "meta": meta or {}, "emb": e}
        self.items.append(rec); self._save(); return rec["id"]
    def search(self, query, k=5, thresh=0.25):
        if len(self.items) == 0: return []
        q = self.embedder.encode([query], normalize_embeddings=True)
        D, I = self.index.search(q, min(k, len(self.items)))
        out = []
        for d, i in zip(D[0], I[0]):
            if i == -1: continue
            if d >= thresh: out.append((d, self.items[i]))
        return out
    def _save(self):
        slim = [{k: v for k, v in it.items()} for it in self.items]
        json.dump({"items": slim}, open(self.path, "w"), indent=2)
We create a VectorMemory class that serves as the agent's long-term memory. Each stored fact is embedded and indexed in FAISS, which lets us search for and recall relevant information later. Every memory is also persisted to disk, so the agent keeps its knowledge across sessions.
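Here is a minimal usage sketch of the class on its own; the file path and example texts are made up for illustration, and a hit is returned only if the cosine similarity clears the 0.25 threshold.
# Illustrative: store one fact, then retrieve it by semantic similarity
mem = VectorMemory(path="/content/demo_memory.json")
mem.add("User prefers short, code-first answers.", {"source": "demo"})
for score, item in mem.search("How should answers be formatted?", k=3):
    print(f"{score:.2f}  {item['text']}")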
def now_iso(): return datetime.now().isoformat(timespec="seconds")
def clamp(txt, n=1600): return txt if len(txt) <= n else txt[:n] + " …"
def strip_json(s):
    m = re.search(r"{.*}", s, flags=re.S)
    return m.group(0) if m else None
SYS_GUIDE = (
    "You are a helpful, concise assistant with memory. Use provided MEMORY when relevant. "
    "Prefer facts from MEMORY over guesses. Answer directly; keep code blocks tight. If unsure, say so."
)
SUMMARIZE_PROMPT = lambda convo: f"Summarize the conversation below in 4-6 bullet points focusing on stable facts and tasks:\n\n{convo}\n\nSummary:"
DISTILL_PROMPT = lambda user: (
    f"""Decide if the USER text contains durable info worth long-term memory (preferences, identity, projects, deadlines, facts).
Return compact JSON only: {{"save": true/false, "memory": "one-sentence memory"}}.
USER: {user}""")
class MemoryAgent:
    def __init__(self):
        self.llm = load_llm()
        self.mem = VectorMemory()
        self.turns = []      # short-term conversation buffer
        self.summary = ""    # rolling summary of older turns
        self.max_turns = 10
    def _gen(self, prompt, max_new_tokens=256, temp=0.7):
        out = self.llm(prompt, max_new_tokens=max_new_tokens, temperature=temp, top_p=0.95, num_return_sequences=1, pad_token_id=self.llm.tokenizer.eos_token_id)[0]["generated_text"]
        return out[len(prompt):].strip() if out.startswith(prompt) else out.strip()
    def _chat_prompt(self, user, memory_context):
        convo = "\n".join([f"{r.upper()}: {t}" for r, t in self.turns[-8:]])
        sys = f"System: {SYS_GUIDE}\nTime: {now_iso()}\n\n"
        mem = f"MEMORY (relevant excerpts):\n{memory_context}\n\n" if memory_context else ""
        summ = f"CONTEXT SUMMARY:\n{self.summary}\n\n" if self.summary else ""
        return sys + mem + summ + convo + f"\nUSER: {user}\nASSISTANT:"
    def _distill_and_store(self, user):
        # Ask the LLM whether the message contains a durable fact worth saving
        try:
            raw = self._gen(DISTILL_PROMPT(user), max_new_tokens=120, temp=0.1)
            js = strip_json(raw)
            if js:
                obj = json.loads(js)
                if obj.get("save") and obj.get("memory"):
                    self.mem.add(obj["memory"], {"ts": now_iso(), "source": "distilled"})
                    return True, obj["memory"]
        except Exception:
            pass
        # Heuristic fallback: save messages that look like stable personal facts
        if re.search(r"\b(my name is|call me|I like|deadline|due|email|phone|working on|prefer|timezone|birthday|goal|exam)\b", user, flags=re.I):
            m = f"User said: {clamp(user, 120)}"
            self.mem.add(m, {"ts": now_iso(), "source": "heuristic"})
            return True, m
        return False, ""
    def _maybe_summarize(self):
        # Compress older turns into a rolling summary once the buffer grows too long
        if len(self.turns) > self.max_turns:
            convo = "\n".join([f"{r}: {t}" for r, t in self.turns])
            s = self._gen(SUMMARIZE_PROMPT(clamp(convo, 3500)), max_new_tokens=180, temp=0.2)
            self.summary = s; self.turns = self.turns[-4:]
    def recall(self, query, k=5):
        hits = self.mem.search(query, k=k)
        return "\n".join([f"- ({d:.2f}) {h['text']} [meta={h['meta']}]" for d, h in hits])
    def ask(self, user):
        self.turns.append(("user", user))
        saved, memline = self._distill_and_store(user)
        mem_ctx = self.recall(user, k=6)
        prompt = self._chat_prompt(user, mem_ctx)
        reply = self._gen(prompt)
        self.turns.append(("assistant", reply))
        self._maybe_summarize()
        status = f"💾 memory_saved: {saved}; " + (f"note: {memline}" if saved else "note: -")
        print(f"\nUSER: {user}\nASSISTANT: {reply}\n{status}")
        return reply
We bring everything together in the MemoryAgent class. The agent distills durable facts into long-term memory, recalls relevant memories before answering, and periodically summarizes older turns to keep the short-term context compact. With this design, every ask() call produces a reply grounded in what the user has already told us.
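To make the flow concrete, here is a purely illustrative print of what an assembled prompt looks like before generation; it mirrors what _chat_prompt builds, and the memory, summary, and turn values are made up for this example.
# Illustrative only: the shape of a fully assembled prompt
example_memory = "- (0.78) User said: my exam is in 2027 [meta={'source': 'heuristic'}]"
example_summary = "- User is preparing for an exam in 2027."
example_turns = "USER: What's my exam year?\n"
print(f"System: {SYS_GUIDE}\nTime: {now_iso()}\n\n"
      f"MEMORY (relevant excerpts):\n{example_memory}\n\n"
      f"CONTEXT SUMMARY:\n{example_summary}\n\n"
      f"{example_turns}ASSISTANT:")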
agent=MemoryAgent()
print("✅ Agent ready. Try these:n")
agent.ask("Hi! My name is Nicolaus, I prefer being called Nik. I'm preparing for UPSC in 2027.")
agent.ask("Also, I work at Visa in analytics and love concise answers.")
agent.ask("What's my exam year and how should you address me next time?")
agent.ask("Reminder: I like agentic RAG tutorials with single-file Colab code.")
agent.ask("Given my prefs, suggest a study focus for this week in one paragraph.")
We initialize the agent and exercise it with a few messages that seed long-term memory and then test recall. We verify that it remembers the preferred name and the exam year, keeps answers concise per the stated preference, and uses the earlier preferences (agentic RAG, single-file Colab code) to shape the suggested study focus.
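Because every memory is written to disk, we can also confirm that the distilled facts survive a restart. This is a small sketch under the assumption that the default memory file from the session above is still present.
# Illustrative: reload the persisted memory from disk and query it again
fresh = VectorMemory()  # re-reads /content/agent_memory.json written during the demo
for score, item in fresh.search("Which exam is the user preparing for?", k=3):
    print(f"{score:.2f}  {item['text']}  (source={item['meta'].get('source')})")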
In conclusion, we see how much more capable an AI agent becomes when we give it memory. We now have an agent that distills important information, recalls it when relevant, and summarizes older turns to keep the working context manageable. This approach keeps conversations coherent and efficient, and makes the agent feel more personal and more useful with each exchange. From this foundation, we are ready to extend the memory store, experiment with richer memory schemas, and test more advanced prompt- and memory-enabled agents.



