An Approach to Designing an Advanced Inference Tree Reasoning Agent with Beam Search, Heuristic Scoring, and Depth-Limited Pruning

In this tutorial, we build an advanced multi-branch Tree-of-Thoughts (ToT) reasoning agent from scratch. Instead of relying on a single sequential chain of reasoning, we design a system that generates multiple branches of reasoning, scores each branch with a heuristic evaluation function, prunes weak candidates, and continues to expand only the strongest paths. We combine an instruction-tuned transformer model with a custom tree structure, using beam-search-style selection and depth-limited expansion. By grounding the system in the Game of 24, we create a clear, objective benchmark where we can watch branch expansion, scoring, pruning, and goal detection in action.
!pip -q install -U transformers accelerate sentencepiece
import re
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Dict, Any
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
MODEL_NAME = "google/flan-t5-base"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
print("Device:", device)
print("Model loaded:", MODEL_NAME)
@dataclass
class Node:
    depth: int
    numbers: List[float]
    exprs: List[str]
    thought: str = ""
    score: float = -1e9
    is_goal: bool = False
    parent: Optional["Node"] = None
    meta: Dict[str, Any] = field(default_factory=dict)

def pretty_state(nums: List[float], exprs: List[str]) -> str:
    pairs = [f"{e}={n:g}" for e, n in zip(exprs, nums)]
    return " | ".join(pairs)
We install the necessary libraries and load the FLAN-T5 model with its Seq2Seq architecture. We define the main Node data structure that represents each thought state in the Tree-of-Thoughts search. We also handle device configuration and add helper utilities that let us print states clearly and check the runtime setup.
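As a quick standalone illustration of the state representation, the snippet below re-declares a minimal version of the Node dataclass and pretty_state helper from above (trimmed to the fields needed here, so it runs on its own) and formats the root state for the puzzle [4, 1, 8, 7]:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    depth: int
    numbers: List[float]   # current numeric values still in play
    exprs: List[str]       # the expression that produced each value
    parent: Optional["Node"] = None

def pretty_state(nums: List[float], exprs: List[str]) -> str:
    # Render each (expression, value) pair as "expr=value", joined with pipes.
    return " | ".join(f"{e}={n:g}" for e, n in zip(exprs, nums))

root = Node(depth=0, numbers=[4.0, 1.0, 8.0, 7.0], exprs=["4", "1", "8", "7"])
print(pretty_state(root.numbers, root.exprs))  # 4=4 | 1=1 | 8=8 | 7=7
```

Keeping expressions and values in parallel lists means every node carries a human-readable trace of how it was derived.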
OPS = ["+", "-", "*", "/"]

def safe_apply(a: float, b: float, op: str) -> Optional[float]:
    if op == "+": return a + b
    if op == "-": return a - b
    if op == "*": return a * b
    if op == "/":
        if abs(b) < 1e-12:
            return None
        return a / b
    return None

def combine_expr(ea: str, eb: str, op: str) -> str:
    return f"({ea} {op} {eb})"

def is_24(x: float, tol: float = 1e-6) -> bool:
    return abs(x - 24.0) <= tol

def one_step_closeness(nums: List[float]) -> float:
    if len(nums) == 1:
        return abs(nums[0] - 24.0)
    best = float("inf")
    n = len(nums)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            a, b = nums[i], nums[j]
            for op in OPS:
                r = safe_apply(a, b, op)
                if r is None:
                    continue
                best = min(best, abs(r - 24.0))
    return best if best != float("inf") else 1e9

def heuristic_score(node: Node) -> float:
    nums = node.numbers
    base = -one_step_closeness(nums)
    depth_penalty = 0.05 * node.depth
    exact_bonus = 2.0 if any(is_24(x) for x in nums) else 0.0
    return base - depth_penalty + exact_bonus
We implement the mathematical core of the Game of 24 domain. We define safe operator application, expression construction, a goal check, and a scoring function that estimates how close a state is to the target value of 24. We design the heuristic to guide the search intelligently while penalizing deeper branches.
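To make the heuristic concrete, here is a self-contained re-implementation of the one-step closeness measure (condensed from the functions above) evaluated on two tiny states. A state that can reach 24 in one operation scores a distance of 0; otherwise the distance is the best achievable gap:

```python
from typing import List, Optional

def safe_apply(a: float, b: float, op: str) -> Optional[float]:
    if op == "+": return a + b
    if op == "-": return a - b
    if op == "*": return a * b
    if op == "/": return a / b if abs(b) > 1e-12 else None
    return None

def one_step_closeness(nums: List[float]) -> float:
    # Best-case distance to 24 after a single combining operation.
    if len(nums) == 1:
        return abs(nums[0] - 24.0)
    best = float("inf")
    for i in range(len(nums)):
        for j in range(len(nums)):
            if i == j:
                continue
            for op in "+-*/":
                r = safe_apply(nums[i], nums[j], op)
                if r is not None:
                    best = min(best, abs(r - 24.0))
    return best

print(one_step_closeness([3.0, 8.0]))  # 0.0  (3 * 8 = 24 exactly)
print(one_step_closeness([5.0, 5.0]))  # 1.0  (best is 5 * 5 = 25)
```

Negating this distance and subtracting a small depth penalty gives the heuristic score, so shallow states that are one step from 24 rank highest in the beam.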
PROPOSER_PROMPT = """You are helping solve the 24 game.
We have current items, each item has an expression and its numeric value.
Pick TWO items and combine them with one operation from + - * / to create a new item.
Return between {k} and {k2} suggestions as lines using EXACT format:
i,j,op
Where i and j are 0-based indices into the list. Use i != j. Prefer moves that help reach 24.
Current items:
{items}
"""
def llm_generate_suggestions(items: str, k_min: int, k_max: int, max_new_tokens: int = 160) -> str:
    prompt = PROPOSER_PROMPT.format(k=k_min, k2=k_max, items=items)
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to(device)
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.8,
            top_p=0.92,
            num_return_sequences=1,
        )
    txt = tokenizer.decode(out[0], skip_special_tokens=True)
    return txt.strip()
def parse_moves(text: str, n_items: int) -> List[Tuple[int, int, str]]:
    moves = []
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^\s*(\d+)\s*,\s*(\d+)\s*,\s*([+\-*/])\s*$", line)
        if not m:
            continue
        i, j, op = int(m.group(1)), int(m.group(2)), m.group(3)
        if 0 <= i < n_items and 0 <= j < n_items and i != j:
            moves.append((i, j, op))
    seen = set()
    uniq = []
    for mv in moves:
        if mv not in seen:
            uniq.append(mv)
            seen.add(mv)
    return uniq
def fallback_moves(nums: List[float], limit: int = 24) -> List[Tuple[int, int, str]]:
    scored = []
    n = len(nums)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for op in OPS:
                r = safe_apply(nums[i], nums[j], op)
                if r is None:
                    continue
                scored.append((abs(r - 24.0), i, j, op))
    scored.sort(key=lambda x: x[0])
    out = [(i, j, op) for _, i, j, op in scored[:limit]]
    seen, uniq = set(), []
    for mv in out:
        if mv not in seen:
            uniq.append(mv)
            seen.add(mv)
    return uniq
We develop an LLM proposer that generates multiple branches of thought. We format the prompt carefully so the model returns structured suggestions, and we parse its output into actionable moves. We also add a deterministic fallback strategy to ensure that the search remains robust even when the model output is noisy.
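The parser is the fragile boundary between free-form model text and the search loop, so it is worth seeing what it accepts and rejects. This standalone copy of parse_moves (same logic as above) is fed a sample model response mixing valid lines, a duplicate, junk text, and an out-of-range index:

```python
import re
from typing import List, Tuple

def parse_moves(text: str, n_items: int) -> List[Tuple[int, int, str]]:
    moves, seen, uniq = [], set(), []
    for line in text.splitlines():
        # Accept only lines of the exact form "i,j,op" (whitespace tolerated).
        m = re.match(r"^\s*(\d+)\s*,\s*(\d+)\s*,\s*([+\-*/])\s*$", line.strip())
        if not m:
            continue
        i, j, op = int(m.group(1)), int(m.group(2)), m.group(3)
        if 0 <= i < n_items and 0 <= j < n_items and i != j:
            moves.append((i, j, op))
    for mv in moves:          # de-duplicate while preserving order
        if mv not in seen:
            uniq.append(mv)
            seen.add(mv)
    return uniq

raw = "0, 1, *\n2,3,+\n0,1,*\nlet me think...\n5,0,-"
print(parse_moves(raw, 4))  # [(0, 1, '*'), (2, 3, '+')]
```

The junk line fails the regex, the repeated move is dropped by the seen-set, and index 5 is rejected by the bounds check, so only clean moves reach the expansion step.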
def apply_move(node: Node, i: int, j: int, op: str) -> Optional[Node]:
    nums = node.numbers[:]
    exprs = node.exprs[:]
    a, b = nums[i], nums[j]
    r = safe_apply(a, b, op)
    if r is None:
        return None
    ea, eb = exprs[i], exprs[j]
    new_expr = combine_expr(ea, eb, op)
    for idx in sorted([i, j], reverse=True):
        nums.pop(idx)
        exprs.pop(idx)
    nums.append(r)
    exprs.append(new_expr)
    child = Node(
        depth=node.depth + 1,
        numbers=nums,
        exprs=exprs,
        parent=node,
        thought=f"Combine item {i} and {j} with '{op}' -> {new_expr} = {r:g}",
    )
    child.is_goal = (len(nums) == 1 and is_24(nums[0]))
    child.score = heuristic_score(child)
    return child
def expand(node: Node, branch_factor: int, proposer_kmin: int = 8, proposer_kmax: int = 14) -> List[Node]:
    items_str = "\n".join([f"{idx}: {node.exprs[idx]} = {node.numbers[idx]:g}" for idx in range(len(node.numbers))])
    raw = llm_generate_suggestions(items_str, proposer_kmin, proposer_kmax)
    moves = parse_moves(raw, len(node.numbers))
    if not moves:
        moves = fallback_moves(node.numbers, limit=30)
    moves = moves[: max(branch_factor * 2, branch_factor)]
    children = []
    for (i, j, op) in moves:
        ch = apply_move(node, i, j, op)
        if ch is not None:
            children.append(ch)
    children.sort(key=lambda x: x.score, reverse=True)
    return children[:branch_factor]
We implement the branching step of the Tree-of-Thoughts algorithm. We apply each proposed move to create new child nodes and compute their heuristic scores, then prune away the weak branches, keeping only the strongest candidates for further expansion.
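The key invariant of a move is that the state shrinks by exactly one item: two entries are consumed and one combined entry is appended. This simplified, standalone version of the move application (operating on plain lists rather than Node objects, for brevity) demonstrates that on the state [4, 1, 8, 7]:

```python
from typing import List, Optional, Tuple

def apply_move(nums: List[float], exprs: List[str],
               i: int, j: int, op: str) -> Optional[Tuple[List[float], List[str]]]:
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b,
           "/": lambda a, b: a / b if abs(b) > 1e-12 else None}
    r = ops[op](nums[i], nums[j])
    if r is None:                     # division by ~zero is rejected
        return None
    # Drop the two consumed items, append the combined result and its expression.
    new_nums = [n for k, n in enumerate(nums) if k not in (i, j)] + [r]
    new_exprs = [e for k, e in enumerate(exprs) if k not in (i, j)] + [f"({exprs[i]} {op} {exprs[j]})"]
    return new_nums, new_exprs

nums, exprs = apply_move([4.0, 1.0, 8.0, 7.0], ["4", "1", "8", "7"], 2, 3, "-")
print(nums)   # [4.0, 1.0, 1.0]
print(exprs)  # ['4', '1', '(8 - 7)']
```

Because each move reduces four numbers to three, then two, then one, any Game of 24 instance is solved in exactly three moves, which is why max_depth=3 suffices in the search below.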
def reconstruct_solution(goal: Node) -> List[str]:
    path = []
    cur = goal
    while cur is not None:
        if cur.thought:
            path.append(cur.thought)
        cur = cur.parent
    return list(reversed(path))
def tot_solve_24(
    start_nums: List[int],
    beam_width: int = 10,
    branch_factor: int = 8,
    max_depth: int = 3,
    prune_threshold: float = -10.0,
    verbose: bool = True
) -> Dict[str, Any]:
    root = Node(
        depth=0,
        numbers=[float(x) for x in start_nums],
        exprs=[str(x) for x in start_nums],
    )
    root.score = heuristic_score(root)
    beam = [root]
    best_seen = root
    if verbose:
        print("\n=== ToT Search Start ===")
        print("Start:", pretty_state(root.numbers, root.exprs))
        print("Root score:", root.score)
    for d in range(max_depth):
        candidates: List[Node] = []
        if verbose:
            print(f"\n--- Depth {d} -> {d+1} expansion ---")
            print("Beam states:")
            for bidx, b in enumerate(beam[: min(len(beam), 6)]):
                print(f"  [{bidx}] score={b.score:.3f} | {pretty_state(b.numbers, b.exprs)}")
        for b in beam:
            kids = expand(b, branch_factor=branch_factor)
            candidates.extend(kids)
        if not candidates:
            break
        candidates = [c for c in candidates if c.score >= prune_threshold]
        goals = [c for c in candidates if c.is_goal]
        if goals:
            goals.sort(key=lambda x: x.score, reverse=True)
            sol = goals[0]
            steps = reconstruct_solution(sol)
            return {
                "solved": True,
                "start": start_nums,
                "expression": sol.exprs[0],
                "value": sol.numbers[0],
                "steps": steps,
                "final_score": sol.score
            }
        candidates.sort(key=lambda x: x.score, reverse=True)
        beam = candidates[:beam_width]
        if beam and beam[0].score > best_seen.score:
            best_seen = beam[0]
        if verbose:
            print("Top candidates after pruning/beam:")
            for cidx, c in enumerate(beam[: min(len(beam), 6)]):
                print(f"  [{cidx}] score={c.score:.3f} | {pretty_state(c.numbers, c.exprs)}")
    best_expr = best_seen.exprs[0] if len(best_seen.exprs) == 1 else " ; ".join(best_seen.exprs)
    best_val = best_seen.numbers[0] if len(best_seen.numbers) == 1 else None
    return {
        "solved": False,
        "start": start_nums,
        "best_state": pretty_state(best_seen.numbers, best_seen.exprs),
        "best_expression": best_expr,
        "best_value": best_val,
        "final_score": best_seen.score,
        "note": "Not solved within depth/beam limits; increase beam_width/branch_factor or adjust pruning."
    }
tests = [
    [4, 1, 8, 7],
    [3, 3, 8, 8],
    [6, 6, 6, 6],
    [9, 9, 4, 4],
]
for nums in tests:
    result = tot_solve_24(
        nums,
        beam_width=12,
        branch_factor=10,
        max_depth=3,
        prune_threshold=-12.0,
        verbose=True
    )
    print("\n=== RESULT ===")
    for k, v in result.items():
        if k == "steps":
            print("steps:")
            for s in v:
                print("  -", s)
        else:
            print(f"{k}: {v}")
    print("\n" + "=" * 80 + "\n")
print("""
To adapt this ToT agent beyond the 24 game:
1) Define a STATE representation (like numbers/exprs here).
2) Define a PROPOSER that generates candidate next steps (LLM tool or rule-based).
3) Define a HEURISTIC / SCORER:
- for checkable tasks, use objective scoring
- for open-ended tasks, use an LLM-critic scoring rubric
4) Run the same ToT loop:
expand -> score -> prune -> keep top beam -> repeat until goal or depth limit.
""")
We implement the full Tree-of-Thoughts search loop with beam and depth constraints. We expand, score, prune, and select the top branches at each depth until we reach a solution or exhaust the search budget. Finally, we reconstruct the solution path and show how the agent solves multiple Game of 24 instances step by step.
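The adaptation checklist printed above boils down to a domain-agnostic loop. As a minimal sketch, the skeleton below abstracts the same expand -> score -> prune -> keep-top-beam cycle behind three caller-supplied functions (expand_fn, score_fn, is_goal_fn are hypothetical names introduced here, not part of the tutorial's code), demonstrated on a toy domain where states are integers, a move adds 1 or doubles, and the goal is 10:

```python
from typing import Any, Callable, List, Optional

def beam_search(root: Any,
                expand_fn: Callable[[Any], List[Any]],
                score_fn: Callable[[Any], float],
                is_goal_fn: Callable[[Any], bool],
                beam_width: int = 4,
                max_depth: int = 3,
                prune_threshold: float = float("-inf")) -> Optional[Any]:
    beam = [root]
    for _ in range(max_depth):
        # Expand every beam state, then prune candidates below the threshold.
        candidates = [c for node in beam for c in expand_fn(node)]
        candidates = [c for c in candidates if score_fn(c) >= prune_threshold]
        goals = [c for c in candidates if is_goal_fn(c)]
        if goals:
            return max(goals, key=score_fn)
        if not candidates:
            break
        # Keep only the top-scoring states for the next depth.
        beam = sorted(candidates, key=score_fn, reverse=True)[:beam_width]
    return None

# Toy domain: integer states, moves are +1 or *2, target is 10.
result = beam_search(
    1,
    expand_fn=lambda s: [s + 1, s * 2],
    score_fn=lambda s: -abs(10 - s),   # closer to 10 scores higher
    is_goal_fn=lambda s: s == 10,
    beam_width=3,
    max_depth=5,
)
print(result)  # 10
```

Swapping in the 24-game state, the LLM proposer, and the closeness heuristic recovers the tot_solve_24 loop; swapping in an LLM-critic rubric as score_fn extends the same loop to open-ended tasks.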
In conclusion, we have built a complete multi-branch reasoning agent that demonstrates how Tree-of-Thoughts transforms LLM reasoning from a single linear pass into a systematic search process. We implemented branching, heuristic scoring, pruning, beam selection, and depth control in a modular design that can be easily adapted to other logic problems. Along the way, we saw how combining language models with search algorithms improves systematic decision making. We now have a reusable ToT framework that we can transfer to mathematical reasoning, programming tasks, symbolic search, or even LLM-based evaluation pipelines.



