    How to Design an Advanced Tree-of-Thoughts Multi-Branch Reasoning Agent with Beam Search, Heuristic Scoring, and Depth-Limited Pruning

    By Naveed Ahmad | March 5, 2026

    In this tutorial, we build an advanced Tree-of-Thoughts (ToT) multi-branch reasoning agent from scratch. Instead of relying on linear chain-of-thought reasoning, we design a system that generates multiple reasoning branches, scores each branch using a heuristic evaluation function, prunes weak candidates, and continues expanding only the strongest paths. We combine an instruction-tuned transformer model with a custom tree structure and implement beam-search style selection with depth-limited search. By grounding the system in the 24-game domain, we create a clear, objective benchmark for reasoning where we can observe branch expansion, pruning, scoring, and goal detection in action.

    !pip -q install -U transformers accelerate sentencepiece
    
    
    import re
    import math
    from dataclasses import dataclass, field
    from typing import List, Optional, Tuple, Dict, Any
    
    
    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    
    
    MODEL_NAME = "google/flan-t5-base"
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    
    
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    
    
    print("Device:", device)
    print("Model loaded:", MODEL_NAME)
    
    
    @dataclass
    class Node:
       depth: int
       numbers: List[float]
       exprs: List[str]
       thought: str = ""
       score: float = -1e9
       is_goal: bool = False
       parent: Optional["Node"] = None
       meta: Dict[str, Any] = field(default_factory=dict)
    
    
    def pretty_state(nums: List[float], exprs: List[str]) -> str:
       pairs = [f"{e}={n:g}" for e, n in zip(exprs, nums)]
       return " | ".join(pairs)

    We install the required libraries and load FLAN-T5 through its Seq2Seq architecture class (AutoModelForSeq2SeqLM). We define the core Node data structure that represents each reasoning state in the Tree-of-Thoughts search, and we initialize the device configuration along with helper utilities that let us clearly print and inspect the reasoning state.
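As a quick sanity check, here is a minimal, self-contained sketch of the Node and pretty_state definitions above, applied to the root state of the puzzle [4, 1, 8, 7]:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Mirrors the Node definition above: each node holds the remaining numbers,
# the expressions that produced them, and bookkeeping for the search.
@dataclass
class Node:
    depth: int
    numbers: List[float]
    exprs: List[str]
    thought: str = ""
    score: float = -1e9
    is_goal: bool = False
    parent: Optional["Node"] = None
    meta: Dict[str, Any] = field(default_factory=dict)


def pretty_state(nums: List[float], exprs: List[str]) -> str:
    return " | ".join(f"{e}={n:g}" for e, n in zip(exprs, nums))


root = Node(depth=0, numbers=[4.0, 1.0, 8.0, 7.0], exprs=["4", "1", "8", "7"])
print(pretty_state(root.numbers, root.exprs))  # 4=4 | 1=1 | 8=8 | 7=7
```

At the root, each expression is just the literal number; as moves are applied, expressions grow into nested parenthesized combinations while the numbers list shrinks.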

    OPS = ["+", "-", "*", "/"]
    
    
    def safe_apply(a: float, b: float, op: str) -> Optional[float]:
       if op == "+": return a + b
       if op == "-": return a - b
       if op == "*": return a * b
       if op == "/":
           if abs(b) < 1e-12:
               return None
           return a / b
       return None
    
    
    def combine_expr(ea: str, eb: str, op: str) -> str:
       return f"({ea} {op} {eb})"
    
    
    def is_24(x: float, tol: float = 1e-6) -> bool:
       return abs(x - 24.0) <= tol
    
    
    def one_step_closeness(nums: List[float]) -> float:
       if len(nums) == 1:
           return abs(nums[0] - 24.0)
       best = float("inf")
       n = len(nums)
       for i in range(n):
           for j in range(n):
               if i == j:
                   continue
               a, b = nums[i], nums[j]
               for op in OPS:
                   r = safe_apply(a, b, op)
                   if r is None:
                       continue
                   best = min(best, abs(r - 24.0))
       return best if best != float("inf") else 1e9
    
    
    def heuristic_score(node: Node) -> float:
       nums = node.numbers
       base = -one_step_closeness(nums)
       depth_penalty = 0.05 * node.depth
       exact_bonus = 2.0 if any(is_24(x) for x in nums) else 0.0
       return base - depth_penalty + exact_bonus

    We implement the mathematical logic of the 24-game domain. We define safe operator execution, expression construction, goal checking, and a heuristic scoring function that estimates how close a state is to the goal of 24. We design the heuristic to guide the search intelligently while penalizing deeper branches.
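To make the heuristic concrete, the following self-contained sketch restates the same logic and checks it on a few hand-verifiable states: [4, 6] is one multiplication away from 24, while [1, 1] can do no better than 1 + 1 = 2:

```python
# Same logic as safe_apply / one_step_closeness above, restated minimally.
OPS = ["+", "-", "*", "/"]


def safe_apply(a: float, b: float, op: str):
    if op == "+": return a + b
    if op == "-": return a - b
    if op == "*": return a * b
    if op == "/": return a / b if abs(b) > 1e-12 else None
    return None


def one_step_closeness(nums):
    # Distance to 24 of the best single combine move from this state.
    if len(nums) == 1:
        return abs(nums[0] - 24.0)
    best = float("inf")
    for i in range(len(nums)):
        for j in range(len(nums)):
            if i == j:
                continue
            for op in OPS:
                r = safe_apply(nums[i], nums[j], op)
                if r is not None:
                    best = min(best, abs(r - 24.0))
    return best


print(one_step_closeness([4.0, 6.0]))  # 0.0  (4 * 6 = 24)
print(one_step_closeness([1.0, 1.0]))  # 22.0 (best move is 1 + 1 = 2)
```

Because the heuristic negates this distance, states that can reach 24 in one move score highest, which is exactly what steers the beam toward solvable branches.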

    PROPOSER_PROMPT = """You are helping solve the 24 game.
    We have current items, each item has an expression and its numeric value.
    Pick TWO items and combine them with one operation from + - * / to create a new item.
    Return between {k} and {k2} suggestions as lines using EXACT format:
    
    
    i,j,op
    
    
    Where i and j are 0-based indices into the list. Use i != j. Prefer moves that help reach 24.
    
    
    Current items:
    {items}
    """
    
    
    def llm_generate_suggestions(items: str, k_min: int, k_max: int, max_new_tokens: int = 160) -> str:
       prompt = PROPOSER_PROMPT.format(k=k_min, k2=k_max, items=items)
       inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to(device)
       with torch.no_grad():
           out = model.generate(
               **inputs,
               max_new_tokens=max_new_tokens,
               do_sample=True,
               temperature=0.8,
               top_p=0.92,
               num_return_sequences=1,
           )
       txt = tokenizer.decode(out[0], skip_special_tokens=True)
       return txt.strip()
    
    
    def parse_moves(text: str, n_items: int) -> List[Tuple[int, int, str]]:
       moves = []
       for line in text.splitlines():
           line = line.strip()
           m = re.match(r"^\s*(\d+)\s*,\s*(\d+)\s*,\s*([\+\-\*\/])\s*$", line)
           if not m:
               continue
           i, j, op = int(m.group(1)), int(m.group(2)), m.group(3)
           if 0 <= i < n_items and 0 <= j < n_items and i != j:
               moves.append((i, j, op))
       seen = set()
       uniq = []
       for mv in moves:
           if mv not in seen:
               uniq.append(mv)
               seen.add(mv)
       return uniq
    
    
    def fallback_moves(nums: List[float], limit: int = 24) -> List[Tuple[int, int, str]]:
       scored = []
       n = len(nums)
       for i in range(n):
           for j in range(n):
               if i == j:
                   continue
               for op in OPS:
                   r = safe_apply(nums[i], nums[j], op)
                   if r is None:
                       continue
                   scored.append((abs(r - 24.0), i, j, op))
       scored.sort(key=lambda x: x[0])
       out = [(i, j, op) for _, i, j, op in scored[:limit]]
       seen, uniq = set(), []
       for mv in out:
           if mv not in seen:
               uniq.append(mv)
               seen.add(mv)
       return uniq

    We build the LLM proposer that generates multiple reasoning branches. We format the prompt carefully so the model returns structured combine operations and parse those outputs into executable moves. We also implement a deterministic fallback strategy to ensure the search remains robust even if the model output is noisy.
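The parser's strictness is what makes the pipeline robust: only lines of the exact "i,j,op" shape survive, and prose, out-of-range indices, self-combinations, and duplicates are all dropped. A self-contained check of the same regex and filtering logic:

```python
import re

# Same parsing logic as parse_moves above, restated minimally.
def parse_moves(text: str, n_items: int):
    moves, seen, uniq = [], set(), []
    for line in text.splitlines():
        m = re.match(r"^\s*(\d+)\s*,\s*(\d+)\s*,\s*([\+\-\*\/])\s*$", line.strip())
        if not m:
            continue  # prose or malformed line: skip
        i, j, op = int(m.group(1)), int(m.group(2)), m.group(3)
        if 0 <= i < n_items and 0 <= j < n_items and i != j:
            moves.append((i, j, op))
    for mv in moves:  # deduplicate while preserving order
        if mv not in seen:
            uniq.append(mv)
            seen.add(mv)
    return uniq


noisy = "Sure! Try these:\n0,1,*\n0,1,*\n1,1,+\n9,0,-\n2, 3, /"
print(parse_moves(noisy, 4))  # [(0, 1, '*'), (2, 3, '/')]
```

If the model returns nothing parseable at all, the list comes back empty and the deterministic fallback_moves generator takes over, so the search never stalls on a bad generation.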

    def apply_move(node: Node, i: int, j: int, op: str) -> Optional[Node]:
       nums = node.numbers[:]
       exprs = node.exprs[:]
    
    
       a, b = nums[i], nums[j]
       r = safe_apply(a, b, op)
       if r is None:
           return None
    
    
       ea, eb = exprs[i], exprs[j]
       new_expr = combine_expr(ea, eb, op)
    
    
       for idx in sorted([i, j], reverse=True):
           nums.pop(idx)
           exprs.pop(idx)
    
    
       nums.append(r)
       exprs.append(new_expr)
    
    
       child = Node(
           depth=node.depth + 1,
           numbers=nums,
           exprs=exprs,
           parent=node,
           thought=f"Combine item {i} and {j} with '{op}' -> {new_expr} = {r:g}",
       )
       child.is_goal = (len(nums) == 1 and is_24(nums[0]))
       child.score = heuristic_score(child)
       return child
    
    
    def expand(node: Node, branch_factor: int, proposer_kmin: int = 8, proposer_kmax: int = 14) -> List[Node]:
       items_str = "\n".join([f"{idx}: {node.exprs[idx]} = {node.numbers[idx]:g}" for idx in range(len(node.numbers))])
    
    
       raw = llm_generate_suggestions(items_str, proposer_kmin, proposer_kmax)
       moves = parse_moves(raw, len(node.numbers))
    
    
       if not moves:
           moves = fallback_moves(node.numbers, limit=30)
    
    
       moves = moves[: branch_factor * 2]  # keep up to 2x branch_factor; extras are pruned after scoring
    
    
       children = []
       for (i, j, op) in moves:
           ch = apply_move(node, i, j, op)
           if ch is not None:
               children.append(ch)
    
    
       children.sort(key=lambda x: x.score, reverse=True)
       return children[:branch_factor]

    We implement the branch expansion mechanism of the Tree-of-Thoughts algorithm. We apply proposed moves to create new child nodes and compute their heuristic scores. We then locally prune weaker branches, retaining only the strongest candidates for further exploration.
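The state transition at the heart of expansion follows a remove-two/append-one convention: the two chosen items are deleted and their combination is appended. A minimal self-contained trace of that convention (the real apply_move additionally wraps the result in a Node with score and goal flag):

```python
def apply_move(nums, exprs, i, j, op):
    # Combine items i and j, remove both, append the result (as above).
    a, b = nums[i], nums[j]
    if op == "+": r = a + b
    elif op == "-": r = a - b
    elif op == "*": r = a * b
    elif op == "/": r = a / b if abs(b) > 1e-12 else None
    else: r = None
    if r is None:
        return None  # division by zero or unknown op: invalid move
    new_expr = f"({exprs[i]} {op} {exprs[j]})"
    nums, exprs = nums[:], exprs[:]
    for idx in sorted([i, j], reverse=True):  # pop larger index first
        nums.pop(idx)
        exprs.pop(idx)
    return nums + [r], exprs + [new_expr]


state = apply_move([4.0, 6.0, 1.0], ["4", "6", "1"], 0, 1, "*")
print(state)  # ([1.0, 24.0], ['1', '(4 * 6)'])
```

Each move therefore shrinks the state by one item, which is why a four-number puzzle needs exactly three moves and a depth limit of 3 suffices.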

    def reconstruct_solution(goal: Node) -> List[str]:
       path = []
       cur = goal
       while cur is not None:
           if cur.thought:
               path.append(cur.thought)
           cur = cur.parent
       return list(reversed(path))
    
    
    def tot_solve_24(
       start_nums: List[int],
       beam_width: int = 10,
       branch_factor: int = 8,
       max_depth: int = 3,
       prune_threshold: float = -10.0,
       verbose: bool = True
    ) -> Dict[str, Any]:
       root = Node(
           depth=0,
           numbers=[float(x) for x in start_nums],
           exprs=[str(x) for x in start_nums],
       )
       root.score = heuristic_score(root)
    
    
       beam = [root]
       best_seen = root
    
    
       if verbose:
           print("\n=== ToT Search Start ===")
           print("Start:", pretty_state(root.numbers, root.exprs))
           print("Root score:", root.score)
    
    
       for d in range(max_depth):
           candidates: List[Node] = []
    
    
           if verbose:
               print(f"\n--- Depth {d} -> {d+1} expansion ---")
               print("Beam states:")
               for bidx, b in enumerate(beam[: min(len(beam), 6)]):
                   print(f"  [{bidx}] score={b.score:.3f} | {pretty_state(b.numbers, b.exprs)}")
    
    
           for b in beam:
               kids = expand(b, branch_factor=branch_factor)
               candidates.extend(kids)
    
    
           if not candidates:
               break
    
    
           candidates = [c for c in candidates if c.score >= prune_threshold]
    
    
           goals = [c for c in candidates if c.is_goal]
           if goals:
               goals.sort(key=lambda x: x.score, reverse=True)
               sol = goals[0]
               steps = reconstruct_solution(sol)
               return {
                   "solved": True,
                   "start": start_nums,
                   "expression": sol.exprs[0],
                   "value": sol.numbers[0],
                   "steps": steps,
                   "final_score": sol.score
               }
    
    
           candidates.sort(key=lambda x: x.score, reverse=True)
           beam = candidates[:beam_width]
    
    
           if beam and beam[0].score > best_seen.score:
               best_seen = beam[0]
    
    
           if verbose:
               print("Top candidates after pruning/beam:")
               for cidx, c in enumerate(beam[: min(len(beam), 6)]):
                   print(f"  [{cidx}] score={c.score:.3f} | {pretty_state(c.numbers, c.exprs)}")
    
    
       best_expr = best_seen.exprs[0] if len(best_seen.exprs) == 1 else " ; ".join(best_seen.exprs)
       best_val = best_seen.numbers[0] if len(best_seen.numbers) == 1 else None
       return {
           "solved": False,
           "start": start_nums,
           "best_state": pretty_state(best_seen.numbers, best_seen.exprs),
           "best_expression": best_expr,
           "best_value": best_val,
           "final_score": best_seen.score,
           "note": "Not solved within depth/beam limits; increase beam_width/branch_factor or adjust pruning."
       }
    
    
    tests = [
       [4, 1, 8, 7],
       [3, 3, 8, 8],
       [6, 6, 6, 6],
       [9, 9, 4, 4],
    ]
    
    
    for nums in tests:
       result = tot_solve_24(
           nums,
           beam_width=12,
           branch_factor=10,
           max_depth=3,
           prune_threshold=-12.0,
           verbose=True
       )
       print("\n=== RESULT ===")
       for k, v in result.items():
           if k == "steps":
               print("steps:")
               for s in v:
                   print("  -", s)
           else:
               print(f"{k}: {v}")
       print("\n" + "="*80 + "\n")
    
    
    print("""
    To adapt this ToT agent beyond the 24 game:
    1) Define a STATE representation (like numbers/exprs here).
    2) Define a PROPOSER that generates candidate next steps (LLM tool or rule-based).
    3) Define a HEURISTIC / SCORER:
      - for checkable tasks, use objective scoring
      - for open-ended tasks, use an LLM-critic scoring rubric
    4) Run the same ToT loop:
      expand -> score -> prune -> keep top beam -> repeat until goal or depth limit.
    """)

    We implement the full Tree-of-Thoughts search loop using beam search and depth limits. We expand, score, prune, and select the top branches at each depth until we either reach a solution or exhaust the search budget. Finally, we reconstruct the reasoning path and demonstrate how the agent solves multiple 24-game instances step by step.
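As a hand-checkable example of what the search is hunting for, the test puzzle [3, 3, 8, 8] has a classic solution that takes exactly three combine steps and therefore fits within max_depth=3:

```python
# Classic solution for [3, 3, 8, 8]: 8 / (3 - 8/3) = 24, as three combine steps.
step1 = 8 / 3          # combine 8 and 3:      (8 / 3)             ~ 2.6667
step2 = 3 - step1      # combine 3 and step1:  (3 - (8 / 3))       ~ 0.3333
step3 = 8 / step2      # combine 8 and step2:  (8 / (3 - (8 / 3))) = 24
print(abs(step3 - 24.0) < 1e-6)  # True
```

Note that the intermediate values are far from 24, so a greedy one-step heuristic alone would not rank this line highly; it is the beam width that keeps such unlikely-looking branches alive long enough to pay off.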

    In conclusion, we constructed a complete multi-branch reasoning agent that demonstrates how Tree-of-Thoughts transforms LLM reasoning from a single path into a structured search process. We implemented branch generation, heuristic scoring, pruning, beam selection, and depth control in a modular architecture that can easily be adapted to other reasoning problems. Through this tutorial, we saw how combining language models with search algorithms significantly improves structured decision-making. We now have a reusable ToT framework that we can extend to mathematical reasoning, planning tasks, symbolic search, or even LLM-critic-based evaluation systems.

