    How to Design a Swiss Army Knife Research Agent with Tool-Using AI, Web Search, PDF Analysis, Vision, and Automated Reporting

    By Naveed Ahmad | February 21, 2026


    In this tutorial, we build a “Swiss Army Knife” research agent that goes far beyond simple chat interactions and actively solves multi-step research problems end-to-end. We combine a tool-using agent architecture with live web search, local PDF ingestion, vision-based chart analysis, and automated report generation to demonstrate how modern agents can reason, verify, and produce structured outputs. By wiring together smolagents, OpenAI models, and practical data-extraction utilities, we show how a single agent can explore sources, cross-check claims, and synthesize findings into professional-grade Markdown and DOCX reports.

    %pip -q install -U smolagents openai trafilatura duckduckgo-search pypdf pymupdf python-docx pillow tqdm
    
    
    import os, re, json, getpass, base64
    from typing import List, Dict, Any
    import requests
    import trafilatura
    from duckduckgo_search import DDGS
    from pypdf import PdfReader
    import fitz
    from docx import Document
    from docx.shared import Pt
    from datetime import datetime
    
    
    from openai import OpenAI
    from smolagents import CodeAgent, OpenAIModel, tool
    
    
    if not os.environ.get("OPENAI_API_KEY"):
       os.environ["OPENAI_API_KEY"] = getpass.getpass("Paste your OpenAI API key (hidden): ").strip()
    print("OPENAI_API_KEY set:", "YES" if os.environ.get("OPENAI_API_KEY") else "NO")
    
    
    if not os.environ.get("SERPER_API_KEY"):
       serper = getpass.getpass("Optional: Paste SERPER_API_KEY for Google results (press Enter to skip): ").strip()
       if serper:
           os.environ["SERPER_API_KEY"] = serper
    print("SERPER_API_KEY set:", "YES" if os.environ.get("SERPER_API_KEY") else "NO")
    
    
    client = OpenAI()
    
    
    def _now():
       return datetime.utcnow().strftime("%Y-%m-%d %H:%M:%SZ")
    
    
    def _safe_filename(s: str) -> str:
       s = re.sub(r"[^a-zA-Z0-9._-]+", "_", s).strip("_")
       return s[:180] if s else "file"

    We set up the full execution environment and securely load all required credentials without hardcoding secrets. We import all dependencies required for web search, document parsing, vision analysis, and agent orchestration. We also initialize shared utilities to standardize timestamps and file naming throughout the workflow.
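
    As a quick sanity check, a tiny snippet like the one below can confirm the shared utilities behave as expected before any API calls are made; it relies only on the helpers defined above, and the sample filename is arbitrary.

    # Optional sanity check for the shared utilities (no API calls involved).
    print(_now())                                              # UTC timestamp, e.g. "2026-02-21 10:15:30Z"
    print(_safe_filename("Q4 report: charts & figures.pdf"))   # -> "Q4_report_charts_figures.pdf"
    print(_safe_filename(""))                                  # empty input falls back to "file"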

    try:
       from google.colab import files
       os.makedirs("/content/pdfs", exist_ok=True)
       uploaded = files.upload()
       for name, data in uploaded.items():
           if name.lower().endswith(".pdf"):
               with open(f"/content/pdfs/{name}", "wb") as f:
                   f.write(data)
       print("PDFs in /content/pdfs:", os.listdir("/content/pdfs"))
    except Exception as e:
       print("Upload skipped:", str(e))
    
    
    def web_search(query: str, k: int = 6) -> List[Dict[str, str]]:
       serper_key = os.environ.get("SERPER_API_KEY", "").strip()
       if serper_key:
           resp = requests.post(
               "https://google.serper.dev/search",
               headers={"X-API-KEY": serper_key, "Content-Type": "application/json"},
               json={"q": query, "num": k},
               timeout=30,
           )
           resp.raise_for_status()
           data = resp.json()
           out = []
           for item in (data.get("organic") or [])[:k]:
               out.append({
                   "title": item.get("title",""),
                   "url": item.get("link",""),
                   "snippet": item.get("snippet",""),
               })
           return out
    
    
       out = []
       with DDGS() as ddgs:
           for r in ddgs.text(query, max_results=k):
               out.append({
                   "title": r.get("title",""),
                   "url": r.get("href",""),
                   "snippet": r.get("body",""),
               })
       return out
    
    
    def fetch_url_text(url: str) -> Dict[str, Any]:
       try:
           # trafilatura manages request timeouts via its own config; fetch_url takes no timeout argument.
           downloaded = trafilatura.fetch_url(url)
           if not downloaded:
               return {"url": url, "ok": False, "error": "fetch_failed", "text": ""}
           text = trafilatura.extract(downloaded, include_comments=False, include_tables=True)
           if not text:
               return {"url": url, "ok": False, "error": "extract_failed", "text": ""}
           title_guess = next((ln.strip() for ln in text.splitlines() if ln.strip()), "")[:120]
           return {"url": url, "ok": True, "title_guess": title_guess, "text": text}
       except Exception as e:
           return {"url": url, "ok": False, "error": str(e), "text": ""}

    We enable local PDF ingestion and establish a flexible web search pipeline that works with or without a paid search API. We show how we gracefully handle optional inputs while maintaining a reliable research flow. We also implement robust URL fetching and text extraction to prepare clean source material for downstream reasoning.
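
    Before handing these helpers to an agent, it can be useful to exercise them directly. The short sketch below assumes the web_search and fetch_url_text functions defined above; the query string is only an example, and results will vary with the search backend in use.

    # Manual smoke test of the search-and-extract pipeline defined above.
    results = web_search("design patterns for tool-using AI agents", k=3)
    for r in results:
        print(f"- {r['title']}\n  {r['url']}")

    # Preview the first page that extracts cleanly.
    for r in results:
        page = fetch_url_text(r["url"])
        if page["ok"]:
            print("\nTitle guess:", page.get("title_guess", ""))
            print(page["text"][:500])
            break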

    def read_pdf_text(pdf_path: str, max_pages: int = 30) -> Dict[str, Any]:
       reader = PdfReader(pdf_path)
       pages = min(len(reader.pages), max_pages)
       chunks = []
       for i in range(pages):
           try:
               chunks.append(reader.pages[i].extract_text() or "")
           except Exception:
               chunks.append("")
       return {"pdf_path": pdf_path, "pages_read": pages, "text": "\n\n".join(chunks).strip()}
    
    
    def extract_pdf_images(pdf_path: str, out_dir: str = "/content/extracted_images", max_pages: int = 10) -> List[str]:
       os.makedirs(out_dir, exist_ok=True)
       doc = fitz.open(pdf_path)
       saved = []
       pages = min(len(doc), max_pages)
       base = _safe_filename(os.path.basename(pdf_path).rsplit(".", 1)[0])
    
    
       for p in range(pages):
           page = doc[p]
           img_list = page.get_images(full=True)
           for img_i, img in enumerate(img_list):
               xref = img[0]
               pix = fitz.Pixmap(doc, xref)
               if pix.n - pix.alpha >= 4:
                   pix = fitz.Pixmap(fitz.csRGB, pix)
               img_path = os.path.join(out_dir, f"{base}_p{p+1}_img{img_i+1}.png")
               pix.save(img_path)
               saved.append(img_path)
    
    
       doc.close()
       return saved
    
    
    def vision_analyze_image(image_path: str, question: str, model: str = "gpt-4.1-mini") -> Dict[str, Any]:
       with open(image_path, "rb") as f:
           img_b64 = base64.b64encode(f.read()).decode("utf-8")


       # The Responses API takes images as a data URL (or an uploaded file ID), not raw bytes.
       resp = client.responses.create(
           model=model,
           input=[{
               "role": "user",
               "content": [
                   {"type": "input_text", "text": f"Answer concisely and accurately.\n\nQuestion: {question}"},
                   {"type": "input_image", "image_url": f"data:image/png;base64,{img_b64}"},
               ],
           }],
       )
       return {"image_path": image_path, "answer": resp.output_text}

    We focus on deep document understanding by extracting structured text and visual artifacts from PDFs. We integrate a vision-capable model to interpret charts and figures instead of treating them as opaque images. We ensure that numerical trends and visual insights can be converted into explicit, text-based evidence.
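
    To see these pieces working together outside the agent loop, a short manual walkthrough like the following can be run on any uploaded document. It assumes the PDF helpers and vision_analyze_image defined above, and that at least one PDF was uploaded to /content/pdfs earlier; the question posed to the vision model is just an illustrative placeholder.

    # Manual walkthrough: text, figures, and vision analysis for one uploaded PDF.
    pdf_dir = "/content/pdfs"
    pdfs = []
    if os.path.isdir(pdf_dir):
        pdfs = sorted(os.path.join(pdf_dir, f) for f in os.listdir(pdf_dir) if f.lower().endswith(".pdf"))

    if pdfs:
        sample = pdfs[0]
        doc_text = read_pdf_text(sample, max_pages=5)
        print(f"{doc_text['pages_read']} pages read, {len(doc_text['text'])} characters of text")

        figures = extract_pdf_images(sample, max_pages=5)
        print(f"{len(figures)} embedded images extracted")

        if figures:
            analysis = vision_analyze_image(figures[0], "What trend or key numbers does this chart show?")
            print("Figure analysis:", analysis["answer"][:300])
    else:
        print("No PDFs found in /content/pdfs; skipping the walkthrough.")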

    def write_markdown(path: str, content: str) -> str:
       os.makedirs(os.path.dirname(path), exist_ok=True)
       with open(path, "w", encoding="utf-8") as f:
           f.write(content)
       return path
    
    
    def write_docx_from_markdown(docx_path: str, md: str, title: str = "Research Report") -> str:
       os.makedirs(os.path.dirname(docx_path), exist_ok=True)
       doc = Document()
       t = doc.add_paragraph()
       run = t.add_run(title)
       run.bold = True
       run.font.size = Pt(18)
       meta = doc.add_paragraph()
       meta.add_run(f"Generated: {_now()}").italic = True
       doc.add_paragraph("")
       for line in md.splitlines():
           line = line.rstrip()
           if not line:
               doc.add_paragraph("")
               continue
           if line.startswith("# "):
               doc.add_heading(line[2:].strip(), level=1)
           elif line.startswith("## "):
               doc.add_heading(line[3:].strip(), level=2)
           elif line.startswith("### "):
               doc.add_heading(line[4:].strip(), level=3)
           elif re.match(r"^\s*[-*]\s+", line):
               p = doc.add_paragraph(style="List Bullet")
               p.add_run(re.sub(r"^\s*[-*]\s+", "", line).strip())
           else:
               doc.add_paragraph(line)
       doc.save(docx_path)
       return docx_path
    
    
    @tool
    def t_web_search(query: str, k: int = 6) -> str:
       """Search the web and return results as a JSON list of {title, url, snippet}.

       Args:
           query: The search query.
           k: Maximum number of results to return.
       """
       return json.dumps(web_search(query, k), ensure_ascii=False)


    @tool
    def t_fetch_url_text(url: str) -> str:
       """Fetch a web page and return its extracted main text as JSON.

       Args:
           url: The URL to fetch and extract.
       """
       return json.dumps(fetch_url_text(url), ensure_ascii=False)


    @tool
    def t_list_pdfs() -> str:
       """List available PDF paths under /content/pdfs as a JSON array."""
       pdf_dir = "/content/pdfs"
       if not os.path.isdir(pdf_dir):
           return json.dumps([])
       paths = [os.path.join(pdf_dir, f) for f in os.listdir(pdf_dir) if f.lower().endswith(".pdf")]
       return json.dumps(sorted(paths), ensure_ascii=False)


    @tool
    def t_read_pdf_text(pdf_path: str, max_pages: int = 30) -> str:
       """Extract text from a PDF and return it as JSON.

       Args:
           pdf_path: Path of the PDF to read.
           max_pages: Maximum number of pages to read.
       """
       return json.dumps(read_pdf_text(pdf_path, max_pages=max_pages), ensure_ascii=False)


    @tool
    def t_extract_pdf_images(pdf_path: str, max_pages: int = 10) -> str:
       """Extract embedded images from a PDF and return the saved image paths as JSON.

       Args:
           pdf_path: Path of the PDF to process.
           max_pages: Maximum number of pages to scan for images.
       """
       imgs = extract_pdf_images(pdf_path, max_pages=max_pages)
       return json.dumps(imgs, ensure_ascii=False)


    @tool
    def t_vision_analyze_image(image_path: str, question: str) -> str:
       """Analyze an image (e.g., a chart) with a vision model and return the answer as JSON.

       Args:
           image_path: Path of the image to analyze.
           question: The question to ask about the image.
       """
       return json.dumps(vision_analyze_image(image_path, question), ensure_ascii=False)


    @tool
    def t_write_markdown(path: str, content: str) -> str:
       """Write Markdown content to a file and return the file path.

       Args:
           path: Destination path of the Markdown file.
           content: Markdown content to write.
       """
       return write_markdown(path, content)


    @tool
    def t_write_docx_from_markdown(docx_path: str, md_path: str, title: str = "Research Report") -> str:
       """Convert a Markdown file into a DOCX report and return the DOCX path.

       Args:
           docx_path: Destination path for the DOCX file.
           md_path: Path of the Markdown file to convert.
           title: Title placed at the top of the document.
       """
       with open(md_path, "r", encoding="utf-8") as f:
           md = f.read()
       return write_docx_from_markdown(docx_path, md, title=title)

    We implement the full output layer by generating Markdown reports and converting them into polished DOCX documents. We expose all core capabilities as explicit tools that the agent can reason about and invoke step by step. We ensure that every transformation from raw data to final report remains deterministic and inspectable.
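
    The reporting layer can also be checked end to end with placeholder content before the agent ever runs. The sketch below uses the write_markdown and write_docx_from_markdown helpers defined above; the Markdown text and file names are illustrative only.

    # Exercise the reporting helpers with placeholder content.
    sample_md = "\n".join([
        "# Example Report",
        "## Findings",
        "- First finding, with a one-line explanation.",
        "- Second finding, backed by a source URL.",
        "## Limitations",
        "Placeholder text used only to verify heading and bullet formatting.",
    ])
    md_path = write_markdown("/content/report/example.md", sample_md)
    docx_path = write_docx_from_markdown("/content/report/example.docx", sample_md, title="Example Report")
    print("Wrote:", md_path, "and", docx_path)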

    model = OpenAIModel(model_id="gpt-5")
    
    
    agent = CodeAgent(
       tools=[
           t_web_search,
           t_fetch_url_text,
           t_list_pdfs,
           t_read_pdf_text,
           t_extract_pdf_images,
           t_vision_analyze_image,
           t_write_markdown,
           t_write_docx_from_markdown,
       ],
       model=model,
       add_base_tools=False,
       additional_authorized_imports=["json","re","os","math","datetime","time","textwrap"],
    )
    
    
    SYSTEM_INSTRUCTIONS = """
    You are a Swiss Army Knife Research Agent.
    """
    
    
    def run_research(topic: str):
       os.makedirs("/content/report", exist_ok=True)
       prompt = f"""{SYSTEM_INSTRUCTIONS.strip()}
    
    
    Research question:
    {topic}
    
    
    Steps:
    1) List available PDFs (if any) and decide which are relevant.
    2) Do web search for the topic.
    3) Fetch and extract the text of the best sources.
    4) If PDFs exist, extract text and images.
    5) Visually analyze figures.
    6) Write the Markdown report to /content/report/report.md and convert it to /content/report/report.docx.
    """
       return agent.run(prompt)
    
    
    topic = "Build a research brief on the most reliable design patterns for tool-using agents (2024-2026), focusing on evaluation, citations, and failure modes."
    out = run_research(topic)
    print(out[:1500] if isinstance(out, str) else out)
    
    
    try:
       from google.colab import files
       files.download("/content/report/report.md")
       files.download("/content/report/report.docx")
    except Exception as e:
       print("Download skipped:", str(e))

    We assemble the complete research agent and define a structured execution plan for multi-step reasoning. We guide the agent to search, analyze, synthesize, and write using a single coherent prompt. We demonstrate how the agent produces a finished research artifact that can be reviewed, shared, and reused immediately.
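
    For comparison, the same helpers can also be chained into a fixed, agent-free pipeline. The sketch below is a minimal alternative under the assumption that a plain search-fetch-summarize pass is acceptable; the run_research_manual name, the gpt-4.1-mini model choice, the prompt wording, and the output paths are illustrative assumptions rather than part of the agent setup above.

    # Deterministic, agent-free fallback built from the same helpers (illustrative sketch).
    def run_research_manual(topic: str, k: int = 4) -> str:
        sources = web_search(topic, k=k)
        notes = []
        for s in sources:
            page = fetch_url_text(s["url"])
            if page["ok"]:
                notes.append(f"Source: {s['title']} ({s['url']})\n{page['text'][:2000]}")

        # Summarize the collected notes into a Markdown brief (model choice is an assumption).
        resp = client.responses.create(
            model="gpt-4.1-mini",
            input=(
                f"Write a concise Markdown research brief on:\n{topic}\n\n"
                "Use only the notes below and cite each source URL you rely on.\n\n"
                + "\n\n---\n\n".join(notes)
            ),
        )
        md = resp.output_text
        write_markdown("/content/report/manual_report.md", md)
        write_docx_from_markdown("/content/report/manual_report.docx", md, title=topic[:80])
        return md

    # Example: print(run_research_manual("reliable design patterns for tool-using agents")[:800])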

    In conclusion, we demonstrated how a well-designed tool-using agent can function as a reliable research assistant rather than a conversational toy. We showcased how explicit tools, disciplined prompting, and step-by-step execution allow the agent to search the web, analyze documents and visuals, and generate traceable, citation-aware reports. This approach offers a practical blueprint for building trustworthy research agents that emphasize evaluation, evidence, and failure awareness, capabilities increasingly essential for real-world AI systems.

