We start this tutorial by designing a modular deep analysis system that runs straight on Google Colab. We configure Gemini because the core reasoning engine, combine DuckDuckGo’s On the spot Reply API for light-weight net search, and orchestrate multi-round querying with deduplication and delay dealing with. We emphasize effectivity by limiting API calls, parsing concise snippets, and utilizing structured prompts to extract key factors, themes, and insights. Each part, from supply assortment to JSON-based evaluation, permits us to experiment rapidly and adapt the workflow for deeper or broader analysis queries. Try the FULL CODES here.
import os
import json
import time
import requests
from typing import Checklist, Dict, Any
from dataclasses import dataclass
import google.generativeai as genai
from urllib.parse import quote_plus
import re
We begin by importing important Python libraries that deal with system operations, JSON processing, net requests, and knowledge constructions. We additionally incorporate Google’s Generative AI SDK and utilities, corresponding to URL encoding, to make sure our analysis system operates easily. Try the FULL CODES here.
@dataclass
class ResearchConfig:
gemini_api_key: str
max_sources: int = 10
max_content_length: int = 5000
search_delay: float = 1.0
class DeepResearchSystem:
def __init__(self, config: ResearchConfig):
self.config = config
genai.configure(api_key=config.gemini_api_key)
self.mannequin = genai.GenerativeModel('gemini-1.5-flash')
def search_web(self, question: str, num_results: int = 5) -> Checklist[Dict[str, str]]:
"""Search net utilizing DuckDuckGo On the spot Reply API"""
strive:
encoded_query = quote_plus(question)
url = f"https://api.duckduckgo.com/?q={encoded_query}&format=json&no_redirect=1"
response = requests.get(url, timeout=10)
knowledge = response.json()
outcomes = []
if 'RelatedTopics' in knowledge:
for subject in knowledge['RelatedTopics'][:num_results]:
if isinstance(subject, dict) and 'Textual content' in subject:
outcomes.append({
'title': subject.get('Textual content', '')[:100] + '...',
'url': subject.get('FirstURL', ''),
'snippet': subject.get('Textual content', '')
})
if not outcomes:
outcomes = [{
'title': f"Research on: {query}",
'url': f"https://search.example.com/q={encoded_query}",
'snippet': f"General information and research about {query}"
}]
return outcomes
besides Exception as e:
print(f"Search error: {e}")
return [{'title': f"Research: {query}", 'url': '', 'snippet': f"Topic: {query}"}]
def extract_key_points(self, content material: str) -> Checklist[str]:
"""Extract key factors utilizing Gemini"""
immediate = f"""
Extract 5-7 key factors from this content material. Be concise and factual:
{content material[:2000]}
Return as numbered record:
"""
strive:
response = self.mannequin.generate_content(immediate)
return [line.strip() for line in response.text.split('n') if line.strip()]
besides:
return ["Key information extracted from source"]
def analyze_sources(self, sources: Checklist[Dict[str, str]], question: str) -> Dict[str, Any]:
"""Analyze sources for relevance and extract insights"""
evaluation = {
'total_sources': len(sources),
'key_themes': [],
'insights': [],
'confidence_score': 0.7
}
all_content = " ".be a part of([s.get('snippet', '') for s in sources])
if len(all_content) > 100:
immediate = f"""
Analyze this analysis content material for the question: "{question}"
Content material: {all_content[:1500]}
Present:
1. 3-4 key themes (one line every)
2. 3-4 most important insights (one line every)
3. General confidence (0.1-1.0)
Format as JSON with keys: themes, insights, confidence
"""
strive:
response = self.mannequin.generate_content(immediate)
textual content = response.textual content
if 'themes' in textual content.decrease():
evaluation['key_themes'] = ["Theme extracted from analysis"]
evaluation['insights'] = ["Insight derived from sources"]
besides:
cross
return evaluation
def generate_comprehensive_report(self, question: str, sources: Checklist[Dict[str, str]],
evaluation: Dict[str, Any]) -> str:
"""Generate remaining analysis report"""
sources_text = "n".be a part of([f"- {s['title']}: {s['snippet'][:200]}"
for s in sources[:5]])
immediate = f"""
Create a complete analysis report on: "{question}"
Based mostly on these sources:
{sources_text}
Evaluation abstract:
- Whole sources: {evaluation['total_sources']}
- Confidence: {evaluation['confidence_score']}
Construction the report with:
1. Govt Abstract (2-3 sentences)
2. Key Findings (3-5 bullet factors)
3. Detailed Evaluation (2-3 paragraphs)
4. Conclusions & Implications (1-2 paragraphs)
5. Analysis Limitations
Be factual, well-structured, and insightful.
"""
strive:
response = self.mannequin.generate_content(immediate)
return response.textual content
besides Exception as e:
return f"""
# Analysis Report: {question}
## Govt Abstract
Analysis carried out on "{question}" utilizing {evaluation['total_sources']} sources.
## Key Findings
- A number of views analyzed
- Complete data gathered
- Analysis accomplished efficiently
## Evaluation
The analysis course of concerned systematic assortment and evaluation of knowledge associated to {question}. Varied sources have been consulted to offer a balanced perspective.
## Conclusions
The analysis supplies a basis for understanding {question} based mostly on obtainable data.
## Analysis Limitations
Restricted by API constraints and supply availability.
"""
def conduct_research(self, question: str, depth: str = "normal") -> Dict[str, Any]:
"""Foremost analysis orchestration methodology"""
print(f"🔍 Beginning analysis on: {question}")
search_rounds = {"fundamental": 1, "normal": 2, "deep": 3}.get(depth, 2)
sources_per_round = {"fundamental": 3, "normal": 5, "deep": 7}.get(depth, 5)
all_sources = []
search_queries = [query]
if depth in ["standard", "deep"]:
strive:
related_prompt = f"Generate 2 associated search queries for: {question}. One line every."
response = self.mannequin.generate_content(related_prompt)
additional_queries = [q.strip() for q in response.text.split('n') if q.strip()][:2]
search_queries.lengthen(additional_queries)
besides:
cross
for i, search_query in enumerate(search_queries[:search_rounds]):
print(f"🔎 Search spherical {i+1}: {search_query}")
sources = self.search_web(search_query, sources_per_round)
all_sources.lengthen(sources)
time.sleep(self.config.search_delay)
unique_sources = []
seen_urls = set()
for supply in all_sources:
if supply['url'] not in seen_urls:
unique_sources.append(supply)
seen_urls.add(supply['url'])
print(f"📊 Analyzing {len(unique_sources)} distinctive sources...")
evaluation = self.analyze_sources(unique_sources[:self.config.max_sources], question)
print("📝 Producing complete report...")
report = self.generate_comprehensive_report(question, unique_sources, evaluation)
return {
'question': question,
'sources_found': len(unique_sources),
'evaluation': evaluation,
'report': report,
'sources': unique_sources[:10]
}
We outline a ResearchConfig dataclass to handle parameters like API keys, supply limits, and delays, after which construct a DeepResearchSystem class that integrates Gemini with DuckDuckGo search. We implement strategies for net search, key level extraction, supply evaluation, and report era, permitting us to orchestrate multi-round analysis and produce structured insights in a streamlined workflow. Try the FULL CODES here.
def setup_research_system(api_key: str) -> DeepResearchSystem:
"""Fast setup for Google Colab"""
config = ResearchConfig(
gemini_api_key=api_key,
max_sources=15,
max_content_length=6000,
search_delay=0.5
)
return DeepResearchSystem(config)
We create a setup_research_system perform that simplifies initialization in Google Colab by wrapping our configuration in ResearchConfig and returning a ready-to-use DeepResearchSystem occasion with customized limits and delays. Try the FULL CODES here.
if __name__ == "__main__":
API_KEY = "Use Your Personal API Key Right here"
researcher = setup_research_system(API_KEY)
question = "Deep Analysis Agent Structure"
outcomes = researcher.conduct_research(question, depth="normal")
print("="*50)
print("RESEARCH RESULTS")
print("="*50)
print(f"Question: {outcomes['query']}")
print(f"Sources discovered: {outcomes['sources_found']}")
print(f"Confidence: {outcomes['analysis']['confidence_score']}")
print("n" + "="*50)
print("COMPREHENSIVE REPORT")
print("="*50)
print(outcomes['report'])
print("n" + "="*50)
print("SOURCES CONSULTED")
print("="*50)
for i, supply in enumerate(outcomes['sources'][:5], 1):
print(f"{i}. {supply['title']}")
print(f" URL: {supply['url']}")
print(f" Preview: {supply['snippet'][:150]}...")
print()
We add a most important execution block the place we initialize the analysis system with our API key, run a question on “Deep Analysis Agent Structure,” after which show structured outputs. We print analysis outcomes, a complete report generated by Gemini, and a listing of consulted sources with titles, URLs, and previews.
In conclusion, we see how the whole pipeline constantly transforms unstructured snippets right into a structured, well-organized report. We efficiently mix search, language modeling, and evaluation layers to simulate an entire analysis workflow inside Colab. By utilizing Gemini for extraction, synthesis, and reporting, and DuckDuckGo at no cost search entry, we create a reusable basis for extra superior agentic analysis methods. This pocket book supplies a sensible, technically detailed template that we are able to now increase with further fashions, customized rating, or domain-specific integrations, whereas nonetheless retaining a compact, end-to-end structure.
Try the FULL CODES here. Be happy to take a look at our GitHub Page for Tutorials, Codes and Notebooks. Additionally, be happy to comply with us on Twitter and don’t overlook to hitch our 100k+ ML SubReddit and Subscribe to our Newsletter.
Asif Razzaq is the CEO of Marktechpost Media Inc.. As a visionary entrepreneur and engineer, Asif is dedicated to harnessing the potential of Synthetic Intelligence for social good. His most up-to-date endeavor is the launch of an Synthetic Intelligence Media Platform, Marktechpost, which stands out for its in-depth protection of machine studying and deep studying information that’s each technically sound and simply comprehensible by a large viewers. The platform boasts of over 2 million month-to-month views, illustrating its reputation amongst audiences.