In this tutorial, we build an advanced agentic AI workflow using LlamaIndex and OpenAI models. We focus on designing a reliable retrieval-augmented generation (RAG) agent that can reason over evidence, use tools deliberately, and evaluate its own outputs for quality. By structuring the system around retrieval, answer synthesis, and self-evaluation, we demonstrate how agentic patterns go beyond simple chatbots and move toward more trustworthy, controllable AI systems suitable for research and analytical use cases.
!pip -q install -U llama-index llama-index-llms-openai llama-index-embeddings-openai nest_asyncio
import os
import asyncio
import nest_asyncio
nest_asyncio.apply()
from getpass import getpass
if not os.environ.get("OPENAI_API_KEY"):
os.environ["OPENAI_API_KEY"] = getpass("Enter OPENAI_API_KEY: ")We arrange the surroundings and set up all required dependencies for working an agentic AI workflow. We securely load the OpenAI API key at runtime, guaranteeing that credentials are by no means hardcoded. We additionally put together the pocket book to deal with asynchronous execution easily.
from llama_index.core import Document, VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
Settings.llm = OpenAI(model="gpt-4o-mini", temperature=0.2)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
texts = [
    "Reliable RAG systems separate retrieval, synthesis, and verification. Common failures include hallucination and shallow retrieval.",
    "RAG evaluation focuses on faithfulness, answer relevancy, and retrieval quality.",
    "Tool-using agents require constrained tools, validation, and self-review loops.",
    "A robust workflow follows retrieve, answer, evaluate, and revise steps.",
]
docs = [Document(text=t) for t in texts]
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(similarity_top_k=4)

We configure the OpenAI language model and embedding model and build a compact knowledge base for our agent. We transform raw text into indexed documents so that the agent can retrieve relevant evidence during reasoning.
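Before handing retrieval over to the agent, we can sanity-check the index directly. This quick probe is our addition rather than part of the core workflow, and the query string is illustrative.

# Quick sanity check (our addition): query the index directly to confirm
# that retrieval returns grounded context before the agent depends on it.
probe = query_engine.query("What steps does a robust RAG workflow follow?")
print(probe)  # synthesized answer from the indexed snippets
for n in probe.source_nodes:
    # Each source node carries a similarity score and the matched text.
    print(f"{n.score:.3f} | {n.node.get_content()[:80]}")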
from llama_index.core.evaluation import FaithfulnessEvaluator, RelevancyEvaluator
faith_eval = FaithfulnessEvaluator(llm=Settings.llm)
rel_eval = RelevancyEvaluator(llm=Settings.llm)
def retrieve_evidence(q: str) -> str:
    """Return numbered evidence snippets retrieved for the query."""
    r = query_engine.query(q)
    out = []
    for i, n in enumerate(r.source_nodes or []):
        out.append(f"[{i+1}] {n.node.get_content()[:300]}")
    return "\n".join(out)

def score_answer(q: str, a: str) -> str:
    """Score an answer for faithfulness and relevancy against retrieved context."""
    resp = query_engine.query(q)
    ctx = [n.node.get_content() for n in resp.source_nodes or []]
    f = faith_eval.evaluate(query=q, response=a, contexts=ctx)
    rel = rel_eval.evaluate(query=q, response=a, contexts=ctx)
    return f"Faithfulness: {f.score}\nRelevancy: {rel.score}"

We define the core tools used by the agent: evidence retrieval and answer scoring. We implement automated checks for faithfulness and relevancy so the agent can judge the quality of its own responses.
from llama_index.core.agent.workflow import ReActAgent
from llama_index.core.workflow import Context

agent = ReActAgent(
    tools=[retrieve_evidence, score_answer],
    llm=Settings.llm,
    system_prompt="""
    Always retrieve evidence first.
    Produce a structured answer.
    Evaluate the answer and revise once if scores are low.
    """,
    verbose=True,
)
ctx = Context(agent)

We create the ReAct-based agent and define its system behavior, guiding how it retrieves evidence, generates answers, and revises results. We also initialize the execution context that maintains the agent's state across interactions. This step brings tools and reasoning together into a single agentic workflow.
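Passing plain callables works because the agent typically derives each tool's name and description from the function's name and docstring. If you want explicit control over what the LLM sees, you can wrap them as FunctionTool objects; this variant is a sketch of ours, assuming the standard llama_index FunctionTool API, not part of the original flow.

# Optional alternative (our sketch): wrap the callables as FunctionTools
# so the tool names and descriptions shown to the LLM are explicit.
from llama_index.core.tools import FunctionTool

evidence_tool = FunctionTool.from_defaults(
    fn=retrieve_evidence,
    name="retrieve_evidence",
    description="Fetch numbered evidence snippets for a query.",
)
score_tool = FunctionTool.from_defaults(
    fn=score_answer,
    name="score_answer",
    description="Score a draft answer for faithfulness and relevancy.",
)
# agent = ReActAgent(tools=[evidence_tool, score_tool], llm=Settings.llm, ...)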
async def run_brief(topic: str):
    q = f"Design a reliable RAG + tool-using agent workflow and how to evaluate it. Topic: {topic}"
    handler = agent.run(q, ctx=ctx)
    # Stream intermediate reasoning tokens as they arrive.
    async for ev in handler.stream_events():
        print(getattr(ev, "delta", ""), end="")
    res = await handler
    return str(res)

topic = "RAG agent reliability and evaluation"
loop = asyncio.get_event_loop()
result = loop.run_until_complete(run_brief(topic))
print("\n\nFINAL OUTPUT\n")
print(result)

We execute the full agent loop by passing a topic into the system and streaming the agent's reasoning and output. We let the agent complete its retrieval, generation, and evaluation cycle asynchronously.
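The system prompt asks the agent to revise once when scores are low, but we can also enforce that policy explicitly in code. The loop below is our sketch, not part of the original tutorial; the 0.5 threshold and the score-parsing logic are assumptions tied to the string format that score_answer returns.

# Explicit revise-once loop (our sketch): re-run the agent when either
# reported score falls below an assumed threshold of 0.5.
async def run_with_revision(topic: str, threshold: float = 0.5):
    answer = await run_brief(topic)
    report = score_answer(topic, answer)  # "Faithfulness: x\nRelevancy: y"
    scores = []
    for line in report.splitlines():
        value = line.split(": ")[1]
        if value != "None":  # evaluators can return no score
            scores.append(float(value))
    if scores and min(scores) < threshold:
        # One revision pass, reusing the same context so the agent
        # remembers the evidence it already retrieved.
        answer = str(await agent.run(
            f"Revise this answer so every claim is supported by evidence:\n{answer}",
            ctx=ctx,
        ))
    return answer

revised = loop.run_until_complete(run_with_revision(topic))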
In conclusion, we showcased how an agent can retrieve supporting evidence, generate a structured response, and assess its own faithfulness and relevancy before finalizing an answer. We kept the design modular and transparent, making it easy to extend the workflow with additional tools, evaluators, or domain-specific knowledge sources. This approach illustrates how agentic AI built with LlamaIndex and OpenAI models can produce systems that are not only more capable but also more reliable and self-aware in their reasoning and responses.

