Agentic AI refers to AI systems that can autonomously plan and execute multi-step tasks — not just respond to a single prompt. An AI agent uses tools (code execution, web search, file I/O, APIs), observes results, adapts its plan, and iterates until a goal is achieved. This is a qualitative leap beyond conversational AI.
What makes AI 'agentic'
| Property | Conversational AI | Agentic AI |
|---|---|---|
| Action model | Single response to single message | Sequence of actions toward a goal |
| Tool access | Text generation only | Code execution, search, APIs, file I/O, browser |
| Memory | Context window only (ephemeral) | Can write to files, databases, memory stores |
| Error handling | Errors surface to user | Observes error, diagnoses, retries autonomously |
| Task scope | One exchange = one task | One prompt = many sequential tasks over minutes/hours |
| Human involvement | Every step requires human input | Human sets goal; agent executes; human reviews result |
The agentic loop
The core pattern: (1) Observe current state. (2) Reason about what action to take next toward the goal. (3) Call a tool (code, search, API). (4) Observe result. (5) Update plan based on what was learned. (6) Repeat until goal achieved or stuck. Each iteration, the agent updates its understanding of the world — enabling course-correction, error recovery, and adaptive planning that single-turn LLMs cannot do.
Tool use in AI agents
Agents are defined by their tools. Tool use is implemented via function calling — the LLM outputs a structured JSON call, the runtime executes it, and the result is fed back as the next model input.
Anthropic function calling — tool use loop
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const tools = [
  {
    name: "execute_code",
    description: "Run Python code in a sandbox and return stdout/stderr",
    input_schema: {
      type: "object",
      properties: {
        code: { type: "string", description: "Python code to execute" }
      },
      required: ["code"]
    }
  },
  {
    name: "web_search",
    description: "Search the web and return top results",
    input_schema: {
      type: "object",
      properties: {
        query: { type: "string" }
      },
      required: ["query"]
    }
  }
];
// Agentic loop
let messages = [{ role: "user", content: "Scrape the top 10 HN stories and plot upvotes" }];
while (true) {
  const response = await client.messages.create({
    model: "claude-opus-4-5", max_tokens: 4096, tools, messages
  });
  // Append the assistant turn once, then stop unless the model requested tools
  messages.push({ role: "assistant", content: response.content });
  if (response.stop_reason !== "tool_use") break; // done
  // Execute every tool call and return all results in a single user message
  const toolResults = [];
  for (const block of response.content) {
    if (block.type === "tool_use") {
      const result = await executeTool(block.name, block.input);
      toolResults.push({ type: "tool_result", tool_use_id: block.id, content: result });
    }
  }
  messages.push({ role: "user", content: toolResults });
}

| Tool category | Examples | Risk level |
|---|---|---|
| Code execution | Python sandbox, JavaScript runner, shell commands | High — can modify state; use sandboxing |
| Web search | Bing/Google search, URL fetching, scraping | Low — read-only |
| File I/O | Read/write files, create documents, parse PDFs | Medium — can overwrite files |
| API calls | REST APIs, database queries, external services | High — irreversible actions (email, payments) |
| Browser control | Playwright, Selenium — click, fill forms, navigate | High — can submit forms, make purchases |
| Memory | Vector DB writes, note-taking, state persistence | Low — but affects future context |
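The executeTool function used in the loop above is left to the runtime to define. A minimal sketch of one possible dispatcher, with a hypothetical confirmation gate for high-risk tools — the risk mapping, handler bodies, and confirm callback are illustrative assumptions, not part of any SDK:

```typescript
// Risk tier per tool, mirroring the table above (assumed mapping)
const TOOL_RISK: Record<string, "low" | "medium" | "high"> = {
  web_search: "low",
  read_file: "medium",
  execute_code: "high",
};

type ToolHandler = (input: Record<string, unknown>) => Promise<string>;

// Hypothetical handlers — real ones would call a sandbox or a search API
const handlers: Record<string, ToolHandler> = {
  web_search: async (input) => `results for: ${input.query}`,
  execute_code: async (input) => `stdout of: ${input.code}`,
  read_file: async (input) => `contents of: ${input.path}`,
};

async function executeTool(
  name: string,
  input: Record<string, unknown>,
  confirm: (msg: string) => Promise<boolean> = async () => true,
): Promise<string> {
  const handler = handlers[name];
  if (!handler) return `Error: unknown tool "${name}"`; // surface as a tool_result, not a crash
  // Gate high-risk tools behind an explicit confirmation callback
  if (TOOL_RISK[name] === "high" && !(await confirm(`Allow ${name}?`))) {
    return "Error: action rejected by user";
  }
  return handler(input);
}
```

Returning errors as strings keeps failures inside the loop, so the model can observe them and adapt rather than the runtime crashing.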
The agentic loop in detail
Concretely: a user gives the goal "Analyze this CSV and create a visualization." Here is what the agent does autonomously:
- Plan: "I should load the CSV, examine its structure and data types, then create an appropriate visualization."
- Act: Calls execute_code with Python to load the CSV and print df.head(), df.dtypes, df.describe().
- Observe: Sees columns [date, revenue, users, region], revenue and users are numeric, region is categorical.
- Re-plan: "A grouped bar chart of revenue by region over time would show the key trends."
- Act: Calls execute_code with matplotlib/seaborn code to generate the chart.
- Observe: Code raises a ValueError — date column is a string, not datetime.
- Self-correct: Adds pd.to_datetime(df["date"]) to the code, re-executes.
- Observe: Chart generated successfully at /tmp/chart.png.
- Complete: Returns the chart + written interpretation of the trends it shows.
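The self-correction step above is just another execute_code call: the agent resends its script with the fix added. A hypothetical retry payload, using the column names from the walkthrough (the exact plotting code is an assumption):

```typescript
// The corrected tool call the agent sends on its retry — note the added
// pd.to_datetime() line that fixes the ValueError from the first attempt
const retryCall = {
  type: "tool_use",
  name: "execute_code",
  input: {
    code: [
      "import pandas as pd",
      "import matplotlib.pyplot as plt",
      "df = pd.read_csv('data.csv')",
      "df['date'] = pd.to_datetime(df['date'])  # the fix: parse strings as datetimes",
      "pivot = df.pivot_table(values='revenue', index='date', columns='region', aggfunc='sum')",
      "pivot.plot(kind='bar')",
      "plt.savefig('/tmp/chart.png')",
    ].join("\n"),
  },
};
```

From the model's perspective nothing special happened: it observed an error string in a tool_result, reasoned about it, and emitted a new tool_use block.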
ReAct: the standard agent prompting pattern
ReAct (Yao et al., 2022) formalizes the agentic loop as alternating Thought → Action → Observation cycles: (1) Thought: the agent reasons about current state. (2) Action: calls a tool with specific arguments. (3) Observation: receives the result. Prompting the model to always produce explicit Thought steps before acting dramatically improves reliability — the model reasons before committing to an action. Most production agent frameworks (LangChain, LlamaIndex, AutoGPT) implement ReAct-style loops.
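A bare-bones ReAct loop can be built without any framework: parse the model's text output for an Action line, run the tool, and splice the Observation back into the transcript. A sketch under assumptions — the `tool[argument]` action syntax, the callModel signature, and the step limit are illustrative choices, not a standard:

```typescript
type Model = (prompt: string) => Promise<string>;
type Tool = (arg: string) => Promise<string>;

// Parse one "Action: tool[argument]" line from the model's output
function parseAction(text: string): { tool: string; arg: string } | null {
  const m = text.match(/Action:\s*(\w+)\[([^\]]*)\]/);
  return m ? { tool: m[1], arg: m[2] } : null;
}

async function reactLoop(
  goal: string, callModel: Model, tools: Record<string, Tool>, maxSteps = 10,
): Promise<string> {
  let transcript = `Goal: ${goal}\n`;
  for (let step = 0; step < maxSteps; step++) {
    // Model emits "Thought: ... Action: ..." or "Thought: ... Final answer: ..."
    const output = await callModel(transcript);
    transcript += output + "\n";
    const final = output.match(/Final answer:\s*(.*)/);
    if (final) return final[1];
    const action = parseAction(output);
    if (!action) continue; // no parsable action — let the model try again
    const observation = await (tools[action.tool]?.(action.arg) ?? Promise.resolve("unknown tool"));
    transcript += `Observation: ${observation}\n`; // splice the result back in
  }
  return "stopped: max steps reached";
}
```

Because the Thought lines stay in the transcript, every later step conditions on the agent's earlier reasoning — which is exactly what makes ReAct traces interpretable and debuggable.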
Multi-agent systems
Complex tasks benefit from specialization — an orchestrator agent breaks the goal into sub-tasks and delegates to specialist agents. The key challenge: errors compound across agents.
| Agent role | Responsibility | Example |
|---|---|---|
| Orchestrator | Decomposes goal, assigns tasks, synthesizes results | "Build a full-stack app" → assigns frontend, backend, DB agents |
| Coder | Writes and iterates on code | GitHub Copilot, Claude Code, Devin |
| Researcher | Searches web, reads papers, synthesizes findings | Perplexity-style deep research |
| QA / Critic | Reviews outputs for errors, runs tests, suggests fixes | Code review agent, fact-checker |
| Memory manager | Maintains shared context — writes/reads from vector DB | Stores progress notes for other agents |
| Tool specialist | Calls external APIs (Stripe, Slack, Salesforce) | Zapier/Make.com-style automation |
Error compounding is the main risk
In a 5-agent pipeline where each agent operates at 90% accuracy, the end-to-end result is correct only 0.9⁵ ≈ 59% of the time. This is why multi-agent systems need checkpointing (human review at key milestones), confidence scoring (agents flag uncertainty), and extensive testing. Current best practice: keep pipelines short (3–4 agents max), verify outputs between stages, and always include a QA/critic agent.
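The compounding arithmetic generalizes: n independent stages at per-stage accuracy p give end-to-end reliability pⁿ. A quick check of the numbers, plus a simple (assumed) model of what a QA stage buys you:

```typescript
// End-to-end reliability of a pipeline of n independent stages,
// each correct with probability p
function pipelineReliability(p: number, n: number): number {
  return Math.pow(p, n);
}

// The 5-agent example from the text: 0.9^5 ≈ 0.59
console.log(pipelineReliability(0.9, 5).toFixed(2)); // "0.59"

// Toy model: if verification between stages catches half of each stage's
// errors, effective per-stage accuracy rises from 0.90 to 0.95
console.log(pipelineReliability(0.95, 5).toFixed(2)); // "0.77"
```

The exponent is the lever: cutting a pipeline from 5 agents to 3 at the same 90% accuracy raises reliability from ~59% to ~73%, which is why the text recommends keeping pipelines short.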
Risks and safety in agentic systems
| Risk | Description | Mitigation |
|---|---|---|
| Consequential actions | Deleting files, sending emails, making purchases — hard to undo | Sandboxing; confirmation prompts for irreversible actions; dry-run mode |
| Prompt injection | Malicious text in the environment (webpage, file, API response) hijacks agent goals | Input sanitization; don't execute instructions from retrieved content |
| Goal misinterpretation | "Delete all errors" → deletes error-handling code; agent achieves literal goal, not intent | Structured goal specification; intermediate confirmation; critic agents |
| Infinite loops | Agent repeats same failing action; no exit condition | Max iteration limits; detecting repeated actions; cost budgets |
| Cost blowouts | Many tool calls = many tokens; unbounded agent runs can cost $100s | Token budgets; iteration limits; cost alerts |
| Over-automation | Removing human oversight from critical decisions | Human-in-the-loop checkpoints; irreversible action gates |
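The iteration-limit and cost-budget mitigations from the table can be wired directly into an agentic loop. A sketch of one possible guard — the thresholds and per-token prices are illustrative assumptions, not real pricing:

```typescript
// Tracks iterations and approximate spend across one agent run
class RunBudget {
  private iterations = 0;
  private costUsd = 0;

  constructor(
    private maxIterations = 25,
    private maxCostUsd = 5.0,
    // Illustrative per-token prices — real prices vary by model
    private inputPricePerTok = 3e-6,
    private outputPricePerTok = 15e-6,
  ) {}

  // Call once per loop iteration with the token usage reported by the API
  record(inputTokens: number, outputTokens: number): void {
    this.iterations += 1;
    this.costUsd += inputTokens * this.inputPricePerTok + outputTokens * this.outputPricePerTok;
  }

  // Returns a reason string if the run should stop, else null
  exceeded(): string | null {
    if (this.iterations >= this.maxIterations) return "max iterations reached";
    if (this.costUsd >= this.maxCostUsd) return "cost budget exhausted";
    return null;
  }
}
```

Inside the loop, record the usage from each API response and break with a logged reason when exceeded() fires — this turns a potential infinite loop or cost blowout into a bounded, auditable failure.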
LumiChats Agent Mode sandbox
LumiChats Agent Mode runs entirely inside a WebContainer — a sandboxed Node.js environment in the browser. There is no filesystem access beyond the container, no network access to external services, and no persistence after the session. All code execution is isolated to the browser tab. Actions are logged in real time so you can monitor every step, and you can interrupt the agent at any point. This eliminates the irreversible-action risk entirely for the LumiChats use case.
Practice questions
- What is the ReAct (Reasoning + Acting) pattern in agentic AI? (Answer: ReAct (Yao et al. 2022): the agent alternates between Reasoning steps (internal thought about what to do next) and Acting steps (calling a tool). Format: Thought: I need to find the current stock price... Action: search[AAPL stock price today] Observation: AAPL is at $189.32 Thought: Now I can answer... Final answer: AAPL is $189.32. The explicit reasoning steps make the agent's decisions interpretable and allow self-correction if a reasoning step is wrong.)
- What are the main failure modes of AI agents? (Answer: (1) Hallucination of tool results: agent 'recalls' information instead of calling the tool, fabricating results. (2) Infinite loops: agent calls the same tool repeatedly without progress. (3) Prompt injection from tool results: malicious content in search/document results hijacks agent behaviour. (4) Over-permissive tool use: agent takes irreversible actions (send email, delete file) without confirming. (5) Context overflow: long agentic runs accumulate context until the window fills, causing the agent to lose track of its goal.)
- What is the difference between a single-agent and multi-agent system? (Answer: Single-agent: one LLM with multiple tools, handles all subtasks sequentially. Simpler, easier to debug. Multi-agent: multiple specialised LLM agents, each with specific tools and expertise — an orchestrator coordinates them. Example: researcher agent (web search) + analyst agent (code execution) + writer agent (drafting). Advantages: specialisation (each agent optimised for its role), parallelism (agents work concurrently), scalability. Disadvantages: coordination overhead, error propagation between agents, debugging difficulty.)
- How does human-in-the-loop oversight work in production AI agent systems? (Answer: Checkpoints: the agent pauses at predefined high-risk action points (sending email, making API calls with financial impact, deleting data) and requires explicit user approval before proceeding. Confirmation prompts: 'I plan to send this email to 500 contacts — confirm?' Audit logging: every tool call, result, and decision is logged for review. Rollback: reversible actions use transactions or staging environments before committing. Interrupt handlers: users can stop the agent mid-run. Minimal footprint principle: agents request only necessary permissions and prefer reversible actions.)
- What is the difference between agentic AI and traditional RPA (Robotic Process Automation)? (Answer: RPA: rule-based automation of structured workflows — scrapes data from fixed UI positions, follows rigid decision trees. Brittle: breaks when UI changes. Cannot handle ambiguity or novel situations. Agentic AI: understands intent from natural language, adapts to UI changes, handles ambiguous situations through reasoning, and can decompose novel tasks it has never seen before. RPA is scripted; agents are reasoning. Hybrid approaches (AI + RPA) use agents for decision-making and RPA for reliable execution of structured steps.)
On LumiChats
LumiChats Agent Mode gives the AI a full sandboxed Node.js environment (WebContainer) running in your browser. The agent writes code, executes it, reads output, fixes errors, and generates downloadable files — all without needing a server.
Try it free