Stop building dumb Chatbots.
A Chatbot waits for a user to type. An Agent runs in a loop, observing the world, planning actions, and using tools to achieve a goal.
In 2026, software doesn't just display data. It performs labor.
02. The ReAct Loop (Reason + Act)
Most agents follow the ReAct pattern:
1. Thought
The LLM analyzes the request and its current context. "I need to check the weather."
2. Action
The LLM selects a tool and generates JSON arguments. `weatherTool({ city: "Tokyo" })`
3. Observation
The system executes the tool and feeds the result back to the LLM. "It is raining."
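In code, the core loop looks roughly like this (a simplified sketch; llm, tools, and history stand in for your model client, tool registry, and message array):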
async function runAgent(history) {
  while (true) {
    // 1. Ask the LLM what to do next
    const nextStep = await llm.predict(history);
    if (nextStep.isFinalAnswer) return nextStep.answer;

    // 2. Execute the chosen tool (arguments validated against its Zod schema)
    const observation = await tools[nextStep.tool](nextStep.args);

    // 3. Update memory so the next iteration can see the result
    history.push({ role: 'tool', content: observation });
  }
}
03. Defining Tools with Zod
Tools are the hands of the AI. But LLMs are notoriously bad at outputting perfect JSON. They miss commas, hallucinate fields, or use the wrong types.
We use Zod to strictly type the inputs. This acts as a "Runtime Guard" — if the LLM generates invalid arguments, validation fails before the tool executes, and we can feed the error back to the LLM to self-correct.
import { tool } from "ai"; // Vercel AI SDK helper for typed tool definitions
import { z } from "zod";

const tools = {
  // ✈️ Tool Definition
  bookFlight: tool({
    description: "Book a flight for the user",
    parameters: z.object({
      destination: z.string().length(3).describe("3-letter IATA code"),
      date: z.string().date(),
      maxPrice: z.number().max(5000),
      seatPreference: z.enum(["window", "aisle"])
    }),
    execute: async ({ destination, date }) => {
      // Real API call here
      return await api.flights.book(destination, date);
    }
  })
};
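If validation fails, the error goes back to the model instead of crashing the run. A rough sketch of that self-correction step for a hand-rolled loop like the one in section 02 (safeParse is Zod's non-throwing validator; nextStep and history are the same placeholders as above):

const schema = tools.bookFlight.parameters;
const parsed = schema.safeParse(nextStep.args);

if (!parsed.success) {
  // Feed the validation error back so the LLM can retry with corrected arguments
  history.push({
    role: "tool",
    content: `Invalid arguments: ${parsed.error.message}. Please correct them and try again.`
  });
} else {
  const observation = await tools.bookFlight.execute(parsed.data);
  history.push({ role: "tool", content: JSON.stringify(observation) });
}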
Deep Dive: Structured Output vs JSON Mode
JSON Mode only guarantees that the output is syntactically valid JSON; it says nothing about which fields appear or what types they have.
Structured Outputs (and schema-aware tool calling) guarantees the output matches your schema exactly, because token sampling is constrained so that only schema-conforming tokens can be generated. Prefer Structured Outputs or Tool Calling over plain JSON Mode for reliability.
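With the Vercel AI SDK, structured output reuses the same Zod schema. A minimal sketch (the model choice and prompt are placeholders):

import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const { object } = await generateObject({
  model: openai("gpt-4o"),
  schema: z.object({
    destination: z.string().length(3).describe("3-letter IATA code"),
    date: z.string().date()
  }),
  prompt: "Extract the flight request from: 'I want to fly to Tokyo on 2026-03-01'"
});
// `object` is typed and validated against the schema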
04. Memory Systems
Short-Term Memory (Context Window)
Everything in the current chat array. Limited by token cost and window size (e.g. 128k tokens).
Long-Term Memory (RAG / Vector DB)
Stores millions of documents (PDFs, Logs, Past Chats). The Agent queries this database semantically to recall facts from months ago.
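A rough sketch of the retrieval step using the AI SDK's embed helper; vectorDb and its query method are hypothetical stand-ins for whatever vector store you use (Pinecone, pgvector, etc.):

import { embed } from "ai";
import { openai } from "@ai-sdk/openai";

// Embed the user's question, then pull the closest memories into the prompt
const { embedding } = await embed({
  model: openai.embedding("text-embedding-3-small"),
  value: "What did the user say about seat preferences last month?"
});

const memories = await vectorDb.query({ vector: embedding, topK: 5 }); // hypothetical vector store API
history.push({
  role: "system",
  content: `Relevant long-term memories:\n${memories.map((m) => m.text).join("\n")}`
});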
05. Streaming Generative UI
Text is boring. Modern Agents stream React Components. Based on the tool result, the agent can decide to render a chart, a map, or a flight ticket widget, streamed instantly to the client using RSC.
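A minimal sketch using the AI SDK's React Server Components helper streamUI; Spinner, FlightTicket, and api.flights are hypothetical components and services:

import { streamUI } from "ai/rsc";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

const ui = await streamUI({
  model: openai("gpt-4o"),
  prompt: "Find me a flight to Tokyo next Friday",
  text: ({ content }) => <p>{content}</p>,
  tools: {
    searchFlights: {
      description: "Search for flights to a destination",
      parameters: z.object({ destination: z.string() }),
      generate: async function* ({ destination }) {
        yield <Spinner />; // streamed to the client immediately while the tool runs
        const flight = await api.flights.search(destination);
        return <FlightTicket flight={flight} />; // final widget replaces the spinner
      }
    }
  }
});
// ui.value is the React node streamed to the client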
06. Safety & Limits (The Kill Switch)
⚠️ Danger: Infinite Loops
An agent that gets stuck trying to "fix" an error can burn $100 in 5 minutes.
Always implement (a minimal guard sketch follows this list):
- Max Steps (e.g. stop after 10 loops)
- Human Confirmation for Side Effects (POST/DELETE/PUT)
- Timeouts
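A rough sketch of a step-limited loop with a human confirmation gate and a timeout, replacing the unbounded while (true) from section 02; confirmWithUser is a hypothetical callback that asks a human before side effects run:

const MAX_STEPS = 10;
const SIDE_EFFECT_TOOLS = new Set(["bookFlight", "cancelBooking"]);

async function runSafeAgent(history) {
  for (let step = 0; step < MAX_STEPS; step++) {
    const nextStep = await llm.predict(history);
    if (nextStep.isFinalAnswer) return nextStep.answer;

    // Human-in-the-loop gate for anything that mutates the outside world
    if (SIDE_EFFECT_TOOLS.has(nextStep.tool)) {
      const approved = await confirmWithUser(nextStep); // hypothetical confirmation UI
      if (!approved) return "Action cancelled by the user.";
    }

    // Hard timeout so a single hung tool call cannot stall the agent
    const observation = await Promise.race([
      tools[nextStep.tool](nextStep.args),
      new Promise((_, reject) => setTimeout(() => reject(new Error("Tool timeout")), 30_000))
    ]);

    history.push({ role: "tool", content: observation });
  }
  throw new Error("Max steps reached; aborting to avoid an infinite loop.");
}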
07. Senior Engineer's Take
- Determinism is gone. You cannot write unit tests for "Reasoning". You must use "Evals" (evaluation datasets) to score your agent's success rate; a tiny harness sketch follows this list.
- Latency is high. Agents take time to think. Good UX requires "Skeletal Loading" or "Thought Streaming" to keep the user engaged while the agent works.
- The Context Window Bottleneck. You cannot shove 100 files into the prompt. You must implement a "Summarization Step" or "Knowledge Graph" to compress information as the conversation grows.
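A tiny eval harness sketch, assuming the runAgent function from section 02; the cases and the crude string check are placeholders for a real dataset and grader (often an LLM judge):

type EvalCase = { prompt: string; mustInclude: string };

const cases: EvalCase[] = [
  { prompt: "Book me a window seat to HND under $800", mustInclude: "HND" },
  { prompt: "Find the cheapest May flight to CDG", mustInclude: "CDG" }
];

let passed = 0;
for (const c of cases) {
  const answer = await runAgent([{ role: "user", content: c.prompt }]);
  if (answer.includes(c.mustInclude)) passed++; // crude check; real evals also score correctness, cost, and step count
}
console.log(`Agent success rate: ${((100 * passed) / cases.length).toFixed(0)}%`);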
08. Agent Simulator
Watch an agent reason through a multi-step problem, utilizing tools and memory to find the best flight deal.