The Question
Design

Agentic AI System

Design an agentic AI system capable of autonomously executing multi-step tasks. The system should support dynamic planning, tool use for external actions such as web search and code execution, stateful context management across steps, and safety guardrails to ensure reliable and controlled operation.
Technologies: PostgreSQL, Redis, SQS, WebSocket, SSE
Questions & Insights

Thinking Process

The core challenge of an Agentic AI system is managing the Reasoning Loop: the non-deterministic cycle in which an LLM decides to use tools, observes the results, and iterates.
Question 1: Synchronous vs. Asynchronous Execution? How do we handle agent tasks that might take 60+ seconds to complete due to multiple tool calls without timing out the HTTP connection? (Answer: Task Queue + WebSockets/Polling).
Question 2: State Management? How do we maintain the "Chain of Thought" and "Short-term Memory" across multiple reasoning steps? (Answer: Context Windows in Redis/DB).
Question 3: Tool Security? How do we prevent an agent from executing destructive commands or accessing unauthorized data? (Answer: Sandboxed Tool Execution Layer + RBAC).
Question 4: Loop Termination? How do we prevent infinite loops or "hallucination cycles"? (Answer: Max-iteration caps + LLM-based evaluation for "Goal Met" status).
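The loop-termination answer above can be sketched as a bounded ReAct loop. This is a minimal illustration, not a definitive implementation: `Step`, `RunResult`, `run_agent`, and the `decide` callback (standing in for the LLM call) are hypothetical names, and a non-empty `final_answer` is assumed to be the "Goal Met" signal.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    thought: str
    tool_name: Optional[str] = None      # which tool the model wants to call
    tool_input: str = ""
    final_answer: Optional[str] = None   # set when the model declares the goal met


@dataclass
class RunResult:
    status: str                          # "SUCCESS" or "MAX_ITERATIONS"
    answer: Optional[str]
    history: List[Tuple[Step, Optional[str]]] = field(default_factory=list)


def run_agent(
    decide: Callable[[str, list], Step],   # hypothetical stand-in for the LLM call
    tools: Dict[str, Callable[[str], str]],
    task: str,
    max_iterations: int = 8,
) -> RunResult:
    history: List[Tuple[Step, Optional[str]]] = []
    for _ in range(max_iterations):
        step = decide(task, history)
        if step.final_answer is not None:
            history.append((step, None))
            return RunResult("SUCCESS", step.final_answer, history)
        tool = tools.get(step.tool_name)
        if tool is None:
            # Feed the error back so the model can pick a valid tool next time.
            observation = f"error: unknown tool {step.tool_name!r}"
        else:
            observation = tool(step.tool_input)
        history.append((step, observation))
    # The cap was hit without a final answer: stop instead of looping forever.
    return RunResult("MAX_ITERATIONS", None, history)
```

The cap turns a potential infinite loop into an explicit terminal status that the caller can report or escalate.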

Bonus Points

Token Budgeting & Rate Limiting: Implement a per-session token quota to prevent runaway costs from recursive loops.
Hierarchical Agentic Modeling: Utilizing a "Manager Agent" to decompose complex tasks into sub-tasks assigned to "Worker Agents" (Team of Agents pattern).
Observability (Tracing): Integration of OpenTelemetry with specialized traces for LLM "Spans" (input, output, latency, tool calls) for debugging reasoning failures.
Speculative Tool Execution: Predicting the next tool call and pre-fetching data to reduce perceived latency.
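As an illustration of the per-session token quota, the sketch below tracks usage and refuses any charge that would exceed the limit. `TokenBudget` and `TokenBudgetExceeded` are hypothetical names, and the token counts are assumed to come from the LLM provider's usage metadata.

```python
class TokenBudgetExceeded(Exception):
    """Raised when a charge would push the session over its quota."""


class TokenBudget:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, prompt_tokens: int, completion_tokens: int) -> int:
        """Record usage for one LLM call and return the remaining tokens."""
        cost = prompt_tokens + completion_tokens
        if self.used + cost > self.limit:
            raise TokenBudgetExceeded(
                f"charge of {cost} exceeds remaining {self.limit - self.used}"
            )
        self.used += cost
        return self.limit - self.used
```

A worker would hold one budget per session and treat `TokenBudgetExceeded` as a terminal failure for the task. In practice the check may also need to run before the call, using an estimated cost.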
Design Breakdown

Functional Requirements

Users can submit natural language tasks (e.g., "Analyze this CSV and email the summary").
The system must decompose tasks into steps using an LLM.
The system must execute tools (Search, Python Code Sandbox, Database Access).
The system must maintain a history of interactions and reasoning steps.
The system must provide real-time updates on task progress.

Non-Functional Requirements

Reliability: Reasoning loops must be resumable if a worker fails.
Scalability: Support horizontal scaling of workers to handle high-concurrency tool execution.
Security: Strict isolation of code execution (sandboxing).
Extensibility: Easy plug-and-play architecture for adding new tools.
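One possible shape for the plug-and-play requirement is a decorator-based tool registry, sketched below. `TOOL_REGISTRY`, `register_tool`, `call_tool`, and the example `word_count` tool are all hypothetical names used for illustration.

```python
from typing import Callable, Dict

# Maps a tool name (as the LLM would emit it) to its implementation.
TOOL_REGISTRY: Dict[str, Callable[[str], str]] = {}


def register_tool(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        if name in TOOL_REGISTRY:
            raise ValueError(f"tool {name!r} is already registered")
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator


@register_tool("word_count")
def word_count(text: str) -> str:
    return str(len(text.split()))


def call_tool(name: str, tool_input: str) -> str:
    """Look up a tool by name and run it; unknown names raise KeyError."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return TOOL_REGISTRY[name](tool_input)
```

Adding a tool then means writing one decorated function; the worker loop itself does not change.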

Estimation

Throughput: 10,000 tasks per day.
Average Reasoning Steps: 5 steps per task.
Storage: ~50 KB per task (logs + context). 10,000 tasks × 50 KB = 500 MB/day (~180 GB/year).
Compute: LLM latency is the bottleneck (1-5s per step). Workers are IO-bound waiting for LLM/Tools.
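The arithmetic behind these estimates can be checked with a short script. The average in-flight task count is an added back-of-the-envelope figure using Little's law; it assumes evenly spread arrivals and counts only LLM latency (no tool time, no peak factor).

```python
TASKS_PER_DAY = 10_000
STEPS_PER_TASK = 5
BYTES_PER_TASK = 50 * 1000        # 50 KB, decimal units
SECONDS_PER_DAY = 86_400

llm_calls_per_day = TASKS_PER_DAY * STEPS_PER_TASK                 # 50,000
avg_llm_calls_per_second = llm_calls_per_day / SECONDS_PER_DAY     # about 0.58

storage_mb_per_day = TASKS_PER_DAY * BYTES_PER_TASK / 1e6          # 500 MB
storage_gb_per_year = storage_mb_per_day * 365 / 1000              # 182.5 GB


def avg_in_flight_tasks(step_latency_s: float) -> float:
    """Little's law: concurrency = arrival rate x time in system."""
    task_duration_s = STEPS_PER_TASK * step_latency_s
    return TASKS_PER_DAY / SECONDS_PER_DAY * task_duration_s
```

At the slow end of the latency range (5 s per step) this gives roughly three tasks in flight on average, so the worker pool is sized by peak load and tool latency rather than by the daily average.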

Blueprint

Concise Summary: An asynchronous, event-driven architecture where an API accepts tasks, a Task Queue manages the workload, and specialized Agent Workers execute the ReAct (Reason+Act) loop using external LLMs and sandboxed tools.
Major Components:
API Gateway: Handles client authentication and task submission.
Agent Orchestrator (Worker): The core logic that manages the LLM reasoning loop and state transitions.
Task Queue: Decouples the API from long-running reasoning processes.
State Store (Postgres): Persists the agent's memory, execution logs, and final results.
Tool Sandbox: A secure environment for executing code or API calls.
Simplicity Audit: This design avoids complex multi-agent frameworks (like AutoGen) in favor of a single-agent worker model to minimize orchestration overhead for the MVP.

High Level Architecture

Sub-system Deep Dive

Service

Topology & Scaling:
API Gateway: Stateless Node.js or FastAPI instances scaled horizontally.
Agent Worker: Python-based (due to LangChain/LlamaIndex ecosystem) workers scaled based on queue depth.
API Spec:
POST /v1/tasks: Submit a new task; returns task_id.
GET /v1/tasks/{id}: Poll for status and logs.
GET /v1/tasks/{id}/stream: WebSocket/SSE for real-time reasoning updates.
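For the SSE variant of the stream endpoint, each reasoning update is sent as a text frame terminated by a blank line. The helper below is a minimal sketch of that framing; `format_sse` and the `step` event name are hypothetical, and the payload fields are assumed to mirror the AgentLogs record.

```python
import json
from typing import Optional


def format_sse(event: str, data: dict, event_id: Optional[int] = None) -> str:
    """Serialize one server-sent event frame."""
    lines = []
    if event_id is not None:
        # Clients send this back as Last-Event-ID when they reconnect.
        lines.append(f"id: {event_id}")
    lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field suffices.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"
```

Using the step number as the event id lets a reconnecting client ask for only the steps it has not yet seen.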

Storage

Data Model:
Tasks: {id, user_id, status, input_query, final_output, created_at}.
AgentLogs: {id, task_id, step_number, thought, tool_name, tool_input, tool_output}.
Database Logic:
Postgres: Used for durability. We index user_id and status.
Vector DB (Optional for MVP): If RAG is required, use pgvector to store and retrieve relevant documents.
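The data model above could be expressed as the schema below. To keep the sketch runnable without a database server it uses Python's built-in `sqlite3` as a stand-in for Postgres; the column types, the `UNIQUE (task_id, step_number)` constraint, and the `open_store` and `last_step` helpers are assumptions, not part of the original design.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tasks (
    id           TEXT PRIMARY KEY,
    user_id      TEXT NOT NULL,
    status       TEXT NOT NULL,
    input_query  TEXT NOT NULL,
    final_output TEXT,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE INDEX idx_tasks_user_id ON tasks (user_id);
CREATE INDEX idx_tasks_status  ON tasks (status);

CREATE TABLE agent_logs (
    id          INTEGER PRIMARY KEY,
    task_id     TEXT NOT NULL REFERENCES tasks (id),
    step_number INTEGER NOT NULL,
    thought     TEXT,
    tool_name   TEXT,
    tool_input  TEXT,
    tool_output TEXT,
    UNIQUE (task_id, step_number)  -- assumption: one log row per step
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with the schema applied."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def last_step(conn: sqlite3.Connection, task_id: str) -> int:
    """Return the highest logged step for a task, or 0 if none exist."""
    row = conn.execute(
        "SELECT COALESCE(MAX(step_number), 0) FROM agent_logs WHERE task_id = ?",
        (task_id,),
    ).fetchone()
    return row[0]
```

The uniqueness constraint means a replayed step fails loudly instead of duplicating a log row, and `last_step` gives a restarted worker its resume point.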

Cache

Data Structures: Redis Hash for current session context.
TTLs: 24-hour TTL on session context keys so that abandoned or crashed sessions expire instead of accumulating in Redis memory.
Logic: Stores the current "Window" of the conversation to feed back into the LLM prompt for the next step in the loop.
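The "Window" logic can be sketched as keeping the most recent steps that fit a token budget. This example works on a plain list rather than Redis, `build_window` is a hypothetical name, and whitespace word counting is only a stand-in for a real tokenizer.

```python
from typing import Callable, List


def build_window(
    steps: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Return the newest contiguous steps whose total cost fits max_tokens."""
    window: List[str] = []
    used = 0
    # Walk from the newest step backwards; stop at the first one that overflows.
    for step in reversed(steps):
        cost = count_tokens(step)
        if used + cost > max_tokens:
            break
        window.append(step)
        used += cost
    window.reverse()  # restore chronological order for the prompt
    return window
```

Older steps that fall out of the window remain in Postgres, so they can still be summarized or retrieved if the agent needs them.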

Messaging

Topic Structure: agent_tasks (FIFO queue).
Delivery Guarantees: At-least-once delivery.
Consumers: Agent Workers subscribe to the queue and acknowledge a message only after the task reaches a terminal state (Success/Failure) or a checkpoint has been persisted.
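With at-least-once delivery, the same message can arrive again after a worker crash, so the consumer should resume from the last persisted checkpoint rather than start over. The sketch below illustrates this with an in-memory dict in place of the state store; `process_message`, `checkpoints`, and `run_step` are hypothetical, and a fixed `total_steps` is a simplification of the real, open-ended loop.

```python
from typing import Callable, Dict


def process_message(
    task_id: str,
    total_steps: int,
    checkpoints: Dict[str, int],          # stand-in for the durable state store
    run_step: Callable[[str, int], None],
) -> bool:
    """Run the remaining steps of a task; return True when it is safe to ack."""
    # Resume after the last completed step (0 means nothing has run yet).
    start = checkpoints.get(task_id, 0) + 1
    for step_number in range(start, total_steps + 1):
        run_step(task_id, step_number)
        # Persist progress before moving on, so a crash loses one step at most.
        checkpoints[task_id] = step_number
    return True
```

If `run_step` raises, the function never returns, the message is not acknowledged, and the redelivered copy picks up at the failed step. A step interrupted between its side effect and its checkpoint would run twice, so tool calls with side effects still need to be idempotent.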
Wrap Up

Advanced Topics

Trade-offs:
Consistency vs. Latency: We sacrifice immediate consistency for user experience by using an asynchronous model. The user doesn't wait for the LLM; they watch it work.
State Management: Using Redis for context and Postgres for history adds complexity but ensures that a worker crash doesn't lose the entire reasoning chain.
Bottlenecks:
LLM Rate Limits: External providers often limit Tokens Per Minute (TPM).
Sequential Logic: The ReAct loop is naturally sequential, making it hard to parallelize a single task's reasoning.
Failure Handling:
Retry Logic: Exponential backoff for Tool/LLM failures.
Human-in-the-loop: For high-uncertainty tool calls, the worker transitions the task to PENDING_APPROVAL status.
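The retry logic can be sketched as a small wrapper around any Tool or LLM call. This is an illustrative helper, not a specific library's API: `retry_with_backoff` and its default parameters are assumptions, and the "full jitter" randomization is an added choice to avoid synchronized retries across workers.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying failures with exponentially growing, jittered delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay cap doubles each attempt: base, 2x base, 4x base, ...
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
    raise RuntimeError("unreachable: max_attempts must be at least 1")
```

Passing a narrower `retry_on` (for example, only timeout or rate-limit errors) keeps permanent failures such as invalid tool input from being retried.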
Alternatives & Optimization:
Alternative: Instead of a custom Sandbox, use E2B or Modal for serverless code execution environments.
Optimization: Implement Semantic Caching (e.g., GPTCache) to skip LLM reasoning for queries that are identical or semantically similar to ones handled previously.