Agentic Guide to AI Agents
Welcome to the complete course on building AI agents from the ground up.
Course Overview
This comprehensive course takes you from foundational concepts to cutting-edge implementations in AI agent development. Whether you’re a beginner or an experienced developer, you’ll gain practical skills to build, deploy, and scale intelligent agents.
What You’ll Learn
- Core concepts of AI agents and their architecture
- Building agents with reasoning and tool-use capabilities
- Advanced patterns including planning, memory, and multi-agent systems
- Production deployment, testing, and monitoring
- Specialized agent types for coding, research, and automation
- Enterprise-scale architecture and security considerations
- Latest research and emerging paradigms
Prerequisites
- Basic Python programming
- Understanding of APIs and HTTP
- Familiarity with command line
- Basic ML/AI concepts (helpful but not required)
Learning Path
- Beginner: Chapters 1-2 (2-3 weeks)
- Intermediate: Chapters 3-5 (4-6 weeks)
- Advanced: Chapters 6-9 (6-8 weeks)
- Expert: Chapter 10 + Research (ongoing)
Estimated Time
Total: 12-16 weeks for complete mastery with hands-on projects throughout.
Let’s begin your journey to mastering AI agents!
Prerequisites
Required Knowledge
Programming Fundamentals
- Python proficiency: Functions, classes, decorators, async/await
- Data structures: Lists, dicts, sets, queues
- Error handling: Try/except, custom exceptions
- File I/O: Reading/writing files
Basic Concepts
- APIs: REST APIs, HTTP methods, JSON
- Command line: Basic bash/terminal commands
- Git: Version control basics
- Environment variables: Configuration management
Recommended (Not Required)
- Machine learning basics
- Natural language processing concepts
- Docker/containerization
- Cloud platforms (AWS, Azure, GCP)
Technical Requirements
Software
- Python 3.9+: download from python.org
- pip: Package manager (comes with Python)
- Git: download from git-scm.com
- Code editor: VS Code, PyCharm, or similar
- Terminal: Command line access
Accounts
- OpenAI API key: create one at platform.openai.com
- Or Anthropic, AWS Bedrock, etc.
- GitHub account: For version control
- Optional: Cloud provider account (AWS, GCP, Azure)
Hardware
- Minimum: 8GB RAM, modern CPU
- Recommended: 16GB RAM, GPU for local models
- Internet: Stable connection for API calls
Setup Instructions
1. Install Python
# Check Python version
python --version # Should be 3.9+
# Create virtual environment
python -m venv venv
# Activate (macOS/Linux)
source venv/bin/activate
# Activate (Windows)
venv\Scripts\activate
2. Install Core Libraries
pip install openai langchain chromadb fastapi uvicorn pytest
3. Configure API Keys
# Create .env file
echo "OPENAI_API_KEY=your-key-here" > .env
# Or export directly
export OPENAI_API_KEY="your-key-here"
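Before making any API calls, it helps to confirm the key is actually visible to your Python process. A small helper like this fails fast with a clear message (the function name is ours, not part of any library):

```python
import os

def require_api_key(name="OPENAI_API_KEY"):
    """Return the named API key from the environment, failing loudly if unset."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return key
```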
4. Verify Setup
# test_setup.py
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)
print("✓ Setup successful!")
print(response.choices[0].message.content)
Time Commitment
- Total course: 40-60 hours
- Per chapter: 4-6 hours
- Capstone project: 10-15 hours
Recommended pace: 2-3 chapters per week
Learning Path
Beginner Track (Start Here)
- Module 1: Foundations
- Module 2: Building Your First Agent
- Module 4: Agent Tools & Capabilities
- Module 5: Production-Ready Agents
Intermediate Track
- Module 3: Advanced Agent Patterns
- Module 6: Specialized Agent Types
- Module 7: Advanced Topics
Advanced Track
- Module 8: Enterprise & Scale
- Module 9: Cutting-Edge Research
- Module 10: Capstone Project
Getting Help
- GitHub Issues: Report errors or ask questions
- Discussions: Share projects and get feedback
- Community: Join Discord/Slack communities (see Resources)
Ready to Start?
If you meet the prerequisites, you’re ready to begin! Start with the Introduction and then dive into Module 1.
About This Course
Author
Kyaw Mong is a software engineer and AI practitioner with extensive experience building production AI systems. This course distills years of hands-on experience into a comprehensive learning path for aspiring agent developers.
Course Philosophy
This course is built on three principles:
1. Learn by Building: Every concept is accompanied by working code examples. You’ll build real agents, not just read about them.
2. Production-First: We don’t just teach toy examples. You’ll learn reliability, testing, monitoring, and deployment—everything needed for production systems.
3. Comprehensive Coverage: From foundations to frontier research, this course covers the full spectrum of agent development in 21,000+ lines of detailed content.
What Makes This Course Different
- Complete working code: Every example runs and can be deployed
- Real-world focus: Patterns used in production systems
- Cutting-edge content: Latest research and techniques
- Hands-on capstone: Build a complete autonomous agent
- Free and open source: Available to everyone
Course Structure
The course follows a carefully designed progression:
Foundations (Chapters 1-2): Core concepts and first agent
Intermediate (Chapters 3-5): Advanced patterns and production readiness
Advanced (Chapters 6-8): Specialized agents and enterprise scale
Expert (Chapters 9-10): Research frontiers and capstone project
Acknowledgments
This course builds on the incredible work of the AI research community. Special thanks to:
- OpenAI, Anthropic, and other AI labs for advancing the field
- LangChain, AutoGPT, and framework creators
- The open source community
- Researchers publishing papers and sharing knowledge
Version History
v1.0 (February 2026)
- Initial release
- 10 complete chapters
- Autonomous Software Engineering Agent capstone
- 21,000+ lines of content
Contact & Feedback
- GitHub: ekyawthan/ai-agents-course
- Issues: Report errors or suggest improvements
- Discussions: Share your projects and ask questions
License
This course is released under the MIT License. You’re free to use, modify, and share the content with attribution.
Ready to start learning? Head to Prerequisites to get set up!
Frequently Asked Questions
Getting Started
Which LLM should I use?
For learning: Start with OpenAI’s GPT-3.5-turbo
- Affordable ($0.50-2 per million tokens)
- Fast responses
- Good function calling support
For production: Consider:
- GPT-4: Best reasoning, higher cost
- Claude 3: Long context (200K tokens), excellent for complex tasks
- AWS Bedrock: Enterprise features, multiple models
- Open source (Llama, Mistral): Self-hosted, no API costs
How much does it cost to run agents?
Development (100 requests/day):
- GPT-3.5: ~$5-10/month
- GPT-4: ~$30-50/month
Production (10K requests/day):
- GPT-3.5: ~$500-1000/month
- GPT-4: ~$3000-5000/month
Cost optimization:
- Use caching (50-70% reduction)
- Smaller models for simple tasks
- Batch requests when possible
Do I need a GPU?
No for most agent development:
- API-based LLMs run in the cloud
- Your code just makes HTTP requests
Yes if you want to:
- Run local models (Llama, Mistral)
- Fine-tune models
- Process large batches offline
Can I use this commercially?
Yes, but check:
- LLM provider terms (OpenAI, Anthropic allow commercial use)
- Open source licenses for frameworks
- Data privacy regulations (GDPR, etc.)
- Your specific use case compliance needs
Technical Questions
How do I handle rate limits?
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5)
)
def call_llm(prompt):
    return client.chat.completions.create(...)
How do I reduce latency?
- Streaming: Stream responses as they generate
- Caching: Cache repeated queries
- Smaller models: Use GPT-3.5 for simple tasks
- Parallel calls: Run independent calls concurrently
- Prompt optimization: Shorter prompts = faster responses
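Caching is the simplest of these wins to implement. Here is a minimal in-process sketch; a production system would use Redis or similar, and `call_fn` stands in for your real LLM call:

```python
import hashlib

_cache = {}

def cached_llm_call(prompt, call_fn):
    """Return a cached response for repeated prompts; only call the API on a miss."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_fn(prompt)  # cache miss: pay for the API call once
    return _cache[key]
```

Identical prompts after the first skip the API entirely, which cuts both latency and cost.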
How do I prevent hallucinations?
- Require tool use: Force agents to use tools, not memory
- Validation: Verify outputs before using them
- Lower temperature: Use 0.2-0.3 for factual tasks
- Structured outputs: Use JSON mode or function calling
- Retrieval: Use RAG to ground responses in facts
How do I debug agent failures?
- Log everything: All thoughts, actions, observations
- Trace execution: Use tools like LangSmith
- Test incrementally: Start simple, add complexity
- Validate tools: Test tools independently
- Check prompts: Ensure clear instructions
Architecture Questions
Single agent vs multi-agent?
Single agent when:
- Task is focused and well-defined
- Simplicity is important
- Low latency is critical
Multi-agent when:
- Task requires diverse expertise
- Parallel processing helps
- Checks and balances needed
- Scaling beyond single agent
How do I handle long-running tasks?
- Async processing: Use background jobs
- Checkpointing: Save state periodically
- Progress updates: Stream status to user
- Timeouts: Set reasonable limits
- Resumability: Allow restart from checkpoint
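Checkpointing and resumability can be combined in a small sketch: each step's result is persisted as it completes, so a restarted run skips work already done. Function and file names here are illustrative, not from any framework:

```python
import json
from pathlib import Path

def load_checkpoint(path):
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed_steps": [], "results": {}}

def save_checkpoint(path, state):
    """Persist state so a long-running task can resume after a crash."""
    path.write_text(json.dumps(state))

def run_with_checkpoints(steps, path):
    """Run (name, fn) steps in order, skipping any recorded as already done."""
    state = load_checkpoint(path)
    for name, fn in steps:
        if name in state["completed_steps"]:
            continue  # finished in a previous run
        state["results"][name] = fn()
        state["completed_steps"].append(name)
        save_checkpoint(path, state)  # checkpoint after every step
    return state["results"]
```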
How do I scale to production?
- Horizontal scaling: Multiple agent instances
- Load balancing: Distribute requests
- Caching: Redis for responses
- Queue systems: RabbitMQ, SQS for async tasks
- Monitoring: Track performance and errors
Safety & Security
How do I make agents safe?
- Sandboxing: Isolate code execution (Docker)
- Validation: Check all inputs and outputs
- Rate limiting: Prevent abuse
- Human approval: For critical actions
- Audit logging: Track all actions
- Guardrails: Block harmful requests
What about prompt injection?
Defense strategies:
- Input sanitization: Remove suspicious patterns
- Separate contexts: User input vs system instructions
- Output validation: Check for unexpected behavior
- Monitoring: Detect anomalies
- Least privilege: Limit tool access
How do I handle sensitive data?
- Encryption: Encrypt data at rest and in transit
- Access control: Role-based permissions
- Data minimization: Only collect what’s needed
- Anonymization: Remove PII when possible
- Compliance: Follow GDPR, HIPAA, etc.
Development Questions
Which framework should I use?
LangChain: Best for rapid prototyping
- Lots of integrations
- Active community
- Good documentation
LangGraph: Best for complex workflows
- Graph-based state management
- Better control flow
- Production-ready
Custom: Best for specific needs
- Full control
- No framework overhead
- Optimized for your use case
How do I test agents?
- Unit tests: Test individual components
- Integration tests: Test agent workflows
- Evaluation sets: Benchmark on standard tasks
- A/B testing: Compare agent versions
- User testing: Real-world feedback
How long does it take to build an agent?
- Simple agent (ReAct with 3-5 tools): 1-2 days
- Production agent (with testing, monitoring): 1-2 weeks
- Complex multi-agent system: 1-3 months
- Enterprise deployment: 3-6 months
Common Issues
“My agent gets stuck in loops”
Solutions:
- Set max_steps limit
- Add loop detection
- Improve prompts to avoid repetition
- Use planning instead of pure ReAct
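A simple detector can back up the max_steps limit by flagging when the agent keeps issuing the identical action. This is a sketch; a real detector might also catch near-duplicate actions:

```python
def detect_loop(action_history, window=3):
    """True if the last `window` actions are all identical."""
    if len(action_history) < window:
        return False
    recent = action_history[-window:]
    return all(a == recent[0] for a in recent)
```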
“Tool calls fail frequently”
Solutions:
- Validate tool schemas
- Add retry logic with exponential backoff
- Improve tool descriptions
- Test tools independently
- Add error handling
“Agent is too slow”
Solutions:
- Use faster models (GPT-3.5 vs GPT-4)
- Enable streaming
- Cache repeated queries
- Optimize prompts (shorter = faster)
- Run tools in parallel
“Costs are too high”
Solutions:
- Cache aggressively
- Use smaller models when possible
- Optimize prompt length
- Batch requests
- Set usage limits
Learning Path
I’m a beginner programmer. Can I take this course?
You need:
- Python basics (functions, classes)
- API concepts
- Command line comfort
If you’re missing these, spend 2-4 weeks on Python fundamentals first, then return to this course.
Should I take this course or learn LangChain first?
Take this course if you want to:
- Understand agent fundamentals
- Build from scratch
- Know what’s happening under the hood
Learn LangChain first if you want to:
- Build quickly with existing tools
- Focus on applications, not internals
Ideally: Take this course, then use frameworks with deeper understanding.
How do I stay current with agent research?
- Follow researchers: Twitter/X, blogs
- Read papers: ArXiv, conferences
- Join communities: Discord, Reddit
- Experiment: Try new techniques
- Contribute: Open source projects
Still Have Questions?
- GitHub Discussions: Ask the community
- Issues: Report problems
- Contributing: Improve the course
What Are AI Agents?
Module 1: Learning Objectives
By the end of this module, you will:
- ✓ Define what AI agents are and how they differ from traditional software
- ✓ Identify different types of agents and their use cases
- ✓ Understand the perception-reasoning-action loop
- ✓ Explain how LLMs enable agentic behavior
- ✓ Recognize key components of agent architecture
Definition and Core Concepts
An AI agent is an autonomous system that perceives its environment, reasons about it, and takes actions to achieve specific goals. Unlike simple chatbots that respond to queries, agents can:
- Break down complex tasks into steps
- Use tools and external resources
- Remember context across interactions
- Adapt their approach based on feedback
- Work independently toward objectives
Think of an agent as a digital assistant that doesn’t just answer questions—it gets things done.
Agent vs. Chatbot vs. Assistant
Chatbot
- Responds to direct queries
- Stateless or minimal memory
- No tool use
- Example: Simple FAQ bot
Assistant
- Helps with tasks through conversation
- Maintains conversation context
- May access some information
- Example: Basic voice assistants
Agent
- Autonomous task execution
- Multi-step reasoning and planning
- Uses multiple tools and APIs
- Adapts strategy based on results
- Example: Research agent that searches, analyzes, and synthesizes information
Autonomy, Reasoning, and Tool Use
Autonomy
Agents operate with varying degrees of independence:
- Supervised: Requires approval for each action
- Semi-autonomous: Asks for guidance on critical decisions
- Fully autonomous: Executes complete workflows independently
Reasoning
Agents think through problems using:
- Chain-of-thought: Step-by-step logical reasoning
- Planning: Breaking goals into sub-tasks
- Reflection: Evaluating their own outputs
- Error recovery: Adapting when things go wrong
Tool Use
Modern agents extend their capabilities through tools:
- Web search and browsing
- Code execution
- Database queries
- API calls
- File operations
- Calculator and data analysis
Real-World Applications and Use Cases
Software Development
- Code generation and refactoring
- Bug detection and fixing
- Documentation writing
- Test generation
Research and Analysis
- Literature reviews
- Market research
- Competitive analysis
- Data synthesis
Business Automation
- Customer support
- Data entry and processing
- Report generation
- Workflow orchestration
Personal Productivity
- Email management
- Calendar scheduling
- Travel planning
- Information gathering
Creative Work
- Content creation
- Design assistance
- Brainstorming
- Editing and refinement
Key Characteristics of Effective Agents
- Goal-oriented: Clear objectives drive behavior
- Adaptive: Adjust approach based on feedback
- Transparent: Explain reasoning and actions
- Reliable: Handle errors gracefully
- Efficient: Minimize unnecessary steps
- Safe: Respect boundaries and constraints
The Agent Loop
At their core, agents follow a continuous cycle:
graph LR
A[Perceive] --> B[Reason]
B --> C[Act]
C --> D[Observe]
D --> A
style A fill:#dbeafe
style B fill:#fef3c7
style C fill:#d1fae5
style D fill:#e0e7ff
The Perception-Reasoning-Action Loop:
- Perceive → Observe the current state
- Reason → Decide what to do next
- Act → Execute the chosen action
- Observe → See the results
- Repeat → Continue until goal is achieved
This loop enables agents to navigate complex, multi-step tasks that would be difficult to hardcode.
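The loop above fits in a few lines of Python. In this sketch, perceive, reason, and act are caller-supplied functions, since their real implementations (LLM calls, tools) come in later chapters:

```python
def run_agent_loop(goal, perceive, reason, act, max_iterations=10):
    """Minimal perceive-reason-act loop; stops when reason() signals completion."""
    observation = None
    for _ in range(max_iterations):
        state = perceive(observation)          # Perceive: observe current state
        decision = reason(goal, state)         # Reason: decide what to do next
        if decision.get("done"):
            return decision.get("answer")      # goal achieved
        observation = act(decision["action"])  # Act, then Observe the result
    return None  # gave up after max_iterations
```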
What Makes Agents Possible Now?
Recent advances have made practical agents feasible:
- Large Language Models: Provide reasoning and language understanding
- Function Calling: LLMs can reliably invoke tools with structured parameters
- Context Windows: Models can maintain longer conversations and more context
- Improved Reliability: Better instruction following and fewer hallucinations
- Ecosystem: Frameworks and tools for building agents quickly
💡 Key Insight
The combination of LLMs with tool-calling capabilities is what makes modern AI agents fundamentally different from previous approaches. LLMs provide the “reasoning engine” while tools provide the “hands” to interact with the world.
Looking Ahead
As you progress through this course, you’ll learn to build agents that combine these concepts into practical, production-ready systems. We’ll start simple and gradually add sophistication.
✅ Key Takeaways
- AI agents are autonomous systems that perceive, reason, and act to achieve goals
- Agents differ from chatbots by using tools, planning, and maintaining memory
- The perception-reasoning-action loop is the core pattern
- Modern LLMs enable practical agent development through reasoning and tool use
- Agents can be simple (single-task) or complex (multi-agent systems)
In the next section, we’ll explore agent architecture and how these components fit together.
Agent Architecture Basics
The Perception-Reasoning-Action Loop
Every agent operates on a fundamental cycle that mirrors how humans approach tasks:
┌─────────────┐
│  PERCEIVE   │ ← Gather information about current state
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   REASON    │ ← Decide what to do next
└──────┬──────┘
       │
       ▼
┌─────────────┐
│     ACT     │ ← Execute the chosen action
└──────┬──────┘
       │
       └──────→ (back to PERCEIVE)
Perceive
The agent observes its environment:
- User input and instructions
- Tool outputs and results
- Current state and context
- Available resources
Reason
The agent decides on the next action:
- Analyze the current situation
- Consider available options
- Plan the next step
- Evaluate potential outcomes
Act
The agent executes its decision:
- Call a tool or function
- Generate a response
- Update internal state
- Request more information
Memory Systems
Agents need memory to maintain context and learn from experience. There are two primary types:
Short-Term Memory (Working Memory)
Holds information for the current task:
- Conversation history: Recent messages and responses
- Intermediate results: Outputs from previous steps
- Current plan: What the agent is trying to accomplish
- Execution state: Where the agent is in the workflow
Implementation: Typically stored in the LLM’s context window
Limitations:
- Fixed size (token limits)
- Cleared when task completes
- Can become cluttered
Long-Term Memory (Persistent Memory)
Retains information across sessions:
- Facts and knowledge: Learned information about the user or domain
- Past interactions: Historical conversations
- Successful strategies: What worked before
- User preferences: Personalization data
Implementation:
- Vector databases (semantic search)
- Traditional databases (structured data)
- File systems (documents, logs)
Key Operations:
- Store: Save important information
- Retrieve: Find relevant past information
- Update: Modify existing memories
- Forget: Remove outdated information
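These four operations can be sketched with a toy in-memory store. Real long-term memory would use a vector database with embeddings for semantic retrieval; here keyword overlap stands in for similarity search:

```python
class SimpleMemory:
    """Toy long-term memory: stores text snippets, retrieves by keyword overlap."""

    def __init__(self):
        self.entries = []

    def store(self, text):
        """Store: save important information."""
        self.entries.append(text)

    def retrieve(self, query, top_k=3):
        """Retrieve: rank entries by how many query words they share."""
        words = set(query.lower().split())
        scored = [(len(words & set(e.lower().split())), e) for e in self.entries]
        scored.sort(key=lambda s: -s[0])
        return [e for score, e in scored[:top_k] if score > 0]

    def forget(self, predicate):
        """Forget: drop entries matching a condition (e.g. outdated facts)."""
        self.entries = [e for e in self.entries if not predicate(e)]
```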
Planning and Goal-Oriented Behavior
Agents don’t just react—they plan ahead to achieve goals efficiently.
Goal Decomposition
Breaking complex goals into manageable sub-goals:
Goal: "Research and summarize recent AI papers"
├─ Sub-goal 1: Search for relevant papers
├─ Sub-goal 2: Read and extract key points
├─ Sub-goal 3: Synthesize findings
└─ Sub-goal 4: Format summary
Planning Strategies
Reactive Planning: Decide next step based on current state
- Simple and fast
- Good for straightforward tasks
- Limited lookahead
Proactive Planning: Create full plan upfront, then execute
- Better for complex tasks
- Can optimize entire workflow
- May need replanning if things change
Hybrid Planning: Plan a few steps ahead, adapt as needed
- Balances flexibility and efficiency
- Most common in practice
Plan Representation
Plans can be represented as:
- Linear sequences: Step 1 → Step 2 → Step 3
- Trees: Branching based on conditions
- Graphs: Complex dependencies between steps
- Natural language: Human-readable descriptions
Multi-Step Task Execution
Agents excel at tasks requiring multiple actions:
Execution Patterns
Sequential Execution
Step 1 → Step 2 → Step 3 → Done
Each step depends on the previous one.
Parallel Execution
Step 1a ─┐
Step 1b ─┼→ Combine → Done
Step 1c ─┘
Independent steps run simultaneously.
Conditional Execution
Step 1 → Decision
           ├─ If A → Step 2a → Done
           └─ If B → Step 2b → Done
Path depends on intermediate results.
Iterative Execution
Step 1 → Step 2 → Check
  ↑                 │
  └─────────────────┘  (repeat if needed)
Loop until condition is met.
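Of these patterns, parallel execution benefits most from concurrency primitives. A minimal sketch using asyncio, where the step functions are hypothetical stand-ins for real tool calls:

```python
import asyncio

async def run_parallel(steps):
    """Run independent async steps concurrently and collect their results in order."""
    return list(await asyncio.gather(*(s() for s in steps)))

# Hypothetical independent sub-steps standing in for real tool calls
async def step_a():
    await asyncio.sleep(0.01)
    return "a-result"

async def step_b():
    await asyncio.sleep(0.01)
    return "b-result"
```

Calling `asyncio.run(run_parallel([step_a, step_b]))` returns both results once the slowest step finishes, rather than the sum of their latencies.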
Error Handling
Robust agents handle failures gracefully:
- Detect: Recognize when something went wrong
- Diagnose: Understand the cause
- Recover: Try alternative approaches
- Escalate: Ask for help if stuck
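The detect-diagnose-recover-escalate sequence can be sketched as a fallback chain. The function name is ours; each approach is any callable, such as an alternate tool:

```python
def execute_with_recovery(approaches):
    """Try each approach in order; escalate with all error details if every one fails."""
    errors = []
    for fn in approaches:
        try:
            return fn()                # success on the first working approach
        except Exception as e:         # Detect: exception means failure
            errors.append(str(e))      # Diagnose: keep the cause for the report
    # Escalate: surface every failure so a human can step in
    raise RuntimeError(f"All approaches failed: {errors}")
```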
Progress Tracking
Agents monitor their progress:
- Checkpoints: Mark completed sub-goals
- State management: Track what’s been done
- Backtracking: Undo steps if needed
- Resumption: Continue after interruption
Core Components of Agent Architecture
1. Controller (Brain)
The central decision-making component:
- Interprets user goals
- Manages the reasoning loop
- Coordinates other components
- Handles control flow
2. Memory Manager
Manages information storage and retrieval:
- Maintains conversation context
- Stores and retrieves long-term memories
- Decides what to remember/forget
- Optimizes memory usage
3. Tool Interface
Connects agent to external capabilities:
- Defines available tools
- Handles tool invocation
- Parses tool outputs
- Manages tool errors
4. Planner
Develops strategies for achieving goals:
- Decomposes complex tasks
- Generates action sequences
- Optimizes execution order
- Adapts plans based on results
5. Executor
Carries out planned actions:
- Invokes tools with correct parameters
- Monitors execution
- Collects results
- Reports status
Putting It Together
A complete agent architecture integrates these components:
              User Input
                  ↓
┌─────────────────────────────────┐
│           CONTROLLER            │
│    (Orchestrates everything)    │
└────┬────────────────────────┬───┘
     │                        │
     ▼                        ▼
┌─────────┐              ┌─────────┐
│ MEMORY  │←────────────→│ PLANNER │
└─────────┘              └────┬────┘
     ↑                        │
     │                        ▼
     │                   ┌─────────┐
     └───────────────────│EXECUTOR │
                         └────┬────┘
                              │
                              ▼
                         ┌─────────┐
                         │  TOOLS  │
                         └─────────┘
                              ↓
                           Results
Design Principles
When architecting agents, follow these principles:
- Modularity: Separate concerns into distinct components
- Observability: Make agent reasoning transparent
- Flexibility: Allow easy addition of new tools and capabilities
- Robustness: Handle errors and edge cases gracefully
- Efficiency: Minimize unnecessary steps and API calls
- Safety: Validate inputs and outputs, respect boundaries
Next Steps
Now that you understand the basic architecture, we’ll explore how LLMs power these components in the next section on LLM Fundamentals for Agents.
LLM Fundamentals for Agents
How Language Models Work
Large Language Models (LLMs) are the “brain” of modern AI agents. Understanding how they work helps you build better agents.
The Basics
LLMs are trained to predict the next token (word or word piece) given previous tokens:
Input: "The capital of France is"
Output: "Paris" (most likely next token)
This simple mechanism enables:
- Text generation
- Question answering
- Reasoning
- Code generation
- Tool use
From Prediction to Reasoning
Modern LLMs don’t just predict—they reason:
Chain-of-Thought: Breaking down problems step by step
Question: "If I have 3 apples and buy 2 more, then give away 1, how many do I have?"
LLM reasoning:
1. Start with 3 apples
2. Buy 2 more: 3 + 2 = 5
3. Give away 1: 5 - 1 = 4
Answer: 4 apples
Tool Use: Recognizing when to call external functions
User: "What's the weather in Tokyo?"
LLM: I should use the weather_api tool with location="Tokyo"
Key Capabilities for Agents
- Instruction following: Understanding and executing commands
- Context understanding: Maintaining awareness of conversation history
- Function calling: Invoking tools with correct parameters
- Error recovery: Adapting when things go wrong
- Self-reflection: Evaluating own outputs
Prompting Strategies for Agents
How you prompt an LLM dramatically affects agent performance.
System Prompts
Define the agent’s role, capabilities, and constraints:
You are a research assistant agent. Your goal is to help users
find and synthesize information from multiple sources.
Available tools:
- web_search(query): Search the internet
- read_url(url): Extract content from a webpage
- summarize(text): Create concise summaries
Always:
1. Break complex requests into steps
2. Verify information from multiple sources
3. Cite your sources
4. Ask for clarification if needed
Few-Shot Examples
Show the agent how to behave through examples:
Example 1:
User: "Find recent news about AI"
Agent: I'll search for recent AI news.
Action: web_search("AI news 2026")
Result: [search results]
Agent: Here are the top 3 recent AI developments...
Example 2:
User: "What's on that page?"
Agent: I need a URL to read a page. Could you provide the link?
ReAct Pattern
The most common prompting pattern for agents:
Thought: What do I need to do?
Action: [tool_name](parameters)
Observation: [result from tool]
Thought: What does this mean?
Action: [next tool or final answer]
Structured Outputs
Guide the LLM to produce consistent formats:
Respond in this format:
{
  "reasoning": "Your thought process",
  "action": "tool_name",
  "parameters": {"param": "value"},
  "confidence": 0.95
}
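Whatever format you request, validate the output before acting on it. A sketch that checks the JSON shape above and raises an error message you could feed back to the model (field names match the example; adapt them to your own schema):

```python
import json

def parse_agent_response(raw):
    """Parse and validate the JSON an agent was asked to emit."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"Response was not valid JSON: {e}")
    for field in ("reasoning", "action", "parameters"):
        if field not in data:
            raise ValueError(f"Missing required field: {field}")
    return data
```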
Context Windows and Token Limits
Every LLM has a maximum context window—the amount of text it can process at once.
Common Context Sizes
- GPT-4: 8K, 32K, 128K tokens
- Claude: 200K tokens
- Gemini: 1M+ tokens
What Fits in Context?
Approximate token counts:
- 1 token ≈ 4 characters
- 1 token ≈ 0.75 words
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words
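These ratios make a quick budget check easy. For exact counts you would use the model's own tokenizer (e.g. tiktoken for OpenAI models); this heuristic is only for rough planning:

```python
def estimate_tokens(text):
    """Rough token count using the ~4 characters per token heuristic."""
    return len(text) // 4
```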
Context Management Strategies
1. Summarization Compress old conversation history:
[Full conversation history]
↓
[Summary of key points] + [Recent messages]
2. Sliding Window Keep only the most recent N messages:
Message 1, 2, 3, 4, 5, 6, 7, 8
                    └────────┘ (keep last 4)
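A sliding window over chat-style message dicts can be sketched in one function, preserving the system prompt while trimming older turns:

```python
def sliding_window(messages, keep_last=4):
    """Keep the system prompt plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```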
3. Selective Retention Keep important messages, discard routine ones:
System prompt + Key decisions + Recent context
4. External Memory Store information outside context, retrieve as needed:
Context: [Current task]
Memory DB: [All past information]
↓ (retrieve relevant)
Context: [Current task] + [Relevant memories]
Token Budget Management
For agents, allocate tokens wisely:
System prompt:      500 tokens
Tools definition: 1,000 tokens
Conversation:     4,000 tokens
Working memory:   1,500 tokens
Reserve:          1,000 tokens (for response)
─────────────────────────────
Total:            8,000 tokens (fits an 8K window; the reserve buffers the response)
Temperature, Top-p, and Sampling Parameters
These parameters control how the LLM generates text.
Temperature
Controls randomness (0.0 to 2.0):
Low temperature (0.0 - 0.3): Deterministic, focused
Temperature: 0.1
"The capital of France is Paris" (always)
Use for: Tool calling, structured tasks, factual responses
Medium temperature (0.5 - 0.8): Balanced
Temperature: 0.7
"The capital of France is Paris, a beautiful city known for..."
Use for: General agent behavior, conversational responses
High temperature (1.0 - 2.0): Creative, random
Temperature: 1.5
"The capital of France? Ah, the magnificent Paris, where..."
Use for: Creative tasks, brainstorming, diverse outputs
Top-p (Nucleus Sampling)
Controls diversity by probability mass (0.0 to 1.0):
Low top-p (0.1 - 0.5): Conservative choices
- Considers only the most likely tokens
- More focused and consistent
High top-p (0.9 - 1.0): Diverse choices
- Considers a wider range of tokens
- More varied and creative
Typical for agents: 0.9-0.95
Top-k
Limits to top K most likely tokens:
- top-k=1: Always pick most likely (deterministic)
- top-k=10: Choose from 10 most likely
- top-k=50: More diversity
Practical Guidelines for Agents
For tool calling and structured tasks:
temperature = 0.1
top_p = 0.9
For conversational responses:
temperature = 0.7
top_p = 0.95
For creative tasks:
temperature = 1.0
top_p = 0.95
Other Important Parameters
Max Tokens
Maximum length of generated response:
- Set based on expected output length
- Leave room for tool calls and reasoning
- Typical: 500-2000 for agent responses
Stop Sequences
Tokens that halt generation:
stop_sequences = ["</tool>", "DONE", "\n\nUser:"]
Useful for controlling agent output format.
Frequency/Presence Penalty
Reduce repetition:
- Frequency penalty: Penalize tokens based on how often they appear
- Presence penalty: Penalize tokens that have appeared at all
- Typical: 0.0-0.5 for agents
Prompt Engineering Best Practices
1. Be Specific
❌ “Help me with this”
✅ “Search for recent papers on transformer architectures and summarize the key innovations”
2. Provide Context
You are helping a software engineer debug a Python application.
The user has intermediate Python knowledge.
Focus on practical solutions.
3. Use Delimiters
User input: """
{user_message}
"""
Available tools: ###
{tool_definitions}
###
4. Specify Output Format
Respond with:
1. Your reasoning
2. The action to take
3. Expected outcome
5. Handle Edge Cases
If the user's request is unclear, ask for clarification.
If a tool fails, try an alternative approach.
If you cannot complete the task, explain why.
Testing and Iteration
Evaluate Prompts Systematically
- Create test cases: Common scenarios your agent should handle
- Run experiments: Try different prompts and parameters
- Measure performance: Success rate, quality, efficiency
- Iterate: Refine based on results
Common Issues and Fixes
Issue: Agent doesn’t use tools
Fix: Add explicit examples of tool usage

Issue: Agent is too verbose
Fix: Lower temperature, add “be concise” instruction

Issue: Agent hallucinates
Fix: Emphasize “only use provided tools”, add verification steps

Issue: Agent gets stuck in loops
Fix: Add step counter, max iterations limit
Choosing the Right Model
Different models for different needs:
For Agents
GPT-4 / GPT-4 Turbo
- Excellent reasoning
- Reliable tool calling
- Good for complex tasks
Claude 3 (Opus/Sonnet)
- Long context (200K)
- Strong reasoning
- Good safety features
GPT-3.5 Turbo
- Fast and cheap
- Good for simple agents
- Lower reasoning capability
Trade-offs
- Cost vs. Capability: Stronger models cost more
- Speed vs. Quality: Faster models may be less accurate
- Context vs. Price: Longer context costs more
Next Steps
With these LLM fundamentals, you’re ready to build your first agent! In Chapter 2, we’ll implement a simple ReAct agent that puts these concepts into practice.
Simple ReAct Agent
Module 2: Learning Objectives
By the end of this module, you will:
- ✓ Implement a ReAct agent from scratch
- ✓ Integrate external tools with function calling
- ✓ Handle errors and retries gracefully
- ✓ Build a complete shopping research assistant
- ✓ Understand tool schemas and validation
Introduction to ReAct
ReAct (Reasoning + Acting) is the most popular pattern for building AI agents. It combines:
- Reasoning: Thinking through what to do
- Acting: Taking actions via tools
The agent alternates between thinking and acting until it solves the task.
The ReAct Pattern
graph TD
A[User Query] --> B[Thought: Reason]
B --> C{Need Tool?}
C -->|Yes| D[Action: Use Tool]
C -->|No| E[Answer: Respond]
D --> F[Observation: Result]
F --> B
E --> G[Done]
style B fill:#fef3c7
style D fill:#d1fae5
style F fill:#dbeafe
style E fill:#f0fdf4
The ReAct Loop:
Thought: I need to figure out what to do
Action: tool_name(parameters)
Observation: [result from the tool]
Thought: Based on this result, I should...
Action: another_tool(parameters)
Observation: [another result]
Thought: Now I have enough information
Answer: [final response to user]
Why ReAct Works
- Transparency: You can see the agent’s reasoning
- Debuggability: Easy to identify where things go wrong
- Flexibility: Works for many types of tasks
- Simplicity: Easy to implement and understand
⚠️ Important
ReAct agents can get stuck in loops or make poor decisions. Always implement max step limits and validation to prevent runaway execution.
Building Your First ReAct Agent
Let’s build a simple agent step by step.
Step 1: Define the Agent Loop
def react_agent(user_input, max_steps=10):
"""Simple ReAct agent loop"""
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_input}
]
for step in range(max_steps):
# Get LLM response
response = llm.generate(messages)
# Parse response
if is_final_answer(response):
return response
# Execute action
action, params = parse_action(response)
result = execute_tool(action, params)
# Add to conversation
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": f"Observation: {result}"})
return "Max steps reached"
Step 2: Create the System Prompt
SYSTEM_PROMPT = """You are a helpful AI agent that can use tools to answer questions.
Available tools:
- search(query): Search the internet for information
- calculate(expression): Evaluate mathematical expressions
- get_time(): Get the current time
Use this format:
Thought: [your reasoning about what to do]
Action: tool_name(parameters)
When you have the final answer:
Answer: [your response to the user]
Example:
User: What is 25 * 17?
Thought: I need to calculate this multiplication
Action: calculate("25 * 17")
Observation: 425
Thought: I have the result
Answer: 25 * 17 equals 425
"""
Step 3: Implement Tool Execution
def execute_tool(action, params):
"""Execute a tool and return the result"""
tools = {
"search": search_tool,
"calculate": calculate_tool,
"get_time": get_time_tool
}
if action not in tools:
return f"Error: Unknown tool '{action}'"
try:
result = tools[action](params)
return result
except Exception as e:
return f"Error: {str(e)}"
Step 4: Parse Agent Output
import re
def parse_action(response):
"""Extract action and parameters from agent response"""
# Look for Action: tool_name(params)
match = re.search(r'Action:\s*(\w+)\((.*?)\)', response)
if match:
action = match.group(1)
params = match.group(2).strip('"\'')
return action, params
return None, None
def is_final_answer(response):
"""Check if response contains final answer"""
return "Answer:" in response
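A quick self-contained check (restating the two helpers from above) shows the round trip on a sample agent response:

```python
import re

def parse_action(response):
    """Extract action and parameters from agent response"""
    match = re.search(r'Action:\s*(\w+)\((.*?)\)', response)
    if match:
        return match.group(1), match.group(2).strip('"\'')
    return None, None

def is_final_answer(response):
    """Check if response contains final answer"""
    return "Answer:" in response

step = 'Thought: I need to calculate this\nAction: calculate("25 * 17")'
print(parse_action(step))              # ('calculate', '25 * 17')
print(is_final_answer(step))           # False
print(is_final_answer("Answer: 425"))  # True
```

Note the non-greedy `(.*?)` stops at the first closing parenthesis, so this simple parser will truncate parameters that themselves contain `)` — one reason native function calling (covered later) is more reliable.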
Complete Working Example
import openai
import re
from datetime import datetime
# Initialize OpenAI
client = openai.OpenAI()
# System prompt
SYSTEM_PROMPT = """You are a helpful AI agent with access to tools.
Tools:
- calculate(expression): Evaluate math expressions
- get_time(): Get current time
Format:
Thought: [reasoning]
Action: tool_name(parameters)
When done:
Answer: [final response]
"""
# Tool implementations
def calculate_tool(expression):
"""Safely evaluate math expressions"""
try:
# Only allow safe operations
allowed = set('0123456789+-*/()., ')
if not all(c in allowed for c in expression):
return "Error: Invalid characters in expression"
return str(eval(expression))
except Exception as e:
return f"Error: {str(e)}"
def get_time_tool(_):
"""Get current time"""
return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
# Tool registry
TOOLS = {
"calculate": calculate_tool,
"get_time": get_time_tool
}
# Parsing functions
def parse_action(text):
"""Extract action from agent response"""
match = re.search(r'Action:\s*(\w+)\((.*?)\)', text)
if match:
return match.group(1), match.group(2).strip('"\'')
return None, None
def is_final_answer(text):
"""Check if agent provided final answer"""
return "Answer:" in text
# Main agent loop
def react_agent(user_input, max_steps=10):
"""ReAct agent implementation"""
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_input}
]
print(f"User: {user_input}\n")
for step in range(max_steps):
# Get LLM response
response = client.chat.completions.create(
model="gpt-4",
messages=messages,
temperature=0.1
)
agent_response = response.choices[0].message.content
print(f"Agent: {agent_response}\n")
# Check if done
if is_final_answer(agent_response):
# Extract final answer
answer = agent_response.split("Answer:")[1].strip()
return answer
# Parse and execute action
action, params = parse_action(agent_response)
if action and action in TOOLS:
result = TOOLS[action](params)
observation = f"Observation: {result}"
print(f"{observation}\n")
# Add to conversation
messages.append({"role": "assistant", "content": agent_response})
messages.append({"role": "user", "content": observation})
else:
return "Error: Could not parse action or unknown tool"
return "Max steps reached without answer"
# Test the agent
if __name__ == "__main__":
result = react_agent("What is 123 * 456?")
print(f"Final Answer: {result}")
Thought-Action-Observation Cycles
Let’s trace through an example:
User: “What’s 15% of 240?”
Cycle 1:
Thought: I need to calculate 15% of 240, which is 0.15 * 240
Action: calculate("0.15 * 240")
Observation: 36.0
Cycle 2:
Thought: I have the result
Answer: 15% of 240 is 36
Multiple Steps Example
User: “What time is it and what’s 100 + 50?”
Cycle 1:
Thought: I need to get the current time first
Action: get_time()
Observation: 2026-02-24 11:19:00
Cycle 2:
Thought: Now I need to calculate 100 + 50
Action: calculate("100 + 50")
Observation: 150
Cycle 3:
Thought: I have both pieces of information
Answer: The current time is 2026-02-24 11:19:00, and 100 + 50 equals 150
Basic Tool Calling
Tool Definition
Define tools with clear descriptions:
TOOLS = {
"search": {
"function": search_tool,
"description": "Search the internet for information",
"parameters": {
"query": "The search query string"
}
},
"calculate": {
"function": calculate_tool,
"description": "Evaluate mathematical expressions",
"parameters": {
"expression": "Math expression to evaluate (e.g., '2 + 2')"
}
}
}
Tool Implementation Best Practices
- Validate inputs: Check parameters before execution
- Handle errors: Return error messages, don’t crash
- Return strings: Consistent output format
- Be deterministic: Same input → same output
- Add timeouts: Prevent hanging operations
def search_tool(query):
"""Search tool with validation and error handling"""
# Validate
if not query or len(query) < 2:
return "Error: Query too short"
# Execute with timeout
try:
results = search_api(query, timeout=5)
return format_results(results)
except TimeoutError:
return "Error: Search timed out"
except Exception as e:
return f"Error: {str(e)}"
Error Handling and Retries
Agents need to handle failures gracefully.
Detecting Errors
def is_error(observation):
"""Check if tool execution resulted in error"""
return observation.startswith("Error:")
Retry Logic
def react_agent_with_retry(user_input, max_steps=10, max_retries=3):
"""ReAct agent with retry logic"""
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_input}
]
retry_count = 0
for step in range(max_steps):
response = get_llm_response(messages)
if is_final_answer(response):
return extract_answer(response)
action, params = parse_action(response)
result = execute_tool(action, params)
# Handle errors
if is_error(result) and retry_count < max_retries:
retry_count += 1
messages.append({
"role": "user",
"content": f"{result}\nPlease try a different approach."
})
continue
retry_count = 0 # Reset on success
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": f"Observation: {result}"})
return "Max steps reached"
Graceful Degradation
# Add to system prompt
"""
If a tool fails:
1. Try an alternative approach
2. If no alternative exists, explain the limitation
3. Provide the best answer you can with available information
"""
Common Pitfalls and Solutions
Pitfall 1: Infinite Loops
Problem: Agent repeats the same action.
Solution: Track action history and limit repetitions.
action_history = []
# Inside the agent loop, after parsing each action:
if action_history[-2:] == [action, action]:  # same action about to run a third time
    return "Agent stuck in loop, stopping"
action_history.append(action)
Pitfall 2: Hallucinated Tools
Problem: Agent invents non-existent tools.
Solution: Strict validation and clear error messages.
if action not in TOOLS:
observation = f"Error: Tool '{action}' does not exist. Available tools: {list(TOOLS.keys())}"
Pitfall 3: Malformed Actions
Problem: Agent doesn’t follow the format.
Solution: Better prompting, examples, and parsing fallbacks.
# Add to system prompt
"""
IMPORTANT: Always use exact format:
Action: tool_name(parameters)
Incorrect: "I'll use search tool with query X"
Correct: Action: search("query X")
"""
Pitfall 4: Premature Answers
Problem: Agent answers before using tools.
Solution: Emphasize tool usage in the prompt.
"""
You MUST use tools to answer questions. Do not guess or use prior knowledge.
Always verify information using available tools.
"""
Testing Your Agent
# Test cases
test_cases = [
("What is 50 * 20?", "1000"),
("What time is it?", None), # Time varies
("Calculate 100 / 4", "25"),
]
for question, expected in test_cases:
result = react_agent(question)
if expected:
assert expected in result, f"Failed: {question}"
print(f"✓ {question}")
💡 Pro Tip
Start with simple test cases and gradually increase complexity. Log all reasoning traces to understand how your agent makes decisions.
✅ Key Takeaways
- ReAct combines reasoning (thinking) with acting (tool use)
- The agent alternates between Thought, Action, and Observation
- Always implement max steps to prevent infinite loops
- Use structured prompts to guide agent behavior
- Validate tool calls before execution
- Common pitfalls: loops, hallucinations, premature answers
Next Steps
You now have a working ReAct agent! In the next section, we’ll explore tool integration in depth, including:
- Function calling APIs
- Complex tool schemas
- Parameter validation
- Response parsing strategies
Tool Integration
Function Calling APIs
Modern LLMs support native function calling, making tool integration more reliable than text parsing.
OpenAI Function Calling
import openai
client = openai.OpenAI()
# Define tools
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name, e.g., 'San Francisco'"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
}
}
]
# Call LLM with tools
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
tools=tools,
tool_choice="auto" # Let model decide when to use tools
)
# Check if model wants to call a function
message = response.choices[0].message
if message.tool_calls:
tool_call = message.tool_calls[0]
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
print(f"Calling: {function_name}({arguments})")
Anthropic Tool Use
import anthropic
client = anthropic.Anthropic()
tools = [
{
"name": "get_weather",
"description": "Get current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
}
},
"required": ["location"]
}
}
]
response = client.messages.create(
model="claude-3-opus-20240229",
max_tokens=1024,
tools=tools,
messages=[{"role": "user", "content": "Weather in Paris?"}]
)
# Check for tool use
for block in response.content:
if block.type == "tool_use":
print(f"Tool: {block.name}")
print(f"Input: {block.input}")
Benefits of Native Function Calling
- Structured output: JSON instead of text parsing
- Type safety: Parameters validated by LLM
- Reliability: Less prone to format errors
- Parallel calls: Multiple tools at once
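When the model returns several tool calls in one turn, they can be executed concurrently. A sketch using a thread pool — the two dummy tools here are placeholders standing in for real implementations:

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Dummy tools standing in for real implementations
TOOLS = {
    "get_weather": lambda args: f"Sunny in {args['location']}",
    "get_time": lambda args: "12:00",
}

def run_tool_calls(tool_calls):
    """Execute a batch of (name, arguments-json) tool calls in parallel."""
    def run_one(call):
        name, raw_args = call
        return name, TOOLS[name](json.loads(raw_args))
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(run_one, tool_calls))

results = run_tool_calls([
    ("get_weather", '{"location": "Tokyo"}'),
    ("get_time", '{}'),
])
print(results)  # {'get_weather': 'Sunny in Tokyo', 'get_time': '12:00'}
```

Threads suit I/O-bound tools (API calls, scraping); each result still needs to be appended to the conversation with its matching `tool_call_id`.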
Tool Schemas and Descriptions
Good tool definitions are critical for agent performance.
Anatomy of a Tool Schema
{
"name": "search_database", # Clear, descriptive name
"description": "Search the product database for items matching criteria. Returns up to 10 results.", # When and why to use
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query (e.g., 'red shoes size 10')"
},
"category": {
"type": "string",
"enum": ["electronics", "clothing", "books"],
"description": "Product category to search within"
},
"max_price": {
"type": "number",
"description": "Maximum price in USD"
}
},
"required": ["query"] # Only query is mandatory
}
}
Writing Effective Descriptions
Bad: “Search function”
Good: “Search the product database for items. Use when user asks about products, availability, or prices.”
Bad: “Gets data”
Good: “Retrieve user profile data including name, email, and preferences. Use for personalization or account queries.”
Description Best Practices
- Be specific: Explain exactly what the tool does
- Include examples: Show typical parameter values
- State limitations: Mention constraints or edge cases
- Clarify use cases: When should this tool be used?
- Avoid ambiguity: Use precise language
# Good example
{
"name": "calculate_shipping",
"description": """Calculate shipping cost for an order.
Use when: User asks about shipping costs or delivery fees
Returns: Cost in USD and estimated delivery days
Limitations: Only works for US addresses
Example: calculate_shipping(weight=2.5, zip_code="94102")
""",
"parameters": {
"type": "object",
"properties": {
"weight": {
"type": "number",
"description": "Package weight in pounds (e.g., 2.5)"
},
"zip_code": {
"type": "string",
"description": "5-digit US ZIP code (e.g., '94102')"
}
},
"required": ["weight", "zip_code"]
}
}
Parameter Validation
Always validate parameters before execution.
Basic Validation
def validate_parameters(tool_name, params):
"""Validate tool parameters"""
validators = {
"search": validate_search,
"calculate": validate_calculate,
"send_email": validate_email
}
if tool_name not in validators:
return False, f"Unknown tool: {tool_name}"
return validators[tool_name](params)
def validate_search(params):
"""Validate search parameters"""
if "query" not in params:
return False, "Missing required parameter: query"
if not isinstance(params["query"], str):
return False, "Query must be a string"
if len(params["query"]) < 2:
return False, "Query too short (minimum 2 characters)"
if len(params["query"]) > 200:
return False, "Query too long (maximum 200 characters)"
return True, "Valid"
Type Validation
def validate_type(value, expected_type):
    """Validate parameter type"""
    type_map = {
        "string": str,
        "number": (int, float),
        "boolean": bool,
        "array": list,
        "object": dict
    }
    expected = type_map.get(expected_type)
    if expected is None:
        return False, f"Unknown type: {expected_type}"
    if not isinstance(value, expected):
        return False, f"Expected {expected_type}, got {type(value).__name__}"
    return True, "Valid"
Schema-Based Validation
import jsonschema
def validate_with_schema(params, schema):
"""Validate parameters against JSON schema"""
try:
jsonschema.validate(instance=params, schema=schema)
return True, "Valid"
except jsonschema.ValidationError as e:
return False, str(e)
# Example usage
schema = {
"type": "object",
"properties": {
"email": {
"type": "string",
"format": "email"
},
"age": {
"type": "integer",
"minimum": 0,
"maximum": 150
}
},
"required": ["email"]
}
valid, message = validate_with_schema(
{"email": "user@example.com", "age": 25},
schema
)
Sanitization
Clean inputs before use:
def sanitize_string(s, max_length=1000):
"""Sanitize string input"""
# Remove null bytes
s = s.replace('\x00', '')
# Trim whitespace
s = s.strip()
# Limit length
s = s[:max_length]
return s
def sanitize_sql_input(s):
"""Prevent SQL injection"""
# Use parameterized queries instead
# This is just for demonstration
dangerous = ["'", '"', ';', '--', '/*', '*/']
for char in dangerous:
s = s.replace(char, '')
return s
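Stripping characters is a weak defense; the robust fix mentioned in the comment is parameterized queries, where the database driver keeps data separate from SQL. A minimal `sqlite3` sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Untrusted input is passed as a bound parameter, never interpolated into SQL
user_input = "alice' OR '1'='1"  # a classic injection attempt
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] — the injection is treated as a literal string, matching nothing
```

The same `?` placeholder pattern (or `%s` for drivers like psycopg2) applies to any tool that builds SQL from agent-supplied parameters.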
Response Parsing
Handle tool outputs consistently.
Structured Responses
from dataclasses import dataclass
from typing import Optional
@dataclass
class ToolResponse:
"""Standardized tool response"""
success: bool
data: Optional[dict] = None
error: Optional[str] = None
metadata: Optional[dict] = None
def execute_tool(tool_name, params):
"""Execute tool and return structured response"""
try:
result = TOOLS[tool_name](params)
return ToolResponse(
success=True,
data=result,
metadata={"tool": tool_name, "timestamp": time.time()}
)
except Exception as e:
return ToolResponse(
success=False,
error=str(e),
metadata={"tool": tool_name}
)
Formatting for LLM
def format_tool_response(response: ToolResponse) -> str:
"""Format tool response for LLM consumption"""
if response.success:
return f"Success: {json.dumps(response.data, indent=2)}"
else:
return f"Error: {response.error}"
# Usage in agent loop
result = execute_tool("search", {"query": "AI agents"})
observation = format_tool_response(result)
messages.append({"role": "user", "content": f"Observation: {observation}"})
Handling Different Response Types
def parse_tool_output(output, expected_type="string"):
"""Parse and validate tool output"""
if expected_type == "json":
try:
return json.loads(output)
except json.JSONDecodeError:
return {"error": "Invalid JSON response"}
elif expected_type == "number":
try:
return float(output)
except ValueError:
return None
elif expected_type == "boolean":
return output.lower() in ["true", "yes", "1"]
else: # string
return str(output)
Building a Tool Registry
Organize tools for easy management.
Simple Registry
class ToolRegistry:
"""Manage available tools"""
def __init__(self):
self.tools = {}
def register(self, name, function, schema):
"""Register a new tool"""
self.tools[name] = {
"function": function,
"schema": schema
}
def get_tool(self, name):
"""Get tool by name"""
return self.tools.get(name)
def list_tools(self):
"""List all available tools"""
return list(self.tools.keys())
def get_schemas(self):
"""Get all tool schemas for LLM"""
return [tool["schema"] for tool in self.tools.values()]
def execute(self, name, params):
"""Execute a tool"""
tool = self.get_tool(name)
if not tool:
raise ValueError(f"Tool not found: {name}")
return tool["function"](params)
# Usage
registry = ToolRegistry()
# Register tools
registry.register(
name="search",
function=search_function,
schema={
"name": "search",
"description": "Search the web",
"parameters": {...}
}
)
# Use in agent
schemas = registry.get_schemas()
result = registry.execute("search", {"query": "AI"})
Advanced Registry with Decorators
class ToolRegistry:
def __init__(self):
self.tools = {}
def tool(self, name, description, parameters):
"""Decorator to register tools"""
def decorator(func):
self.tools[name] = {
"function": func,
"schema": {
"name": name,
"description": description,
"parameters": parameters
}
}
return func
return decorator
# Create registry
registry = ToolRegistry()
# Register tools with decorator
@registry.tool(
name="calculate",
description="Evaluate mathematical expressions",
parameters={
"type": "object",
"properties": {
"expression": {"type": "string"}
},
"required": ["expression"]
}
)
def calculate(expression):
    """Calculate mathematical expression"""
    # Demo only: eval is unsafe on untrusted input — whitelist characters
    # as in the earlier calculate_tool before using this in practice
    return eval(expression)
@registry.tool(
name="get_time",
description="Get current time",
parameters={"type": "object", "properties": {}}
)
def get_time():
"""Get current time"""
from datetime import datetime
return datetime.now().isoformat()
Complete Tool Integration Example
import openai
import json
from typing import Dict, Any, List
class Agent:
"""Agent with integrated tool system"""
def __init__(self, model="gpt-4"):
self.client = openai.OpenAI()
self.model = model
self.registry = ToolRegistry()
self._register_default_tools()
def _register_default_tools(self):
"""Register built-in tools"""
@self.registry.tool(
name="search",
description="Search for information",
parameters={
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
}
)
def search(query):
# Implement search
return f"Search results for: {query}"
@self.registry.tool(
name="calculate",
description="Evaluate math expressions",
parameters={
"type": "object",
"properties": {
"expression": {"type": "string"}
},
"required": ["expression"]
}
)
def calculate(expression):
try:
return str(eval(expression))
except Exception as e:
return f"Error: {str(e)}"
def run(self, user_input: str, max_steps: int = 10) -> str:
"""Run agent with tool integration"""
messages = [
{"role": "system", "content": "You are a helpful assistant with access to tools."},
{"role": "user", "content": user_input}
]
for step in range(max_steps):
# Call LLM with tools
response = self.client.chat.completions.create(
model=self.model,
messages=messages,
tools=self.registry.get_schemas(),
tool_choice="auto"
)
message = response.choices[0].message
messages.append(message)
# Check if done
if not message.tool_calls:
return message.content
# Execute tool calls
for tool_call in message.tool_calls:
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
# Execute tool
result = self.registry.execute(function_name, arguments)
# Add result to messages
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": str(result)
})
return "Max steps reached"
# Usage
agent = Agent()
response = agent.run("What is 25 * 17?")
print(response)
Best Practices
- Clear naming: Use descriptive, unambiguous tool names
- Comprehensive descriptions: Help the LLM understand when to use each tool
- Validate everything: Check parameters before execution
- Handle errors gracefully: Return useful error messages
- Keep tools focused: One tool, one purpose
- Document examples: Show typical usage in descriptions
- Version your tools: Track changes to tool interfaces
- Test thoroughly: Verify tools work with various inputs
Common Patterns
Conditional Tool Access
def get_available_tools(user_role):
"""Return tools based on user permissions"""
base_tools = ["search", "calculate"]
if user_role == "admin":
base_tools.extend(["delete_data", "modify_settings"])
return [registry.get_tool(name) for name in base_tools]
Tool Chaining
# Tools can call other tools
@registry.tool(name="research", ...)
def research(topic):
# Search for information
results = registry.execute("search", {"query": topic})
# Summarize results
summary = registry.execute("summarize", {"text": results})
return summary
Async Tool Execution
import asyncio

async def execute_tool_async(tool_name, params):
    """Execute a tool without blocking the event loop"""
    tool = registry.get_tool(tool_name)
    # The registered tools are ordinary sync functions, so run them in a
    # thread; if a tool is itself a coroutine function, await it directly
    return await asyncio.to_thread(tool["function"], params)
# Execute multiple tools in parallel
results = await asyncio.gather(
execute_tool_async("search", {"query": "AI"}),
execute_tool_async("search", {"query": "ML"}),
execute_tool_async("search", {"query": "agents"})
)
Next Steps
Now that you understand tool integration, let’s build a complete hands-on project in the next section where you’ll create a research assistant agent with multiple tools!
Hands-On Project: Shopping Research Assistant
Project Overview
Build a Shopping Research Assistant that helps users make informed purchasing decisions by:
- Searching for products across multiple sources
- Comparing prices and features
- Reading product reviews
- Summarizing pros and cons
- Providing recommendations with reasoning
This project combines everything you’ve learned: ReAct pattern, tool integration, multi-step reasoning, and error handling.
What You’ll Build
An agent that can handle queries like:
- “Find the best laptop under $1000 for programming”
- “Compare noise-canceling headphones”
- “What are the top-rated coffee makers?”
- “Should I buy the iPhone 15 or Samsung S24?”
Project Setup
Dependencies
pip install openai requests beautifulsoup4 python-dotenv
Project Structure
shopping_agent/
├── agent.py # Main agent implementation
├── tools.py # Tool definitions
├── config.py # Configuration
├── .env # API keys
└── test_agent.py # Test cases
Configuration
# config.py
import os
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MODEL = "gpt-4"
MAX_STEPS = 15
TEMPERATURE = 0.7
Implement the Tools
Tool 1: Product Search
# tools.py
import os
import requests
from typing import Dict, List
def search_products(query: str, max_results: int = 5) -> str:
"""
Search for products matching the query.
Returns product names, prices, and URLs.
"""
try:
# Using a mock API for demonstration
# In production, use real APIs like Amazon Product API, eBay, etc.
# Simulate search results
results = [
{
"name": f"Product {i+1} for {query}",
"price": f"${100 + i*50}",
"rating": f"{4.0 + i*0.2:.1f}/5.0",
"url": f"https://example.com/product-{i+1}"
}
for i in range(max_results)
]
# Format results
output = f"Found {len(results)} products:\n\n"
for i, product in enumerate(results, 1):
output += f"{i}. {product['name']}\n"
output += f" Price: {product['price']}\n"
output += f" Rating: {product['rating']}\n"
output += f" URL: {product['url']}\n\n"
return output
except Exception as e:
return f"Error searching products: {str(e)}"
def search_products_real(query: str, max_results: int = 5) -> str:
"""
Real implementation using web search.
Searches Google Shopping or similar.
"""
try:
# Example with Google Custom Search API
api_key = os.getenv("GOOGLE_API_KEY")
search_engine_id = os.getenv("GOOGLE_SEARCH_ENGINE_ID")
url = "https://www.googleapis.com/customsearch/v1"
params = {
"key": api_key,
"cx": search_engine_id,
"q": query + " buy price",
"num": max_results
}
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()
data = response.json()
items = data.get("items", [])
output = f"Found {len(items)} products:\n\n"
for i, item in enumerate(items, 1):
output += f"{i}. {item['title']}\n"
output += f" {item['snippet']}\n"
output += f" URL: {item['link']}\n\n"
return output
except Exception as e:
return f"Error: {str(e)}"
Tool 2: Get Product Details
from bs4 import BeautifulSoup
def get_product_details(url: str) -> str:
"""
Extract detailed information from a product page.
Returns specs, description, and reviews summary.
"""
try:
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
# Extract text (simplified)
# In production, use specific selectors for each site
text = soup.get_text(separator='\n', strip=True)
# Limit length
max_length = 2000
if len(text) > max_length:
text = text[:max_length] + "..."
return f"Product details from {url}:\n\n{text}"
except Exception as e:
return f"Error fetching product details: {str(e)}"
Tool 3: Compare Products
def compare_products(product_list: str) -> str:
"""
Compare multiple products based on provided information.
Input: Comma-separated product names or descriptions.
Returns: Comparison table.
"""
try:
products = [p.strip() for p in product_list.split(',')]
if len(products) < 2:
return "Error: Need at least 2 products to compare"
output = "Product Comparison:\n\n"
output += "To compare these products effectively, I need their details.\n"
output += "Please use get_product_details for each product first.\n\n"
output += f"Products to compare: {', '.join(products)}"
return output
except Exception as e:
return f"Error comparing products: {str(e)}"
Tool 4: Get Reviews Summary
def get_reviews_summary(product_name: str) -> str:
"""
Get a summary of customer reviews for a product.
Returns common pros, cons, and overall sentiment.
"""
try:
# Mock implementation
# In production, scrape from Amazon, Reddit, review sites
reviews = {
"overall_rating": "4.3/5.0",
"total_reviews": 1247,
"pros": [
"Excellent build quality",
"Great performance",
"Good value for money"
],
"cons": [
"Battery life could be better",
"Slightly heavy",
"Limited color options"
],
"common_themes": [
"Users love the performance",
"Some complaints about weight",
"Generally recommended"
]
}
output = f"Reviews Summary for {product_name}:\n\n"
output += f"Overall Rating: {reviews['overall_rating']} ({reviews['total_reviews']} reviews)\n\n"
output += "Pros:\n"
for pro in reviews['pros']:
output += f" ✓ {pro}\n"
output += "\nCons:\n"
for con in reviews['cons']:
output += f" ✗ {con}\n"
output += "\nCommon Themes:\n"
for theme in reviews['common_themes']:
output += f" • {theme}\n"
return output
except Exception as e:
return f"Error getting reviews: {str(e)}"
Tool 5: Price History
def get_price_history(product_name: str) -> str:
"""
Get price history and trends for a product.
Helps determine if current price is good.
"""
try:
# Mock implementation
# In production, use CamelCamelCamel API, Keepa, etc.
history = {
"current_price": "$899",
"lowest_price": "$799 (3 months ago)",
"highest_price": "$999 (6 months ago)",
"average_price": "$879",
"trend": "stable",
"recommendation": "Current price is close to average. Good time to buy."
}
output = f"Price History for {product_name}:\n\n"
output += f"Current Price: {history['current_price']}\n"
output += f"Lowest Price: {history['lowest_price']}\n"
output += f"Highest Price: {history['highest_price']}\n"
output += f"Average Price: {history['average_price']}\n"
output += f"Trend: {history['trend']}\n\n"
output += f"💡 {history['recommendation']}"
return output
except Exception as e:
return f"Error getting price history: {str(e)}"
Build the Agent
Tool Registry
# agent.py
import json
import openai
from tools import (
search_products,
get_product_details,
compare_products,
get_reviews_summary,
get_price_history
)
class ShoppingAgent:
"""Shopping Research Assistant Agent"""
def __init__(self):
self.tools = self._create_tool_schemas()
self.client = openai.OpenAI()
def _create_tool_schemas(self):
"""Define tool schemas for OpenAI function calling"""
return [
{
"type": "function",
"function": {
"name": "search_products",
"description": "Search for products matching a query. Use when user asks to find or search for products.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Product search query (e.g., 'laptop under $1000')"
},
"max_results": {
"type": "integer",
"description": "Maximum number of results (default: 5)",
"default": 5
}
},
"required": ["query"]
}
}
},
{
"type": "function",
"function": {
"name": "get_product_details",
"description": "Get detailed information about a specific product from its URL.",
"parameters": {
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "Product page URL"
}
},
"required": ["url"]
}
}
},
{
"type": "function",
"function": {
"name": "get_reviews_summary",
"description": "Get summary of customer reviews including pros, cons, and ratings.",
"parameters": {
"type": "object",
"properties": {
"product_name": {
"type": "string",
"description": "Product name"
}
},
"required": ["product_name"]
}
}
},
{
"type": "function",
"function": {
"name": "get_price_history",
"description": "Get price history and determine if current price is good.",
"parameters": {
"type": "object",
"properties": {
"product_name": {
"type": "string",
"description": "Product name"
}
},
"required": ["product_name"]
}
}
},
{
"type": "function",
"function": {
"name": "compare_products",
"description": "Compare multiple products. Use after gathering details about each product.",
"parameters": {
"type": "object",
"properties": {
"product_list": {
"type": "string",
"description": "Comma-separated list of product names"
}
},
"required": ["product_list"]
}
}
}
]
def _execute_tool(self, tool_name: str, arguments: dict) -> str:
"""Execute a tool and return result"""
tool_map = {
"search_products": search_products,
"get_product_details": get_product_details,
"compare_products": compare_products,
"get_reviews_summary": get_reviews_summary,
"get_price_history": get_price_history
}
if tool_name not in tool_map:
return f"Error: Unknown tool {tool_name}"
try:
result = tool_map[tool_name](**arguments)
return result
except Exception as e:
return f"Error executing {tool_name}: {str(e)}"
def run(self, user_query: str, max_steps: int = 15) -> str:
"""Run the shopping assistant agent"""
messages = [
{
"role": "system",
"content": """You are a helpful shopping research assistant.
Your goal is to help users make informed purchasing decisions by:
1. Searching for relevant products
2. Gathering detailed information and reviews
3. Comparing options
4. Providing clear recommendations with reasoning
Always:
- Search for products before making recommendations
- Check reviews and ratings
- Consider price history when available
- Compare multiple options when relevant
- Cite specific information from your research
- Be honest about limitations
Format your final recommendation clearly with pros, cons, and reasoning."""
},
{"role": "user", "content": user_query}
]
print(f"🛍️ User: {user_query}\n")
for step in range(max_steps):
# Get LLM response
response = self.client.chat.completions.create(
model="gpt-4",
messages=messages,
tools=self.tools,
tool_choice="auto",
temperature=0.7
)
message = response.choices[0].message
# If no tool calls, we're done
if not message.tool_calls:
print(f"🤖 Assistant: {message.content}\n")
return message.content
# Add assistant message
messages.append(message)
# Execute tool calls
for tool_call in message.tool_calls:
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
print(f"🔧 Using tool: {function_name}({arguments})")
# Execute tool
result = self._execute_tool(function_name, arguments)
print(f"📊 Result: {result[:200]}...\n")
# Add tool result to messages
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": result
})
return "⚠️ Max steps reached without completing the task"
Complete Implementation
# agent.py (complete file)
import openai
import json
from config import OPENAI_API_KEY, MODEL
from tools import (
search_products,
get_product_details,
compare_products,
get_reviews_summary,
get_price_history
)
# With openai>=1.0, pass the key to the client rather than setting a module global:
client = openai.OpenAI(api_key=OPENAI_API_KEY)
# [ShoppingAgent class from above]
def main():
"""Test the shopping agent"""
agent = ShoppingAgent()
# Example queries
queries = [
"Find the best noise-canceling headphones under $300",
"Compare iPhone 15 Pro and Samsung Galaxy S24",
"What's a good coffee maker for home use?"
]
for query in queries:
print("=" * 60)
result = agent.run(query)
print("=" * 60)
print()
if __name__ == "__main__":
main()
Test Cases
# test_agent.py
from agent import ShoppingAgent
def test_product_search():
"""Test basic product search"""
agent = ShoppingAgent()
result = agent.run("Find wireless keyboards under $50")
assert "Product" in result or "keyboard" in result.lower()
print("✓ Product search test passed")
def test_comparison():
"""Test product comparison"""
agent = ShoppingAgent()
result = agent.run("Compare MacBook Air vs Dell XPS 13")
assert len(result) > 100 # Should have substantial response
print("✓ Comparison test passed")
def test_reviews():
"""Test review gathering"""
agent = ShoppingAgent()
result = agent.run("What do people say about AirPods Pro?")
assert "review" in result.lower() or "rating" in result.lower()
print("✓ Reviews test passed")
if __name__ == "__main__":
test_product_search()
test_comparison()
test_reviews()
print("\n✅ All tests passed!")
Debug Common Issues
Issue 1: Agent Doesn’t Use Tools
Problem: Agent responds without searching
Solution: Strengthen system prompt
"You MUST use the search_products tool before making any recommendations.
Never rely on prior knowledge about products or prices."
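If strengthening the prompt is not enough, the Chat Completions API's `tool_choice` parameter can force a specific tool on the first turn and then fall back to `"auto"`. A minimal sketch (the tool name `search_products` matches this chapter's agent; adapt it to your own tools):

```python
def select_tool_choice(step: int, force_tool: str = "search_products"):
    """Force a named tool on the first step, then let the model decide."""
    if step == 0:
        # OpenAI format for forcing one specific function
        return {"type": "function", "function": {"name": force_tool}}
    return "auto"

# Used as:
# client.chat.completions.create(..., tool_choice=select_tool_choice(step))
```

This guarantees at least one search happens before the model is free to answer directly.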
Issue 2: Infinite Search Loop
Problem: Agent keeps searching without concluding
Solution: Add step tracking and guidance
# Track tool usage inside the agent's tool-execution loop
tool_usage = {}
# ... after each tool call:
tool_usage[tool_name] = tool_usage.get(tool_name, 0) + 1
if tool_usage[tool_name] > 3:
    # Feed this back as the tool result to nudge the model to conclude
    result = "You've used this tool multiple times. Please synthesize your findings."
Issue 3: Hallucinated Product Info
Problem: Agent invents product details
Solution: Emphasize tool-only information
"CRITICAL: Only use information from tool results.
If a tool doesn't return information, say so explicitly.
Never make up product names, prices, or specifications."
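Prompting can be backed by a lightweight check of the final answer against the collected tool outputs before showing it to the user. A rough sketch, assuming the hypothetical convention that the agent bolds product names with `**...**`:

```python
import re
from typing import List

def is_grounded(answer: str, tool_results: List[str]) -> bool:
    """Rough grounding check: every bolded product name in the answer
    must appear somewhere in the collected tool outputs."""
    combined = " ".join(tool_results).lower()
    # Hypothetical convention: product names are wrapped in **bold**
    names = re.findall(r"\*\*(.+?)\*\*", answer)
    return all(name.lower() in combined for name in names)
```

If the check fails, the answer can be sent back to the model with a request to cite only tool results.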
Issue 4: Poor Recommendations
Problem: Recommendations lack depth
Solution: Add structured output requirement
"Format your final recommendation as:
**Recommendation**: [Product name]
**Why**: [2-3 key reasons]
**Pros**:
- [Pro 1]
- [Pro 2]
**Cons**:
- [Con 1]
- [Con 2]
**Price**: [Current price and value assessment]"
Enhancements
1. Add Budget Tracking
def check_budget(price: str, budget: float) -> bool:
"""Check if price is within budget"""
# Extract numeric price
price_num = float(price.replace('$', '').replace(',', ''))
return price_num <= budget
2. Save Research Sessions
import time

def save_research(query: str, results: str):
    """Save research for later reference"""
    timestamp = int(time.time())
    with open(f"research_{timestamp}.txt", "w") as f:
        f.write(f"Query: {query}\n\n{results}")
3. Multi-Store Price Comparison
def compare_prices_across_stores(product: str) -> dict:
"""Check prices at Amazon, Walmart, Best Buy, etc."""
stores = ["Amazon", "Walmart", "Best Buy"]
prices = {}
for store in stores:
prices[store] = search_store_price(store, product)
return prices
4. Deal Alerts
def check_for_deals(product: str) -> str:
"""Check if product is on sale or has coupons"""
# Check deal sites, coupon codes, etc.
pass
5. Personalization
def get_user_preferences() -> dict:
"""Load user preferences (brands, price range, features)"""
return {
"preferred_brands": ["Sony", "Apple"],
"max_price": 500,
"must_have_features": ["wireless", "noise-canceling"]
}
Practice Exercises
Exercise 1: Add a New Tool (Easy)
Task: Add a compare_prices tool that compares prices across products.
Requirements:
- Takes a list of products with prices
- Returns the cheapest option
- Handles missing price data
Click to see solution
from typing import Dict, List

def compare_prices(products: List[Dict]) -> Dict:
    """Compare prices and find cheapest"""
    valid_products = [p for p in products if "price" in p]
    if not valid_products:
        return {"error": "No products with prices"}
    cheapest = min(valid_products, key=lambda x: x["price"])
    most_expensive = max(valid_products, key=lambda x: x["price"])
    return {
        "cheapest": cheapest,
        "savings": most_expensive["price"] - cheapest["price"]
    }
Exercise 2: Improve Error Handling (Medium)
Task: Enhance the agent to handle API timeouts and retries.
Requirements:
- Retry failed tool calls up to 3 times
- Use exponential backoff
- Log all retry attempts
Click to see solution
import time
def execute_tool_with_retry(tool_name: str, args: dict, max_retries: int = 3):
"""Execute tool with retry logic"""
for attempt in range(max_retries):
try:
result = execute_tool(tool_name, args)
return result
except Exception as e:
if attempt == max_retries - 1:
raise
wait_time = 2 ** attempt # Exponential backoff
print(f"Retry {attempt + 1}/{max_retries} after {wait_time}s")
time.sleep(wait_time)
Exercise 3: Build a Travel Agent (Hard)
Task: Create a travel planning agent with these tools:
- search_flights(origin, destination, date)
- search_hotels(location, checkin, checkout)
- get_weather(location, date)
- calculate_budget(flights, hotels, days)
Challenge: Agent should create a complete travel plan with budget.
Click to see solution
class TravelAgent:
def __init__(self):
self.client = openai.OpenAI()
self.tools = [
{
    "type": "function",
    "function": {
        "name": "search_flights",
        "description": "Search for flights",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
                "date": {"type": "string"}
            },
            "required": ["origin", "destination", "date"]
        }
    }
},
# Add other tools...
]
def plan_trip(self, request: str) -> Dict:
"""Plan complete trip"""
messages = [{"role": "user", "content": request}]
for _ in range(10):
response = self.client.chat.completions.create(
model="gpt-4",
messages=messages,
tools=self.tools
)
message = response.choices[0].message
if message.tool_calls:
    # The assistant message carrying the tool calls must be appended
    # before the tool results, or the API rejects the history
    messages.append(message)
    # Execute tools and continue
    for tool_call in message.tool_calls:
        result = self.execute_tool(tool_call)
        messages.append({
            "role": "tool",
            "content": json.dumps(result),
            "tool_call_id": tool_call.id
        })
else:
return {"plan": message.content}
return {"error": "Max steps reached"}
✅ Key Takeaways
- ReAct agents combine reasoning with tool use
- Tool integration requires clear schemas and validation
- Error handling and retries improve reliability
- Real-world agents need multiple specialized tools
- Practice builds intuition for agent design
Next Steps
Congratulations! You’ve built a complete shopping research assistant. You now understand:
- ✅ ReAct pattern implementation
- ✅ Tool integration and validation
- ✅ Multi-step reasoning
- ✅ Error handling and debugging
- ✅ Real-world agent applications
In Chapter 3, we’ll explore advanced agent patterns including planning, memory systems, and multi-agent collaboration!
Planning Agents
Module 3: Learning Objectives
By the end of this module, you will:
- ✓ Implement planning algorithms (Chain-of-Thought, task decomposition)
- ✓ Build memory systems (short-term, long-term, semantic)
- ✓ Create multi-agent systems with collaboration patterns
- ✓ Understand when to use planning vs reactive approaches
- ✓ Design agent communication protocols
Introduction to Planning
Simple ReAct agents decide one step at a time. Planning agents think ahead—they create a multi-step plan before executing, leading to more efficient and coherent task completion.
Why Planning Matters
graph TB
subgraph "Without Planning"
A1[Search flights] --> A2[Search hotels]
A2 --> A3[Dates don't match!]
A3 --> A4[Search flights again]
A4 --> A5[Search hotels again]
end
subgraph "With Planning"
B1[Plan all steps] --> B2[Determine dates]
B2 --> B3[Search flights]
B3 --> B4[Search hotels]
B4 --> B5[Done efficiently]
end
style A3 fill:#fee2e2
style B5 fill:#d1fae5
Without Planning (Reactive):
- Search flights → Search hotels → Dates mismatch → Redo everything
- Inefficient, multiple retries
With Planning (Proactive):
- Plan: dates → flights → hotels → booking
- Execute efficiently in one pass
⚠️ When to Use Planning
Use planning for:
- Multi-step tasks with dependencies
- Tasks requiring coordination
- Resource-constrained scenarios
Skip planning for:
- Simple single-step tasks
- Highly dynamic environments
- When speed is critical
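These guidelines can be turned into a crude routing heuristic that decides whether a request goes to a planner or a reactive agent. The keyword lists below are illustrative, not exhaustive:

```python
def needs_planning(task: str) -> bool:
    """Crude heuristic for routing between a planner and a reactive agent.

    The marker lists are illustrative placeholders; a production router
    would use an LLM classifier or task metadata instead.
    """
    multi_step_markers = ["and then", "compare", "plan", "organize", "book", "research"]
    simple_markers = ["what is", "define", "translate"]
    t = task.lower()
    if any(m in t for m in simple_markers):
        return False  # single-step lookup: skip planning
    return any(m in t for m in multi_step_markers)
```

A simple gate like this keeps planning overhead off the fast path for trivial queries.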
Chain-of-Thought Reasoning
Chain-of-Thought (CoT) prompting encourages step-by-step reasoning.
Basic CoT
SYSTEM_PROMPT = """When solving problems, think step by step:
1. Understand the problem
2. Break it into sub-problems
3. Solve each sub-problem
4. Combine solutions
Example:
User: "I need to prepare for a camping trip next weekend"
Thought: Let me break this down:
1. Determine what items are needed for camping
2. Check what the user already has
3. Create a shopping list for missing items
4. Suggest where to buy them
Now I'll execute this plan..."""
Zero-Shot CoT
Simply add “Let’s think step by step”:
def zero_shot_cot(query):
"""Use zero-shot chain of thought"""
prompt = f"{query}\n\nLet's think step by step:"
return llm.generate(prompt)
Few-Shot CoT
Provide examples of step-by-step reasoning:
FEW_SHOT_EXAMPLES = """
Example 1:
User: "Plan a birthday party for 20 people"
Reasoning:
1. Determine budget and venue
2. Create guest list (20 people)
3. Choose date and send invitations
4. Plan menu and order food
5. Arrange entertainment and decorations
6. Prepare day-of schedule
Example 2:
User: "Debug why my website is slow"
Reasoning:
1. Measure current performance metrics
2. Identify bottlenecks (database, network, code)
3. Prioritize issues by impact
4. Fix highest-impact issues first
5. Re-measure to verify improvements
"""
Task Decomposition
Breaking complex tasks into manageable subtasks.
Hierarchical Decomposition
def decompose_task(task: str) -> dict:
"""Decompose task into hierarchy"""
prompt = f"""Break down this task into subtasks:
Task: {task}
Format as:
Main Goal: [goal]
Subtasks:
1. [subtask 1]
1.1 [sub-subtask]
1.2 [sub-subtask]
2. [subtask 2]
3. [subtask 3]
"""
response = llm.generate(prompt)
return parse_task_hierarchy(response)
# Example output
{
"goal": "Launch a new product",
"subtasks": [
{
"id": 1,
"task": "Market research",
"subtasks": [
{"id": 1.1, "task": "Identify target audience"},
{"id": 1.2, "task": "Analyze competitors"}
]
},
{
"id": 2,
"task": "Product development"
},
{
"id": 3,
"task": "Marketing campaign"
}
]
}
Dependency-Aware Decomposition
class Task:
def __init__(self, name, dependencies=None):
self.name = name
self.dependencies = dependencies or []
self.status = "pending"
def create_task_graph(goal: str) -> List[Task]:
"""Create task graph with dependencies"""
    tasks = [
        Task("Research market", dependencies=[]),
        Task("Design product", dependencies=["Research market"]),
        Task("Build prototype", dependencies=["Design product"]),
        Task("Test prototype", dependencies=["Build prototype"]),
        Task("Marketing ready", dependencies=["Research market"]),
        Task("Launch", dependencies=["Test prototype", "Marketing ready"])
    ]
    return tasks
def get_executable_tasks(tasks: List[Task]) -> List[Task]:
    """Get tasks that can be executed now (dependencies are task names)"""
    by_name = {t.name: t for t in tasks}
    return [
        task for task in tasks
        if task.status == "pending" and
        all(by_name[dep].status == "completed" for dep in task.dependencies)
    ]
Plan-and-Execute Frameworks
Separate planning from execution for better control.
Basic Plan-and-Execute
class PlanExecuteAgent:
"""Agent that plans first, then executes"""
def __init__(self):
self.client = openai.OpenAI()
self.tools = self._load_tools()
def plan(self, goal: str) -> List[str]:
"""Create execution plan"""
prompt = f"""Create a detailed plan to accomplish this goal:
Goal: {goal}
Available tools: {', '.join(self.tools.keys())}
Provide a numbered list of steps. Each step should:
- Be specific and actionable
- Use available tools
- Build on previous steps
Plan:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
plan_text = response.choices[0].message.content
steps = self._parse_plan(plan_text)
return steps
def execute(self, steps: List[str]) -> str:
    """Execute plan steps (a while-loop so a replanned list takes effect;
    reassigning `steps` inside a for-each loop would not change iteration)"""
    results = []
    i = 0
    while i < len(steps):
        step = steps[i]
        print(f"Executing step {i + 1}: {step}")
        # Use ReAct agent to execute each step
        result = self._execute_step(step)
        results.append(result)
        # Check if we should continue
        if self._should_replan(result):
            print("Replanning needed...")
            remaining = steps[i + 1:]
            new_plan = self.plan(f"Complete: {', '.join(remaining)}")
            steps = steps[:i + 1] + new_plan
        i += 1
    return self._synthesize_results(results)
def run(self, goal: str) -> str:
"""Plan and execute"""
print(f"Goal: {goal}\n")
# Create plan
plan = self.plan(goal)
print("Plan:")
for i, step in enumerate(plan, 1):
print(f" {i}. {step}")
print()
# Execute plan
result = self.execute(plan)
return result
Example Usage
agent = PlanExecuteAgent()
result = agent.run(
"Research electric cars under $40k and create a comparison report"
)
# Output:
# Goal: Research electric cars under $40k and create a comparison report
#
# Plan:
# 1. Search for electric cars priced under $40,000
# 2. Get detailed specs for top 5 models
# 3. Compare range, charging time, and features
# 4. Check customer reviews for each model
# 5. Create structured comparison report
#
# Executing step 1: Search for electric cars...
# Executing step 2: Get detailed specs...
# ...
Replanning and Adaptation
Plans often need adjustment based on results.
When to Replan
def should_replan(step_result: str, original_plan: List[str]) -> bool:
"""Determine if replanning is needed"""
# Error occurred
if "error" in step_result.lower():
return True
# Unexpected result
if "not found" in step_result.lower():
return True
# New information changes approach
if "alternative" in step_result.lower():
return True
return False
Replanning Strategies
1. Full Replan: Start over with new information
def full_replan(goal: str, context: str) -> List[str]:
"""Create entirely new plan"""
prompt = f"""Original goal: {goal}
Context from execution so far:
{context}
Create a new plan considering this context:"""
return create_plan(prompt)
2. Partial Replan: Adjust remaining steps
def partial_replan(remaining_steps: List[str], issue: str) -> List[str]:
"""Adjust remaining steps"""
prompt = f"""We encountered an issue: {issue}
Remaining steps were:
{format_steps(remaining_steps)}
Adjust the plan to work around this issue:"""
return create_plan(prompt)
3. Alternative Path: Try different approach
def find_alternative(failed_step: str, goal: str) -> str:
"""Find alternative way to accomplish step"""
prompt = f"""This step failed: {failed_step}
Goal: {goal}
Suggest an alternative approach:"""
return llm.generate(prompt)
Adaptive Planning Agent
class AdaptivePlanningAgent:
"""Agent that adapts plan based on execution"""
def __init__(self, max_replans=3):
self.max_replans = max_replans
self.replan_count = 0
def execute_with_adaptation(self, goal: str) -> str:
"""Execute with adaptive replanning"""
plan = self.plan(goal)
context = []
i = 0
while i < len(plan):
step = plan[i]
# Execute step
result = self.execute_step(step)
context.append({"step": step, "result": result})
# Check if replanning needed
if self.should_replan(result):
if self.replan_count >= self.max_replans:
return "Max replans reached. Unable to complete goal."
# Replan remaining steps
remaining_goal = self.extract_remaining_goal(plan[i+1:])
new_steps = self.replan(remaining_goal, context)
# Update plan
plan = plan[:i+1] + new_steps
self.replan_count += 1
print(f"🔄 Replanned ({self.replan_count}/{self.max_replans})")
i += 1
return self.synthesize_results(context)
Plan Representation
Different ways to represent plans.
Linear Plan
plan = [
"Step 1: Search for products",
"Step 2: Compare prices",
"Step 3: Read reviews",
"Step 4: Make recommendation"
]
Tree Plan
plan = {
"root": "Research product",
"branches": [
{
"node": "Gather information",
"branches": [
{"node": "Search products"},
{"node": "Get specifications"}
]
},
{
"node": "Analyze",
"branches": [
{"node": "Compare features"},
{"node": "Check reviews"}
]
},
{"node": "Recommend"}
]
}
Graph Plan
from dataclasses import dataclass
from typing import List, Set
@dataclass
class PlanNode:
id: str
action: str
dependencies: Set[str]
status: str = "pending"
plan_graph = [
PlanNode("1", "Search products", set()),
PlanNode("2", "Get details A", {"1"}),
PlanNode("3", "Get details B", {"1"}),
PlanNode("4", "Compare", {"2", "3"}),
PlanNode("5", "Recommend", {"4"})
]
def get_ready_nodes(graph: List[PlanNode]) -> List[PlanNode]:
"""Get nodes ready to execute"""
completed = {n.id for n in graph if n.status == "completed"}
return [
node for node in graph
if node.status == "pending" and
node.dependencies.issubset(completed)
]
Advanced Planning Techniques
Backward Chaining
Start from goal and work backwards:
def backward_chain(goal: str, current_state: dict) -> List[str]:
"""Plan by working backwards from goal"""
plan = []
current_goal = goal
while not is_satisfied(current_goal, current_state):
# What's needed to achieve current_goal?
prerequisite = find_prerequisite(current_goal)
plan.insert(0, prerequisite)
current_goal = prerequisite
return plan
# Example
goal = "Have dinner ready"
# Backward chain:
# "Have dinner ready" requires "Food is cooked"
# "Food is cooked" requires "Ingredients prepared"
# "Ingredients prepared" requires "Groceries purchased"
# Plan: [Buy groceries, Prepare ingredients, Cook food]
Hierarchical Task Network (HTN)
class HTNPlanner:
"""Hierarchical Task Network planner"""
def __init__(self):
self.methods = {
"travel_to_city": [
["book_flight", "take_flight"],
["book_train", "take_train"],
["rent_car", "drive"]
],
"book_flight": [
["search_flights", "select_flight", "pay"]
]
}
def decompose(self, task: str) -> List[str]:
"""Decompose high-level task"""
if task not in self.methods:
return [task] # Primitive task
# Choose best method
method = self.select_method(task)
# Recursively decompose
plan = []
for subtask in method:
plan.extend(self.decompose(subtask))
return plan
Monte Carlo Tree Search (MCTS) for Planning
class MCTSPlanner:
"""Use MCTS to find optimal plan"""
def plan(self, goal: str, num_simulations: int = 100):
"""Find plan using MCTS"""
root = Node(state=initial_state, goal=goal)
for _ in range(num_simulations):
# Selection
node = self.select(root)
# Expansion
if not node.is_terminal():
node = self.expand(node)
# Simulation
reward = self.simulate(node)
# Backpropagation
self.backpropagate(node, reward)
# Return best path
return self.best_path(root)
Practical Planning Agent
class PracticalPlanningAgent:
"""Production-ready planning agent"""
def __init__(self):
self.client = openai.OpenAI()
self.max_steps = 20
self.max_replans = 3
def run(self, goal: str) -> str:
"""Execute goal with planning"""
# 1. Create initial plan
plan = self.create_plan(goal)
print("📋 Initial Plan:")
for i, step in enumerate(plan, 1):
print(f" {i}. {step}")
print()
# 2. Execute with monitoring (while-loop so a replanned list is honored)
results = []
replan_count = 0
i = 0
while i < len(plan):
    step = plan[i]
    print(f"▶️ Step {i+1}/{len(plan)}: {step}")
    # Execute step
    result = self.execute_step(step, results)
    results.append({"step": step, "result": result})
    # Check success
    if self.is_failure(result):
        if replan_count >= self.max_replans:
            return self.handle_failure(goal, results)
        # Replan
        print("⚠️ Step failed, replanning...")
        new_plan = self.replan(goal, plan[i+1:], results)
        plan = plan[:i+1] + new_plan
        replan_count += 1
    print("✓ Completed\n")
    i += 1
# 3. Synthesize final result
return self.synthesize(goal, results)
def create_plan(self, goal: str) -> List[str]:
"""Create execution plan"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "user",
"content": f"""Create a step-by-step plan for: {goal}
Requirements:
- Each step should be specific and actionable
- Steps should build on each other logically
- Include verification steps
- Keep it concise (max 10 steps)
Format as numbered list."""
}],
temperature=0.3
)
return self.parse_plan(response.choices[0].message.content)
Best Practices
- Plan at the right level: Not too detailed, not too vague
- Include verification: Check if steps succeeded
- Be flexible: Allow replanning when needed
- Consider dependencies: Respect task ordering
- Set limits: Max steps, max replans
- Monitor progress: Track what’s completed
- Learn from failures: Improve planning over time
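The "set limits" practice above is easiest to honor when all guardrails live in one small config object shared by every agent. A sketch with illustrative (not recommended) defaults:

```python
from dataclasses import dataclass

@dataclass
class AgentLimits:
    """Central guardrail config; field values are illustrative defaults."""
    max_steps: int = 20
    max_replans: int = 3
    max_calls_per_tool: int = 3

    def allow_step(self, step: int) -> bool:
        """True while the agent may take another step."""
        return step < self.max_steps

    def allow_replan(self, replans_so_far: int) -> bool:
        """True while the agent may replan again."""
        return replans_so_far < self.max_replans
```

Checking `limits.allow_step(i)` at the top of the loop keeps the stopping logic out of the agent body.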
✅ Key Takeaways
- Planning agents create multi-step plans before executing
- Chain-of-Thought enables step-by-step reasoning
- Task decomposition breaks complex goals into manageable steps
- Plan-and-Execute pattern separates planning from execution
- Replanning allows adaptation when plans fail
- Use planning for complex, multi-step tasks with dependencies
Next Steps
You now understand planning agents! Next, we’ll explore memory systems that allow agents to remember and learn from past interactions.
Memory Systems
Why Agents Need Memory
Without memory, agents are like people with amnesia—they can’t learn from experience, maintain context, or build on previous interactions.
Without Memory:
User: "My name is Alice"
Agent: "Nice to meet you!"
[Later]
User: "What's my name?"
Agent: "I don't know your name."
With Memory:
User: "My name is Alice"
Agent: "Nice to meet you, Alice!" [stores: user_name = "Alice"]
[Later]
User: "What's my name?"
Agent: "Your name is Alice." [retrieves: user_name]
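The transcript above boils down to a key-value store the agent writes to and reads from between turns. A minimal sketch:

```python
class SimpleMemory:
    """Minimal key-value store behind the 'With Memory' transcript above."""

    def __init__(self):
        self.facts = {}

    def store(self, key: str, value: str):
        """Record a fact, e.g. user_name = 'Alice'."""
        self.facts[key] = value

    def retrieve(self, key: str, default: str = "unknown") -> str:
        """Look a fact up later; fall back if never stored."""
        return self.facts.get(key, default)

memory = SimpleMemory()
memory.store("user_name", "Alice")   # during the first turn
memory.retrieve("user_name")         # later turn: "Alice"
```

Everything that follows in this chapter elaborates on when to write, what to keep, and how to retrieve.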
Types of Memory
Short-Term Memory (Working Memory)
Temporary storage for the current task.
Characteristics:
- Limited capacity (context window)
- Cleared after task completion
- Fast access
- Stored in conversation history
What to store:
- Current conversation
- Intermediate results
- Active plan
- Tool outputs
Long-Term Memory (Persistent Memory)
Permanent storage across sessions.
Characteristics:
- Unlimited capacity (database)
- Persists across sessions
- Slower access (requires retrieval)
- Stored in external systems
What to store:
- User preferences
- Past conversations
- Learned facts
- Successful strategies
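The two memory types can be sketched side by side: a bounded short-term message list and a persistent long-term dict. (A real system would back the long-term store with a database; the in-memory dict here is a stand-in.)

```python
class AgentMemory:
    """Sketch combining short-term and long-term memory as described above."""

    def __init__(self, short_term_limit: int = 20):
        self.short_term = []            # current conversation (bounded)
        self.long_term = {}             # facts that survive across sessions
        self.short_term_limit = short_term_limit

    def remember_message(self, role: str, content: str):
        """Short-term: keep the conversation within its capacity."""
        self.short_term.append({"role": role, "content": content})
        if len(self.short_term) > self.short_term_limit:
            self.short_term.pop(0)      # drop oldest, sliding-window style

    def remember_fact(self, key: str, value: str):
        """Long-term: store a durable fact (preference, learned detail)."""
        self.long_term[key] = value

    def end_session(self):
        """Working memory is cleared after the task; facts persist."""
        self.short_term.clear()
```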
Conversation History Management
Managing the conversation context efficiently.
Basic History Tracking
import time

class ConversationMemory:
"""Simple conversation history"""
def __init__(self, max_messages=20):
self.messages = []
self.max_messages = max_messages
def add_message(self, role: str, content: str):
"""Add message to history"""
self.messages.append({
"role": role,
"content": content,
"timestamp": time.time()
})
# Trim if too long
if len(self.messages) > self.max_messages:
self.messages = self.messages[-self.max_messages:]
def get_messages(self) -> List[dict]:
"""Get conversation history"""
return self.messages
def clear(self):
"""Clear history"""
self.messages = []
Sliding Window
Keep only recent messages:
class SlidingWindowMemory:
"""Keep last N messages"""
def __init__(self, window_size=10):
self.window_size = window_size
self.messages = []
def add(self, message: dict):
"""Add message and maintain window"""
self.messages.append(message)
# Keep only last N messages
if len(self.messages) > self.window_size:
self.messages = self.messages[-self.window_size:]
def get_context(self) -> List[dict]:
"""Get current window"""
return self.messages
Token-Based Truncation
Manage by token count instead of message count:
import tiktoken
class TokenAwareMemory:
"""Manage memory by token budget"""
def __init__(self, max_tokens=4000, model="gpt-4"):
self.max_tokens = max_tokens
self.messages = []
self.encoding = tiktoken.encoding_for_model(model)
def count_tokens(self, text: str) -> int:
"""Count tokens in text"""
return len(self.encoding.encode(text))
def get_total_tokens(self) -> int:
"""Count total tokens in history"""
total = 0
for msg in self.messages:
total += self.count_tokens(msg["content"])
return total
def add(self, message: dict):
"""Add message and trim if needed"""
self.messages.append(message)
# Trim oldest messages if over budget
while self.get_total_tokens() > self.max_tokens and len(self.messages) > 1:
self.messages.pop(0) # Remove oldest
def get_context(self) -> List[dict]:
"""Get messages within token budget"""
return self.messages
Summarization Strategy
Compress old messages:
class SummarizingMemory:
"""Summarize old conversations"""
def __init__(self, summary_threshold=20):
self.messages = []
self.summary = None
self.summary_threshold = summary_threshold
def add(self, message: dict):
"""Add message and summarize if needed"""
self.messages.append(message)
if len(self.messages) > self.summary_threshold:
self.summarize_old_messages()
def summarize_old_messages(self):
"""Summarize and compress old messages"""
# Take first half of messages
to_summarize = self.messages[:len(self.messages)//2]
# Create summary
summary_text = self.create_summary(to_summarize)
# Update summary
if self.summary:
self.summary += f"\n\n{summary_text}"
else:
self.summary = summary_text
# Keep only recent messages
self.messages = self.messages[len(self.messages)//2:]
def create_summary(self, messages: List[dict]) -> str:
"""Generate summary of messages"""
conversation = "\n".join([
f"{m['role']}: {m['content']}" for m in messages
])
prompt = f"""Summarize this conversation concisely:
{conversation}
Summary:"""
return llm.generate(prompt)
def get_context(self) -> List[dict]:
"""Get context with summary"""
context = []
if self.summary:
context.append({
"role": "system",
"content": f"Previous conversation summary:\n{self.summary}"
})
context.extend(self.messages)
return context
Vector Databases for Semantic Memory
Store and retrieve information by meaning, not just keywords.
Why Vector Databases?
Traditional search: “Find messages containing ‘Python’”
Semantic search: “Find messages about programming languages”
Basic Vector Memory
import time
from typing import List

import numpy as np
import openai
class VectorMemory:
"""Simple vector-based memory"""
def __init__(self):
self.memories = []
self.embeddings = []
def add(self, text: str, metadata: dict = None):
"""Store memory with embedding"""
# Get embedding
embedding = self.get_embedding(text)
self.memories.append({
"text": text,
"metadata": metadata or {},
"timestamp": time.time()
})
self.embeddings.append(embedding)
def get_embedding(self, text: str) -> np.ndarray:
"""Get embedding for text"""
# Using OpenAI embeddings
response = openai.embeddings.create(
model="text-embedding-3-small",
input=text
)
return np.array(response.data[0].embedding)
def search(self, query: str, top_k: int = 5) -> List[dict]:
"""Search for relevant memories"""
if not self.memories:
return []
# Get query embedding
query_embedding = self.get_embedding(query)
# Calculate similarities
similarities = []
for i, emb in enumerate(self.embeddings):
similarity = self.cosine_similarity(query_embedding, emb)
similarities.append((i, similarity))
# Sort by similarity
similarities.sort(key=lambda x: x[1], reverse=True)
# Return top k
results = []
for i, score in similarities[:top_k]:
result = self.memories[i].copy()
result["similarity"] = score
results.append(result)
return results
def cosine_similarity(self, a: np.ndarray, b: np.ndarray) -> float:
"""Calculate cosine similarity"""
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
Using Chroma
import time

import chromadb

class ChromaMemory:
    """Memory using ChromaDB"""
    def __init__(self, collection_name="agent_memory"):
        # Recent chromadb versions persist to disk via PersistentClient
        self.client = chromadb.PersistentClient(path="./chroma_db")
self.collection = self.client.get_or_create_collection(
name=collection_name,
metadata={"description": "Agent memory storage"}
)
def add(self, text: str, metadata: dict = None):
"""Add memory"""
doc_id = f"mem_{int(time.time() * 1000)}"
self.collection.add(
documents=[text],
metadatas=[metadata or {}],
ids=[doc_id]
)
def search(self, query: str, n_results: int = 5) -> List[dict]:
"""Search memories"""
results = self.collection.query(
query_texts=[query],
n_results=n_results
)
memories = []
for i in range(len(results['documents'][0])):
memories.append({
"text": results['documents'][0][i],
"metadata": results['metadatas'][0][i],
"distance": results['distances'][0][i]
})
return memories
def delete_all(self):
"""Clear all memories"""
self.client.delete_collection(self.collection.name)
Using Pinecone
import os
import time

import openai
import pinecone

class PineconeMemory:
    """Memory using Pinecone (classic pinecone-client API)"""
    def __init__(self, index_name="agent-memory"):
        pinecone.init(
            api_key=os.getenv("PINECONE_API_KEY"),
            environment=os.getenv("PINECONE_ENV")
        )
        # Create index if it doesn't exist
        if index_name not in pinecone.list_indexes():
            pinecone.create_index(
                name=index_name,
                dimension=1536,  # OpenAI text-embedding-3-small size
                metric="cosine"
            )
        self.index = pinecone.Index(index_name)

    def get_embedding(self, text: str) -> list:
        """Embed text with OpenAI, same approach as VectorMemory above"""
        response = openai.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding
def add(self, text: str, metadata: dict = None):
"""Add memory"""
# Get embedding
embedding = self.get_embedding(text)
# Generate ID
doc_id = f"mem_{int(time.time() * 1000)}"
# Upsert to Pinecone
self.index.upsert([(
doc_id,
embedding,
{
"text": text,
**(metadata or {})
}
)])
def search(self, query: str, top_k: int = 5) -> List[dict]:
"""Search memories"""
query_embedding = self.get_embedding(query)
results = self.index.query(
vector=query_embedding,
top_k=top_k,
include_metadata=True
)
memories = []
for match in results['matches']:
memories.append({
"text": match['metadata']['text'],
"score": match['score'],
"metadata": match['metadata']
})
return memories
Entity Tracking and State Management
Track entities (people, places, things) mentioned in conversations.
Entity Extraction
class EntityTracker:
"""Track entities across conversation"""
def __init__(self):
self.entities = {}
def extract_entities(self, text: str) -> dict:
"""Extract entities from text"""
prompt = f"""Extract entities from this text:
Text: {text}
Return as JSON:
{{
"people": ["name1", "name2"],
"places": ["place1"],
"organizations": ["org1"],
"dates": ["date1"],
"other": ["thing1"]
}}"""
response = llm.generate(prompt)
return json.loads(response)
def update(self, text: str):
"""Update entity tracking"""
entities = self.extract_entities(text)
for entity_type, items in entities.items():
if entity_type not in self.entities:
self.entities[entity_type] = {}
for item in items:
if item not in self.entities[entity_type]:
self.entities[entity_type][item] = {
"first_seen": time.time(),
"mentions": 0,
"context": []
}
self.entities[entity_type][item]["mentions"] += 1
self.entities[entity_type][item]["context"].append(text)
def get_entity_info(self, entity: str) -> dict:
"""Get information about an entity"""
for entity_type, items in self.entities.items():
if entity in items:
return {
"type": entity_type,
**items[entity]
}
return None
State Management
class StateManager:
"""Manage agent state"""
def __init__(self):
self.state = {
"user_info": {},
"current_task": None,
"preferences": {},
"context": {}
}
def update(self, key: str, value):
"""Update state"""
keys = key.split('.')
current = self.state
for k in keys[:-1]:
if k not in current:
current[k] = {}
current = current[k]
current[keys[-1]] = value
def get(self, key: str, default=None):
"""Get state value"""
keys = key.split('.')
current = self.state
for k in keys:
if k not in current:
return default
current = current[k]
return current
def save(self, filepath: str):
"""Save state to file"""
with open(filepath, 'w') as f:
json.dump(self.state, f, indent=2)
def load(self, filepath: str):
"""Load state from file"""
with open(filepath, 'r') as f:
self.state = json.load(f)
Memory Retrieval Strategies
How to find relevant memories efficiently.
Recency-Based Retrieval
def get_recent_memories(memories: List[dict], n: int = 5) -> List[dict]:
"""Get most recent memories"""
sorted_memories = sorted(
memories,
key=lambda x: x.get('timestamp', 0),
reverse=True
)
return sorted_memories[:n]
Relevance-Based Retrieval
def get_relevant_memories(
query: str,
memories: List[dict],
n: int = 5
) -> List[dict]:
"""Get most relevant memories using embeddings"""
query_embedding = get_embedding(query)
scored_memories = []
for memory in memories:
memory_embedding = memory.get('embedding')
if memory_embedding:
score = cosine_similarity(query_embedding, memory_embedding)
scored_memories.append((memory, score))
scored_memories.sort(key=lambda x: x[1], reverse=True)
return [m for m, s in scored_memories[:n]]
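The helpers `get_embedding` and `cosine_similarity` are assumed in the snippets above. A minimal `cosine_similarity`, sketched here with NumPy:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # Avoid division by zero for empty/zero vectors
    return float(np.dot(a, b) / denom)
```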
Hybrid Retrieval
Combine multiple factors:
import numpy as np
def hybrid_retrieval(
query: str,
memories: List[dict],
n: int = 5,
recency_weight: float = 0.3,
relevance_weight: float = 0.7
) -> List[dict]:
"""Combine recency and relevance"""
query_embedding = get_embedding(query)
current_time = time.time()
scored_memories = []
for memory in memories:
# Relevance score
relevance = cosine_similarity(
query_embedding,
memory['embedding']
)
# Recency score (decay over time)
age = current_time - memory['timestamp']
recency = np.exp(-age / (24 * 3600)) # Decay over days
# Combined score
score = (
relevance_weight * relevance +
recency_weight * recency
)
scored_memories.append((memory, score))
scored_memories.sort(key=lambda x: x[1], reverse=True)
return [m for m, s in scored_memories[:n]]
Importance-Based Retrieval
def get_important_memories(
memories: List[dict],
n: int = 5
) -> List[dict]:
"""Get memories marked as important"""
# Score by importance
scored = []
for memory in memories:
importance = memory.get('importance', 0)
scored.append((memory, importance))
scored.sort(key=lambda x: x[1], reverse=True)
return [m for m, s in scored[:n]]
def calculate_importance(memory: dict) -> float:
"""Calculate memory importance"""
prompt = f"""Rate the importance of remembering this information (0-10):
{memory['text']}
Consider:
- Is it about user preferences?
- Is it a key fact?
- Will it be useful later?
Importance (0-10):"""
response = llm.generate(prompt)
return float(response.strip())
Complete Memory System
class ComprehensiveMemory:
"""Full-featured memory system"""
def __init__(self):
# Short-term memory
self.conversation = TokenAwareMemory(max_tokens=4000)
# Long-term memory
self.long_term = ChromaMemory()
# Entity tracking
self.entities = EntityTracker()
# State management
self.state = StateManager()
def add_message(self, role: str, content: str):
"""Add message to conversation"""
message = {
"role": role,
"content": content,
"timestamp": time.time()
}
# Add to short-term
self.conversation.add(message)
# Extract and track entities
if role == "user":
self.entities.update(content)
# Store important messages in long-term
if self.is_important(content):
self.long_term.add(
content,
metadata={
"role": role,
"timestamp": time.time()
}
)
def is_important(self, text: str) -> bool:
"""Determine if message should be stored long-term"""
keywords = [
"my name is", "i prefer", "remember",
"always", "never", "i like", "i don't like"
]
return any(kw in text.lower() for kw in keywords)
def get_context(self, query: str = None) -> List[dict]:
"""Get relevant context for current query"""
context = []
# Add relevant long-term memories
if query:
relevant = self.long_term.search(query, n_results=3)
if relevant:
context.append({
"role": "system",
"content": "Relevant information from past:\n" +
"\n".join([m['text'] for m in relevant])
})
# Add recent conversation
context.extend(self.conversation.get_context())
return context
def save(self, filepath: str):
"""Save memory state"""
data = {
"entities": self.entities.entities,
"state": self.state.state,
"timestamp": time.time()
}
with open(filepath, 'w') as f:
json.dump(data, f, indent=2)
def load(self, filepath: str):
"""Load memory state"""
with open(filepath, 'r') as f:
data = json.load(f)
self.entities.entities = data.get('entities', {})
self.state.state = data.get('state', {})
Using Memory in Agents
class MemoryAgent:
"""Agent with comprehensive memory"""
def __init__(self):
self.memory = ComprehensiveMemory()
self.client = openai.OpenAI()
def chat(self, user_input: str) -> str:
"""Chat with memory"""
# Add user message to memory
self.memory.add_message("user", user_input)
# Get context with relevant memories
context = self.memory.get_context(query=user_input)
# Generate response
response = self.client.chat.completions.create(
model="gpt-4",
messages=context
)
assistant_message = response.choices[0].message.content
# Add assistant response to memory
self.memory.add_message("assistant", assistant_message)
return assistant_message
def save_session(self):
"""Save memory for later"""
self.memory.save("session_memory.json")
def load_session(self):
"""Load previous session"""
self.memory.load("session_memory.json")
Best Practices
- Separate short and long-term: Different storage for different needs
- Be selective: Don’t store everything
- Use semantic search: Find by meaning, not keywords
- Track importance: Prioritize valuable information
- Manage token budgets: Don’t overflow context
- Summarize old conversations: Compress history
- Update entities: Track what’s mentioned
- Persist critical data: Save to disk/database
- Retrieve strategically: Balance recency, relevance, importance
- Test retrieval: Ensure you find what you need
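The "summarize old conversations" practice above can be sketched as a rolling compressor: keep recent turns verbatim and fold older ones into a single summary message. The `summarize_fn` here is a hypothetical stand-in for an LLM call (the default just truncates):

```python
from typing import Callable, Dict, List

def compress_history(
    messages: List[Dict],
    keep_recent: int = 4,
    summarize_fn: Callable[[str], str] = None,
) -> List[Dict]:
    """Replace old messages with one summary message, keeping recent turns verbatim."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    # Placeholder summarizer: in practice, call an LLM here
    if summarize_fn is None:
        summarize_fn = lambda text: text[:200]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation:\n" + summarize_fn(transcript),
    }
    return [summary] + recent
```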
Next Steps
With memory systems in place, agents can maintain context and learn from experience. Next, we’ll explore multi-agent systems where multiple agents collaborate!
Multi-Agent Systems
Why Multiple Agents?
Single agents have limitations. Multiple specialized agents working together can:
- Handle complex tasks requiring diverse expertise
- Work in parallel for faster completion
- Provide checks and balances
- Scale better than monolithic agents
graph LR
subgraph "Single Agent"
S[Task] --> SA[Agent] --> SR[Result]
end
subgraph "Multi-Agent System"
M[Task] --> MA1[Designer]
M --> MA2[Developer]
M --> MA3[Tester]
MA1 --> MC[Coordinator]
MA2 --> MC
MA3 --> MC
MC --> MR[Result]
end
style SA fill:#dbeafe
style MA1 fill:#d1fae5
style MA2 fill:#d1fae5
style MA3 fill:#d1fae5
style MC fill:#fef3c7
Example: Building a website
- Designer Agent: Creates UI/UX mockups
- Developer Agent: Writes code
- Tester Agent: Finds bugs
- Reviewer Agent: Ensures quality
💡 When to Use Multi-Agent Systems
Use multiple agents when:
- Task requires diverse expertise
- Parallel processing is beneficial
- Checks and balances are needed
- Scaling beyond single agent capacity
Stick with single agent when:
- Task is simple and focused
- Coordination overhead isn’t worth it
- Real-time response is critical
Agent Collaboration Patterns
1. Sequential (Pipeline)
Agents work one after another:
Agent A → Agent B → Agent C → Result
class SequentialAgents:
"""Agents work in sequence"""
def __init__(self, agents: List):
self.agents = agents
def run(self, task: str) -> str:
"""Execute agents sequentially"""
result = task
for agent in self.agents:
print(f"→ {agent.name} processing...")
result = agent.process(result)
return result
# Example
pipeline = SequentialAgents([
ResearchAgent(),
AnalysisAgent(),
WriterAgent()
])
result = pipeline.run("Write a report on AI trends")
# Research → Analysis → Writing
2. Parallel (Concurrent)
Agents work simultaneously:
┌─ Agent A ─┐
Task ───┼─ Agent B ─┼─→ Combine → Result
└─ Agent C ─┘
import asyncio
class ParallelAgents:
"""Agents work in parallel"""
def __init__(self, agents: List):
self.agents = agents
async def run(self, task: str) -> str:
"""Execute agents in parallel"""
# Run all agents concurrently
tasks = [agent.process_async(task) for agent in self.agents]
results = await asyncio.gather(*tasks)
# Combine results
return self.combine_results(results)
def combine_results(self, results: List[str]) -> str:
"""Merge results from multiple agents"""
prompt = f"""Combine these results into a coherent response:
{chr(10).join([f"Agent {i+1}: {r}" for i, r in enumerate(results)])}
Combined result:"""
return llm.generate(prompt)
# Example
parallel = ParallelAgents([
SearchAgent(),
DatabaseAgent(),
APIAgent()
])
result = await parallel.run("Find information about user X")
# All agents search simultaneously
3. Hierarchical (Manager-Worker)
Manager delegates to workers:
Manager
/ | \
Worker1 Worker2 Worker3
class ManagerAgent:
"""Manages and delegates to worker agents"""
def __init__(self, workers: List):
self.workers = workers
def run(self, task: str) -> str:
"""Delegate and coordinate"""
# Break down task
subtasks = self.decompose_task(task)
# Assign to workers
assignments = self.assign_tasks(subtasks)
# Collect results
results = []
for worker, subtask in assignments:
result = worker.execute(subtask)
results.append(result)
# Synthesize final result
return self.synthesize(results)
def decompose_task(self, task: str) -> List[str]:
"""Break task into subtasks"""
prompt = f"""Break this task into 3-5 subtasks:
Task: {task}
Subtasks:"""
response = llm.generate(prompt)
return self.parse_subtasks(response)
def assign_tasks(self, subtasks: List[str]) -> List[tuple]:
"""Assign subtasks to workers"""
assignments = []
for i, subtask in enumerate(subtasks):
# Round-robin assignment
worker = self.workers[i % len(self.workers)]
assignments.append((worker, subtask))
return assignments
4. Debate (Adversarial)
Agents debate to reach better conclusions:
class DebateSystem:
"""Agents debate to find best answer"""
def __init__(self, agents: List, rounds: int = 3):
self.agents = agents
self.rounds = rounds
def run(self, question: str) -> str:
"""Run debate"""
positions = []
# Initial positions
for agent in self.agents:
position = agent.initial_position(question)
positions.append(position)
# Debate rounds
for round_num in range(self.rounds):
print(f"\n--- Round {round_num + 1} ---")
new_positions = []
for i, agent in enumerate(self.agents):
# Show other positions
other_positions = [p for j, p in enumerate(positions) if j != i]
# Agent responds
response = agent.respond(question, other_positions)
new_positions.append(response)
print(f"{agent.name}: {response[:100]}...")
positions = new_positions
# Judge decides winner
return self.judge(question, positions)
def judge(self, question: str, positions: List[str]) -> str:
"""Determine best answer"""
prompt = f"""Question: {question}
Positions:
{chr(10).join([f"{i+1}. {p}" for i, p in enumerate(positions)])}
Which position is most convincing and why?"""
return llm.generate(prompt)
5. Collaborative (Peer-to-Peer)
Agents work together as equals:
class CollaborativeAgents:
"""Agents collaborate as peers"""
def __init__(self, agents: List):
self.agents = agents
self.shared_context = {}
def run(self, task: str) -> str:
"""Collaborative execution"""
self.shared_context['task'] = task
self.shared_context['contributions'] = []
# Each agent contributes
for agent in self.agents:
contribution = agent.contribute(self.shared_context)
self.shared_context['contributions'].append({
'agent': agent.name,
'content': contribution
})
# Other agents can see and build on this
print(f"✓ {agent.name} contributed")
# Synthesize all contributions
return self.synthesize_contributions()
def synthesize_contributions(self) -> str:
"""Combine all contributions"""
contributions = self.shared_context['contributions']
prompt = f"""Synthesize these contributions into a final result:
Task: {self.shared_context['task']}
Contributions:
{chr(10).join([f"- {c['agent']}: {c['content']}" for c in contributions])}
Final result:"""
return llm.generate(prompt)
Delegation and Orchestration
Simple Orchestrator
class Orchestrator:
"""Coordinates multiple agents"""
def __init__(self):
self.agents = {}
def register_agent(self, name: str, agent):
"""Register an agent"""
self.agents[name] = agent
def delegate(self, task: str) -> str:
"""Delegate task to appropriate agent"""
# Determine which agent should handle this
agent_name = self.select_agent(task)
if agent_name not in self.agents:
return f"No agent available for: {task}"
# Delegate to agent
agent = self.agents[agent_name]
return agent.execute(task)
def select_agent(self, task: str) -> str:
"""Select best agent for task"""
prompt = f"""Which agent should handle this task?
Task: {task}
Available agents:
{chr(10).join([f"- {name}: {agent.description}" for name, agent in self.agents.items()])}
Best agent:"""
response = llm.generate(prompt)
return response.strip()
Advanced Orchestrator with Routing
class SmartOrchestrator:
"""Intelligent task routing"""
def __init__(self):
self.agents = {}
self.routing_history = []
def register_agent(self, name: str, agent, capabilities: List[str]):
"""Register agent with capabilities"""
self.agents[name] = {
'agent': agent,
'capabilities': capabilities,
'success_rate': 1.0
}
def route_task(self, task: str) -> str:
"""Route task to best agent"""
# Score each agent
scores = {}
for name, info in self.agents.items():
score = self.score_agent(task, info)
scores[name] = score
# Select best agent
best_agent = max(scores, key=scores.get)
# Execute
result = self.agents[best_agent]['agent'].execute(task)
# Update success rate
self.update_success_rate(best_agent, result)
return result
def score_agent(self, task: str, agent_info: dict) -> float:
"""Score agent suitability"""
# Check capability match
capability_score = self.match_capabilities(task, agent_info['capabilities'])
# Consider past success
success_score = agent_info['success_rate']
# Combined score
return 0.7 * capability_score + 0.3 * success_score
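The `match_capabilities` helper is left undefined in the orchestrator above. One minimal keyword-overlap sketch (an assumption for illustration, not the only scoring scheme):

```python
def match_capabilities(task: str, capabilities: list) -> float:
    """Fraction of an agent's capability keywords that appear in the task text."""
    if not capabilities:
        return 0.0
    task_lower = task.lower()
    hits = sum(1 for cap in capabilities if cap.lower() in task_lower)
    return hits / len(capabilities)
```

In production you would likely replace the substring check with embedding similarity between the task and each capability description.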
Consensus and Voting Mechanisms
Simple Voting
class VotingSystem:
"""Agents vote on decisions"""
def __init__(self, agents: List):
self.agents = agents
def decide(self, question: str, options: List[str]) -> str:
"""Agents vote on options"""
votes = {}
for agent in self.agents:
vote = agent.vote(question, options)
votes[vote] = votes.get(vote, 0) + 1
# Return option with most votes
winner = max(votes, key=votes.get)
return winner
# Example
voters = VotingSystem([
Agent1(), Agent2(), Agent3()
])
decision = voters.decide(
"Which framework should we use?",
["React", "Vue", "Angular"]
)
Weighted Voting
class WeightedVoting:
"""Agents vote with different weights"""
def __init__(self, agents: List[tuple]):
# agents = [(agent, weight), ...]
self.agents = agents
def decide(self, question: str, options: List[str]) -> str:
"""Weighted voting"""
scores = {option: 0.0 for option in options}
for agent, weight in self.agents:
vote = agent.vote(question, options)
scores[vote] += weight
return max(scores, key=scores.get)
# Example
weighted = WeightedVoting([
(ExpertAgent(), 2.0), # Expert has 2x weight
(JuniorAgent(), 1.0),
(JuniorAgent(), 1.0)
])
Consensus Building
class ConsensusBuilder:
"""Build consensus among agents"""
def __init__(self, agents: List, threshold: float = 0.8):
self.agents = agents
self.threshold = threshold
def reach_consensus(self, question: str, max_rounds: int = 5) -> str:
"""Iteratively build consensus"""
for round_num in range(max_rounds):
# Get opinions
opinions = [agent.opinion(question) for agent in self.agents]
# Check agreement
agreement = self.measure_agreement(opinions)
if agreement >= self.threshold:
return self.synthesize_consensus(opinions)
# Share opinions and iterate
for agent in self.agents:
agent.see_opinions(opinions)
return "No consensus reached"
def measure_agreement(self, opinions: List[str]) -> float:
"""Measure how much agents agree"""
# Use embeddings to measure similarity
embeddings = [get_embedding(op) for op in opinions]
# Calculate pairwise similarities
similarities = []
for i in range(len(embeddings)):
for j in range(i+1, len(embeddings)):
sim = cosine_similarity(embeddings[i], embeddings[j])
similarities.append(sim)
return np.mean(similarities)
Communication Protocols
Message Passing
class MessageBus:
"""Central message bus for agent communication"""
def __init__(self):
self.subscribers = {}
self.messages = []
def subscribe(self, agent_id: str, topics: List[str]):
"""Agent subscribes to topics"""
for topic in topics:
if topic not in self.subscribers:
self.subscribers[topic] = []
self.subscribers[topic].append(agent_id)
def publish(self, topic: str, message: dict):
"""Publish message to topic"""
self.messages.append({
'topic': topic,
'message': message,
'timestamp': time.time()
})
# Notify subscribers
if topic in self.subscribers:
for agent_id in self.subscribers[topic]:
self.deliver(agent_id, message)
def deliver(self, agent_id: str, message: dict):
"""Deliver message to agent"""
# Implementation depends on agent architecture
pass
Direct Communication
class Agent:
"""Agent with communication capabilities"""
def __init__(self, name: str):
self.name = name
self.inbox = []
self.peers = {}
def send_message(self, recipient: str, message: str):
"""Send message to another agent"""
if recipient in self.peers:
self.peers[recipient].receive_message(self.name, message)
def receive_message(self, sender: str, message: str):
"""Receive message from another agent"""
self.inbox.append({
'from': sender,
'message': message,
'timestamp': time.time()
})
def broadcast(self, message: str):
"""Send message to all peers"""
for peer_name, peer in self.peers.items():
peer.receive_message(self.name, message)
def add_peer(self, name: str, agent):
"""Add peer agent"""
self.peers[name] = agent
Complete Multi-Agent System
class MultiAgentSystem:
"""Complete multi-agent system"""
def __init__(self):
self.agents = {}
self.message_bus = MessageBus()
self.orchestrator = Orchestrator()
def add_agent(self, name: str, agent, role: str):
"""Add agent to system"""
self.agents[name] = {
'agent': agent,
'role': role,
'status': 'idle'
}
self.orchestrator.register_agent(name, agent)
def execute_task(self, task: str, strategy: str = 'auto') -> str:
"""Execute task using appropriate strategy"""
if strategy == 'sequential':
return self.execute_sequential(task)
elif strategy == 'parallel':
return self.execute_parallel(task)
elif strategy == 'hierarchical':
return self.execute_hierarchical(task)
else:
return self.execute_auto(task)
def execute_sequential(self, task: str) -> str:
"""Sequential execution"""
result = task
for name, info in self.agents.items():
agent = info['agent']
result = agent.process(result)
return result
async def execute_parallel(self, task: str) -> str:
"""Parallel execution"""
tasks = []
for name, info in self.agents.items():
agent = info['agent']
tasks.append(agent.process_async(task))
results = await asyncio.gather(*tasks)
return self.combine_results(results)
def execute_hierarchical(self, task: str) -> str:
"""Hierarchical execution with manager"""
# Find manager agent
manager = self.find_manager()
if not manager:
return "No manager agent available"
# Manager coordinates workers
return manager.coordinate(task, self.agents)
def execute_auto(self, task: str) -> str:
"""Automatically choose best strategy"""
# Analyze task complexity
complexity = self.analyze_task(task)
if complexity['parallel_potential'] > 0.7:
return asyncio.run(self.execute_parallel(task))
elif complexity['requires_coordination']:
return self.execute_hierarchical(task)
else:
return self.execute_sequential(task)
Example: Research Team
class ResearchTeam:
"""Multi-agent research team"""
def __init__(self):
self.researcher = ResearchAgent()
self.analyst = AnalystAgent()
self.writer = WriterAgent()
self.reviewer = ReviewerAgent()
def research_topic(self, topic: str) -> str:
"""Collaborative research"""
# 1. Researcher gathers information
print("📚 Researcher gathering information...")
raw_data = self.researcher.gather(topic)
# 2. Analyst analyzes data
print("📊 Analyst analyzing data...")
analysis = self.analyst.analyze(raw_data)
# 3. Writer creates report
print("✍️ Writer creating report...")
draft = self.writer.write(analysis)
# 4. Reviewer provides feedback
print("👀 Reviewer checking quality...")
feedback = self.reviewer.review(draft)
# 5. Writer revises based on feedback
if feedback['needs_revision']:
print("🔄 Writer revising...")
final = self.writer.revise(draft, feedback)
else:
final = draft
return final
# Usage
team = ResearchTeam()
report = team.research_topic("AI Agent Architectures")
Best Practices
- Clear roles: Each agent should have a specific purpose
- Defined interfaces: Standardize communication
- Avoid bottlenecks: Don’t make everything go through one agent
- Handle failures: One agent failing shouldn’t crash the system
- Monitor coordination: Track how agents interact
- Balance autonomy: Agents should be independent but coordinated
- Prevent conflicts: Resolve disagreements systematically
- Scale gradually: Start simple, add complexity as needed
- Test interactions: Verify agents work well together
- Document protocols: Clear communication standards
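The "handle failures" practice above can be sketched as a runner that isolates each agent call, assuming agents expose an `execute(task)` method as in the earlier examples:

```python
from typing import List

class FaultTolerantRunner:
    """Run each agent independently so one failure doesn't crash the system."""
    def __init__(self, agents: List):
        self.agents = agents

    def run(self, task: str) -> dict:
        results, failures = [], []
        for agent in self.agents:
            try:
                results.append(agent.execute(task))
            except Exception as e:
                # Record the failure and keep going with the remaining agents
                failures.append({
                    "agent": getattr(agent, "name", repr(agent)),
                    "error": str(e),
                })
        return {"results": results, "failures": failures}
```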
Common Pitfalls
Pitfall 1: Over-coordination
Problem: Too much communication overhead
Solution: Let agents work independently when possible
Pitfall 2: Conflicting Goals
Problem: Agents work against each other
Solution: Align objectives and add conflict resolution
Pitfall 3: Infinite Loops
Problem: Agents keep delegating to each other
Solution: Add delegation limits and cycle detection
Pitfall 4: No Clear Owner
Problem: Task falls through the cracks
Solution: Always assign clear responsibility
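Pitfall 3 (infinite loops) can be guarded with a depth limit plus a cycle check on the delegation chain. A minimal sketch using plain callables as stand-in agents; the `("delegate", next_agent, task)` return convention is an assumption for illustration:

```python
class SafeDelegator:
    """Guard against infinite delegation loops with a depth limit and cycle check."""
    def __init__(self, max_depth: int = 5):
        self.max_depth = max_depth

    def delegate(self, agent_name: str, task: str, handlers: dict,
                 chain: tuple = ()) -> str:
        # Cycle detection: refuse to revisit an agent already in the chain
        if agent_name in chain:
            return f"Cycle detected: {' -> '.join(chain + (agent_name,))}"
        # Depth limit: stop runaway delegation even without a strict cycle
        if len(chain) >= self.max_depth:
            return "Delegation limit reached"
        result = handlers[agent_name](task)
        # A handler may return ("delegate", next_agent, new_task) to pass work on
        if isinstance(result, tuple) and result[0] == "delegate":
            return self.delegate(result[1], result[2], handlers,
                                 chain + (agent_name,))
        return result
```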
Practice Exercises
Exercise 1: Build a Debate System (Medium)
Task: Create 3 agents that debate a topic and reach consensus.
Requirements:
- Each agent takes a position
- Agents respond to each other’s arguments
- Judge determines the winner
Click to see solution
class DebateAgent:
def __init__(self, position: str):
self.position = position
self.client = openai.OpenAI()
def argue(self, topic: str, opponent_args: List[str]) -> str:
prompt = f"Topic: {topic}\nYour position: {self.position}\nOpponent arguments: {opponent_args}\n\nYour argument:"
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
return response.choices[0].message.content
# Create debate
agents = [
DebateAgent("for"),
DebateAgent("against"),
DebateAgent("neutral")
]
# Run debate rounds
for round_num in range(3):
for agent in agents:
others = [a.argue(topic, []) for a in agents if a != agent]
agent.argue(topic, others)
Exercise 2: Parallel Task Execution (Hard)
Task: Create a system where 4 agents analyze different files simultaneously.
Requirements:
- Use asyncio for parallel execution
- Aggregate results
- Handle failures gracefully
Click to see solution
import asyncio
from typing import Dict, List
async def analyze_parallel(files: List[str]) -> List[Dict]:
tasks = [analyze_file(f) for f in files]
results = await asyncio.gather(*tasks, return_exceptions=True)
return [r for r in results if not isinstance(r, Exception)]
async def analyze_file(file_path: str) -> Dict:
# Simulate analysis
await asyncio.sleep(1)
return {"file": file_path, "issues": []}
✅ Chapter 3 Summary
You’ve mastered advanced agent patterns:
- Planning: Create multi-step plans with Chain-of-Thought and task decomposition
- Memory: Implement short-term, long-term, and semantic memory systems
- Multi-Agent: Coordinate specialized agents with various collaboration patterns
These patterns enable agents to handle complex, long-running tasks that require coordination, context, and diverse expertise.
Next Steps
You now understand multi-agent systems! In Chapter 4, we’ll explore the tools and capabilities that make agents powerful, including code execution, data access, and web interaction.
Code Execution
Module 4: Learning Objectives
By the end of this module, you will:
- ✓ Execute code safely in sandboxed environments
- ✓ Integrate data sources (databases, APIs, file systems)
- ✓ Implement web scraping and browser automation
- ✓ Build RAG systems for knowledge retrieval
- ✓ Handle various data formats and protocols
Why Agents Need Code Execution
Code execution allows agents to:
- Perform precise calculations
- Process data programmatically
- Generate and test code
- Automate complex operations
- Verify results deterministically
Without code execution: “The sum of 1 to 100 is approximately 5050”
With code execution: “The sum of 1 to 100 is exactly 5050” (calculated)
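The difference is one line of Python away: a code-executing agent computes the exact value instead of estimating it.

```python
# An agent with a code tool runs this and returns the exact answer
total = sum(range(1, 101))
print(total)  # prints 5050
```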
Sandboxed Environments
Never execute untrusted code directly. Always use sandboxing.
Why Sandboxing?
Risks of unsandboxed execution:
- File system access (delete files)
- Network access (data exfiltration)
- System commands (malicious operations)
- Resource exhaustion (infinite loops)
Docker Sandbox
import docker
import tempfile
class DockerSandbox:
"""Execute code in Docker container"""
def __init__(self, image="python:3.11-slim"):
self.client = docker.from_env()
self.image = image
def execute(self, code: str, timeout: int = 30) -> dict:
"""Execute Python code in container"""
try:
# Create temporary file with code
with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
f.write(code)
code_file = f.name
# Run container with the code file mounted read-only
# (the host temp file isn't visible inside the container otherwise)
container = self.client.containers.run(
self.image,
"python /sandbox/script.py",
volumes={code_file: {'bind': '/sandbox/script.py', 'mode': 'ro'}},
detach=True,
mem_limit="128m",
network_disabled=True
)
# Wait for completion and collect logs before removing
# (auto-remove can delete the container before logs are read)
result = container.wait(timeout=timeout)
logs = container.logs().decode('utf-8')
container.remove()
return {
"success": result['StatusCode'] == 0,
"output": logs,
"exit_code": result['StatusCode']
}
except docker.errors.ContainerError as e:
return {
"success": False,
"output": str(e),
"exit_code": -1
}
except Exception as e:
return {
"success": False,
"output": f"Error: {str(e)}",
"exit_code": -1
}
RestrictedPython
from RestrictedPython import compile_restricted, safe_globals
import io
import sys
class RestrictedExecutor:
"""Execute Python with restrictions"""
def __init__(self):
self.safe_builtins = {
'print': print,
'range': range,
'len': len,
'sum': sum,
'max': max,
'min': min,
'abs': abs,
'round': round,
'sorted': sorted,
'list': list,
'dict': dict,
'set': set,
'str': str,
'int': int,
'float': float,
}
def execute(self, code: str, timeout: int = 5) -> dict:
"""Execute restricted Python code"""
try:
# Compile with restrictions
byte_code = compile_restricted(
code,
filename='<inline>',
mode='exec'
)
if byte_code.errors:
return {
"success": False,
"output": "\n".join(byte_code.errors)
}
# Capture output
output_buffer = io.StringIO()
sys.stdout = output_buffer
# Execute with safe globals
exec(byte_code, {
"__builtins__": self.safe_builtins,
"_print_": print,
"_getattr_": getattr,
})
# Restore stdout
sys.stdout = sys.__stdout__
return {
"success": True,
"output": output_buffer.getvalue()
}
except Exception as e:
sys.stdout = sys.__stdout__
return {
"success": False,
"output": f"Error: {str(e)}"
}
E2B Code Interpreter
from e2b import Sandbox
class E2BSandbox:
"""Execute code using E2B"""
def __init__(self):
self.sandbox = Sandbox()
def execute_python(self, code: str) -> dict:
"""Execute Python code"""
try:
execution = self.sandbox.run_code(code)
return {
"success": not execution.error,
"output": execution.stdout,
"error": execution.stderr,
"logs": execution.logs
}
except Exception as e:
return {
"success": False,
"output": "",
"error": str(e)
}
def execute_bash(self, command: str) -> dict:
"""Execute bash command"""
try:
result = self.sandbox.process.start_and_wait(command)
return {
"success": result.exit_code == 0,
"output": result.stdout,
"error": result.stderr
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
Code Generation and Validation
Generate Code
def generate_code(task: str, language: str = "python") -> str:
"""Generate code for a task"""
prompt = f"""Write {language} code to accomplish this task:
Task: {task}
Requirements:
- Include error handling
- Add comments
- Return result clearly
- Keep it simple and readable
Code:"""
response = llm.generate(prompt, temperature=0.2)
return extract_code(response)
def extract_code(response: str) -> str:
"""Extract code from markdown"""
import re
# Look for code blocks
pattern = r"```(?:python)?\n(.*?)```"
matches = re.findall(pattern, response, re.DOTALL)
if matches:
return matches[0].strip()
return response.strip()
Validate Code
import ast
import re
def validate_python_code(code: str) -> dict:
"""Validate Python code syntax"""
try:
ast.parse(code)
return {
"valid": True,
"errors": []
}
except SyntaxError as e:
return {
"valid": False,
"errors": [f"Line {e.lineno}: {e.msg}"]
}
def check_dangerous_operations(code: str) -> dict:
"""Check for dangerous operations"""
dangerous_patterns = [
(r'import\s+os', "OS module import"),
(r'import\s+sys', "System module import"),
(r'import\s+subprocess', "Subprocess import"),
(r'open\s*\(', "File operations"),
(r'eval\s*\(', "Eval usage"),
(r'exec\s*\(', "Exec usage"),
(r'__import__', "Dynamic imports"),
]
issues = []
for pattern, description in dangerous_patterns:
if re.search(pattern, code):
issues.append(description)
return {
"safe": len(issues) == 0,
"issues": issues
}
Test Generated Code
def test_code(code: str, test_cases: List[dict]) -> dict:
"""Test code with test cases"""
sandbox = RestrictedExecutor()
results = []
for test in test_cases:
# Prepare test script (distinct name so it doesn't shadow the test_code function)
test_script = f"""
{code}
# Test case
result = {test['call']}
print(result)
"""
# Execute
output = sandbox.execute(test_script)
# Check result
expected = str(test['expected'])
actual = output['output'].strip()
results.append({
"test": test['call'],
"expected": expected,
"actual": actual,
"passed": actual == expected
})
return {
"total": len(results),
"passed": sum(1 for r in results if r['passed']),
"results": results
}
# Example usage
code = """
def add(a, b):
return a + b
"""
test_cases = [
{"call": "add(2, 3)", "expected": 5},
{"call": "add(-1, 1)", "expected": 0},
{"call": "add(0, 0)", "expected": 0}
]
results = test_code(code, test_cases)
Debugging and Error Recovery
Parse Errors
def parse_error(error_message: str) -> dict:
"""Parse error message for useful info"""
import re
# Extract line number
line_match = re.search(r'line (\d+)', error_message)
line_num = int(line_match.group(1)) if line_match else None
# Extract error type
type_match = re.search(r'(\w+Error):', error_message)
error_type = type_match.group(1) if type_match else "Unknown"
return {
"type": error_type,
"line": line_num,
"message": error_message
}
Auto-Fix Errors
def fix_code_error(code: str, error: str) -> str:
"""Attempt to fix code based on error"""
prompt = f"""This code has an error:
Code:
```python
{code}
```
Error: {error}
Provide the corrected code:"""
response = llm.generate(prompt, temperature=0.1)
return extract_code(response)
def iterative_fix(code: str, max_attempts: int = 3) -> dict:
"""Iteratively fix code until it works"""
sandbox = RestrictedExecutor()
for attempt in range(max_attempts):
# Try to execute
result = sandbox.execute(code)
if result['success']:
return {
"success": True,
"code": code,
"attempts": attempt + 1
}
# Try to fix
code = fix_code_error(code, result['output'])
return {
"success": False,
"code": code,
"attempts": max_attempts,
"error": "Max attempts reached"
}
Security Considerations
Input Validation
def validate_code_input(code: str) -> dict:
"""Validate code before execution"""
# Check length
if len(code) > 10000:
return {
"valid": False,
"reason": "Code too long (max 10000 chars)"
}
# Check for null bytes
if '\x00' in code:
return {
"valid": False,
"reason": "Invalid characters in code"
}
# Check syntax
syntax_check = validate_python_code(code)
if not syntax_check['valid']:
return {
"valid": False,
"reason": f"Syntax error: {syntax_check['errors']}"
}
# Check for dangerous operations
safety_check = check_dangerous_operations(code)
if not safety_check['safe']:
return {
"valid": False,
"reason": f"Unsafe operations: {safety_check['issues']}"
}
return {"valid": True}
```

### Resource Limits
class ResourceLimitedExecutor:
"""Execute code with resource limits"""
def __init__(self):
self.max_execution_time = 30 # seconds
self.max_memory = 128 * 1024 * 1024 # 128 MB
self.max_output_size = 10000 # characters
def execute(self, code: str) -> dict:
"""Execute with limits"""
import signal
import resource
def timeout_handler(signum, frame):
raise TimeoutError("Execution timeout")
# Set timeout
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(self.max_execution_time)
# Set memory limit
resource.setrlimit(
resource.RLIMIT_AS,
(self.max_memory, self.max_memory)
)
try:
# Execute code
result = self._execute_code(code)
# Limit output size
if len(result['output']) > self.max_output_size:
result['output'] = result['output'][:self.max_output_size] + "...(truncated)"
return result
except TimeoutError:
return {
"success": False,
"output": "Execution timeout"
}
except MemoryError:
return {
"success": False,
"output": "Memory limit exceeded"
}
finally:
signal.alarm(0) # Cancel alarm
Complete Code Execution Agent
class CodeExecutionAgent:
"""Agent that can generate and execute code"""
def __init__(self):
self.sandbox = RestrictedExecutor()
self.client = openai.OpenAI()
def solve_with_code(self, problem: str) -> str:
"""Solve problem by generating and executing code"""
# Generate code
print("💻 Generating code...")
code = self.generate_solution(problem)
print(f"Generated:\n{code}\n")
# Validate
validation = validate_code_input(code)
if not validation['valid']:
return f"Invalid code: {validation['reason']}"
# Execute
print("▶️ Executing code...")
result = self.sandbox.execute(code)
if result['success']:
print(f"✓ Output: {result['output']}\n")
return self.format_result(problem, code, result['output'])
else:
# Try to fix and retry
print("⚠️ Error occurred, attempting fix...")
fixed = iterative_fix(code)
if fixed['success']:
result = self.sandbox.execute(fixed['code'])
return self.format_result(problem, fixed['code'], result['output'])
else:
return f"Failed to execute: {result['output']}"
def generate_solution(self, problem: str) -> str:
"""Generate code to solve problem"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "user",
"content": f"""Write Python code to solve this problem:
{problem}
Requirements:
- Use only standard library
- Print the final result
- Handle edge cases
- Keep it simple
Provide only the code, no explanations."""
}],
temperature=0.2
)
return extract_code(response.choices[0].message.content)
    def format_result(self, problem: str, code: str, output: str) -> str:
        """Format final result"""
        return f"""Problem: {problem}

Solution:
```python
{code}
```

Result: {output}"""
# Usage
agent = CodeExecutionAgent()
result = agent.solve_with_code("Calculate the sum of all prime numbers less than 100")
print(result)
## Advanced Use Cases
### Data Analysis
```python
def analyze_data_with_code(data: List[dict], question: str) -> str:
    """Analyze data using generated code"""
    import json
    # Generate analysis code
code = f"""
import json
data = {json.dumps(data)}
# Analysis code will be generated here
"""
analysis_code = generate_code(
f"Analyze this data to answer: {question}\nData structure: {data[0] if data else {}}"
)
full_code = code + "\n" + analysis_code
# Execute
sandbox = RestrictedExecutor()
result = sandbox.execute(full_code)
return result['output']
```

### Mathematical Computation
def compute_math(expression: str) -> str:
"""Safely compute mathematical expression"""
code = f"""
import math
result = {expression}
print(result)
"""
sandbox = RestrictedExecutor()
result = sandbox.execute(code)
if result['success']:
return result['output'].strip()
else:
return f"Error: {result['output']}"
### Code Transformation
def transform_code(code: str, transformation: str) -> str:
    """Transform code (refactor, optimize, etc.)"""
    prompt = f"""Transform this code:

Original:
```python
{code}
```

Transformation: {transformation}

Transformed code:"""
    response = llm.generate(prompt)
    return extract_code(response)
# Example
original = "for i in range(len(items)): print(items[i])"
transformed = transform_code(original, "Make it more Pythonic")
# Result: "for item in items: print(item)"
## Best Practices
1. **Always sandbox**: Never execute untrusted code directly
2. **Set timeouts**: Prevent infinite loops
3. **Limit resources**: Memory, CPU, network
4. **Validate inputs**: Check code before execution
5. **Handle errors gracefully**: Don't crash on bad code
6. **Test generated code**: Verify it works
7. **Log executions**: Track what code runs
8. **Isolate environments**: One execution shouldn't affect others
9. **Clean up**: Remove temporary files and containers
10. **Monitor usage**: Track resource consumption
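Practice 7 ("Log executions") deserves a concrete shape. A minimal sketch, assuming any executor function that returns a `{"success": ..., "output": ...}` dict; the `fake_execute` stand-in below is hypothetical:

```python
import hashlib
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("code_exec")

def logged_execution(execute_fn):
    """Wrap an executor so every run is recorded: code hash, outcome, duration."""
    def wrapper(code: str) -> dict:
        code_hash = hashlib.sha256(code.encode()).hexdigest()[:12]
        start = time.monotonic()
        result = execute_fn(code)
        elapsed = time.monotonic() - start
        logger.info(
            "executed code %s success=%s elapsed=%.3fs",
            code_hash, result.get("success"), elapsed,
        )
        return result
    return wrapper

# Stand-in executor for demonstration; any function with the same
# dict-returning signature (e.g. RestrictedExecutor.execute) would work.
@logged_execution
def fake_execute(code: str) -> dict:
    return {"success": True, "output": "ok"}

result = fake_execute("print('hello')")
```

Hashing the code rather than logging it verbatim keeps logs compact while still letting you correlate repeated executions of the same snippet.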
## Common Pitfalls
### Pitfall 1: Trusting Generated Code
**Problem**: LLM generates code with bugs
**Solution**: Always test and validate
### Pitfall 2: No Timeout
**Problem**: Infinite loops hang the system
**Solution**: Set execution timeouts
### Pitfall 3: Unrestricted Access
**Problem**: Code can access file system
**Solution**: Use proper sandboxing
### Pitfall 4: Poor Error Messages
**Problem**: User doesn't understand what went wrong
**Solution**: Parse and explain errors clearly
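Pitfall 4 can be addressed by mapping raw error text to plain-language hints. A minimal sketch in the spirit of `parse_error` above; the hint table is illustrative, not exhaustive:

```python
import re

def explain_error(error_message: str) -> str:
    """Turn a raw Python error message into a short, user-facing explanation."""
    # Plain-language hints for common error types (illustrative only)
    hints = {
        "NameError": "the code refers to a name that was never defined",
        "TypeError": "a value of the wrong type was used in an operation",
        "ZeroDivisionError": "the code divided by zero",
    }
    type_match = re.search(r"(\w+Error)", error_message)
    error_type = type_match.group(1) if type_match else "Error"
    line_match = re.search(r"line (\d+)", error_message)
    location = f" (line {line_match.group(1)})" if line_match else ""
    hint = hints.get(error_type, "something went wrong while running the code")
    return f"{error_type}{location}: {hint}."

raw = 'File "<string>", line 3\nNameError: name \'x\' is not defined'
print(explain_error(raw))
```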
## Next Steps
You now understand code execution for agents! Next, we'll explore data access and retrieval, including databases, APIs, and RAG systems.
Data Access & Retrieval
RAG (Retrieval Augmented Generation)
RAG combines retrieval with generation to provide accurate, grounded responses.
Why RAG?
Without RAG:
- LLM relies on training data (may be outdated)
- Can hallucinate facts
- No access to private/recent information
With RAG:
- Retrieves relevant documents first
- Grounds responses in actual data
- Works with private knowledge bases
- As current as the indexed documents
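The retrieve-then-generate pattern can be seen without any embedding model. The sketch below uses word overlap as a crude stand-in for the embedding similarity in the full pipeline that follows; `keyword_retrieve` is a hypothetical helper, not a library function:

```python
import re

def keyword_retrieve(query: str, documents: list, top_k: int = 2) -> list:
    """Rank documents by shared words with the query (a toy retriever)."""
    query_words = set(re.findall(r"\w+", query.lower()))
    scored = [
        (len(query_words & set(re.findall(r"\w+", doc.lower()))), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

docs = [
    "Python is popular for data science and AI.",
    "JavaScript is used for web development.",
    "Rust emphasizes memory safety.",
]
context = keyword_retrieve("What is Python used for?", docs, top_k=2)
# The retrieved context would then be prepended to the LLM prompt as grounding.
```

Swapping the overlap score for cosine similarity over embeddings gives the pipeline shown next.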
Basic RAG Pipeline
class SimpleRAG:
"""Basic RAG implementation"""
def __init__(self):
self.documents = []
self.embeddings = []
self.client = openai.OpenAI()
def add_document(self, text: str, metadata: dict = None):
"""Add document to knowledge base"""
# Create embedding
embedding = self.get_embedding(text)
self.documents.append({
"text": text,
"metadata": metadata or {},
"id": len(self.documents)
})
self.embeddings.append(embedding)
def get_embedding(self, text: str) -> list:
"""Get embedding for text"""
response = self.client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return response.data[0].embedding
def retrieve(self, query: str, top_k: int = 3) -> list:
"""Retrieve relevant documents"""
# Get query embedding
query_embedding = self.get_embedding(query)
# Calculate similarities
similarities = []
for i, doc_embedding in enumerate(self.embeddings):
similarity = self.cosine_similarity(query_embedding, doc_embedding)
similarities.append((i, similarity))
# Sort and get top k
similarities.sort(key=lambda x: x[1], reverse=True)
results = []
for i, score in similarities[:top_k]:
doc = self.documents[i].copy()
doc['score'] = score
results.append(doc)
return results
def query(self, question: str) -> str:
"""Answer question using RAG"""
# Retrieve relevant documents
docs = self.retrieve(question, top_k=3)
# Build context
context = "\n\n".join([
f"Document {i+1}:\n{doc['text']}"
for i, doc in enumerate(docs)
])
# Generate answer
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": "Answer questions based on the provided context. If the answer isn't in the context, say so."
},
{
"role": "user",
"content": f"Context:\n{context}\n\nQuestion: {question}"
}
]
)
return response.choices[0].message.content
def cosine_similarity(self, a: list, b: list) -> float:
"""Calculate cosine similarity"""
import numpy as np
a = np.array(a)
b = np.array(b)
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Usage
rag = SimpleRAG()
# Add documents
rag.add_document("Python is a high-level programming language.")
rag.add_document("JavaScript is used for web development.")
rag.add_document("Python is popular for data science and AI.")
# Query
answer = rag.query("What is Python used for?")
print(answer)
Advanced RAG with LangChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
class AdvancedRAG:
"""RAG using LangChain"""
def __init__(self, persist_directory="./chroma_db"):
self.embeddings = OpenAIEmbeddings()
self.text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
self.vectorstore = None
self.persist_directory = persist_directory
def load_documents(self, documents: list):
"""Load and process documents"""
# Split documents into chunks
chunks = self.text_splitter.create_documents(documents)
# Create vector store
self.vectorstore = Chroma.from_documents(
documents=chunks,
embedding=self.embeddings,
persist_directory=self.persist_directory
)
def query(self, question: str) -> dict:
"""Query with source attribution"""
if not self.vectorstore:
return {"answer": "No documents loaded", "sources": []}
# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
llm=OpenAI(temperature=0),
chain_type="stuff",
retriever=self.vectorstore.as_retriever(search_kwargs={"k": 3}),
return_source_documents=True
)
# Query
result = qa_chain({"query": question})
return {
"answer": result["result"],
"sources": [doc.page_content for doc in result["source_documents"]]
}
Chunking Strategies
class DocumentChunker:
"""Different chunking strategies"""
def chunk_by_tokens(self, text: str, chunk_size: int = 512, overlap: int = 50) -> list:
"""Chunk by token count"""
import tiktoken
encoding = tiktoken.encoding_for_model("gpt-4")
tokens = encoding.encode(text)
chunks = []
start = 0
while start < len(tokens):
end = start + chunk_size
chunk_tokens = tokens[start:end]
chunk_text = encoding.decode(chunk_tokens)
chunks.append(chunk_text)
start = end - overlap
return chunks
def chunk_by_sentences(self, text: str, sentences_per_chunk: int = 5) -> list:
"""Chunk by sentences"""
import re
# Split into sentences
sentences = re.split(r'[.!?]+', text)
sentences = [s.strip() for s in sentences if s.strip()]
chunks = []
for i in range(0, len(sentences), sentences_per_chunk):
chunk = ". ".join(sentences[i:i+sentences_per_chunk]) + "."
chunks.append(chunk)
return chunks
def chunk_by_paragraphs(self, text: str) -> list:
"""Chunk by paragraphs"""
paragraphs = text.split('\n\n')
return [p.strip() for p in paragraphs if p.strip()]
def semantic_chunking(self, text: str, similarity_threshold: float = 0.7) -> list:
"""Chunk based on semantic similarity"""
sentences = self.split_sentences(text)
if not sentences:
return []
chunks = []
current_chunk = [sentences[0]]
for i in range(1, len(sentences)):
# Check similarity with current chunk
chunk_text = " ".join(current_chunk)
similarity = self.calculate_similarity(chunk_text, sentences[i])
if similarity >= similarity_threshold:
current_chunk.append(sentences[i])
else:
# Start new chunk
chunks.append(" ".join(current_chunk))
current_chunk = [sentences[i]]
# Add last chunk
if current_chunk:
chunks.append(" ".join(current_chunk))
return chunks
Database Queries
SQL Databases
import sqlite3
from typing import List, Dict
class SQLAgent:
"""Agent that can query SQL databases"""
def __init__(self, db_path: str):
self.db_path = db_path
self.client = openai.OpenAI()
def get_schema(self) -> str:
"""Get database schema"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
# Get all tables
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()
schema = []
for table in tables:
table_name = table[0]
cursor.execute(f"PRAGMA table_info({table_name})")
columns = cursor.fetchall()
schema.append(f"Table: {table_name}")
for col in columns:
schema.append(f" - {col[1]} ({col[2]})")
conn.close()
return "\n".join(schema)
def natural_language_query(self, question: str) -> Dict:
"""Convert natural language to SQL and execute"""
# Generate SQL
sql = self.generate_sql(question)
# Execute SQL
results = self.execute_sql(sql)
# Format response
answer = self.format_results(question, results)
return {
"question": question,
"sql": sql,
"results": results,
"answer": answer
}
def generate_sql(self, question: str) -> str:
"""Generate SQL from natural language"""
schema = self.get_schema()
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": f"""You are a SQL expert. Convert natural language questions to SQL queries.
Database schema:
{schema}
Rules:
- Return only the SQL query, no explanations
- Use proper SQL syntax
- Be careful with column names
- Use appropriate JOINs when needed"""
},
{
"role": "user",
"content": question
}
],
temperature=0.1
)
sql = response.choices[0].message.content.strip()
# Remove markdown code blocks if present
sql = sql.replace("```sql", "").replace("```", "").strip()
return sql
def execute_sql(self, sql: str) -> List[Dict]:
"""Execute SQL query safely"""
# Validate query (read-only)
if not self.is_safe_query(sql):
raise ValueError("Only SELECT queries are allowed")
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cursor = conn.cursor()
try:
cursor.execute(sql)
rows = cursor.fetchall()
# Convert to list of dicts
results = [dict(row) for row in rows]
conn.close()
return results
except Exception as e:
conn.close()
raise Exception(f"SQL execution error: {str(e)}")
def is_safe_query(self, sql: str) -> bool:
"""Check if query is safe (read-only)"""
sql_upper = sql.upper().strip()
# Only allow SELECT
if not sql_upper.startswith("SELECT"):
return False
# Disallow dangerous keywords
dangerous = ["DROP", "DELETE", "INSERT", "UPDATE", "ALTER", "CREATE"]
for keyword in dangerous:
if keyword in sql_upper:
return False
return True
def format_results(self, question: str, results: List[Dict]) -> str:
"""Format results as natural language"""
if not results:
return "No results found."
# Convert results to text
results_text = "\n".join([str(row) for row in results[:10]])
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "user",
"content": f"""Answer this question based on the query results:
Question: {question}
Results:
{results_text}
Provide a clear, natural language answer:"""
}
]
)
return response.choices[0].message.content
# Usage
agent = SQLAgent("company.db")
result = agent.natural_language_query("How many employees are in the sales department?")
print(result['answer'])
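One caveat: the substring check in `is_safe_query` also rejects harmless queries whose identifiers merely contain a keyword (a `created_at` column contains `CREATE`, for example). A word-boundary variant is one way to tighten it; this is a sketch, not a complete SQL sanitizer:

```python
import re

def is_safe_query_strict(sql: str) -> bool:
    """Read-only check using word boundaries, so identifiers like
    created_at or updated_by are not flagged as CREATE/UPDATE."""
    sql_upper = sql.upper().strip()
    # Only allow SELECT statements
    if not sql_upper.startswith("SELECT"):
        return False
    dangerous = ["DROP", "DELETE", "INSERT", "UPDATE", "ALTER", "CREATE"]
    for keyword in dangerous:
        if re.search(rf"\b{keyword}\b", sql_upper):
            return False
    return True
```

For defense in depth, pair this with a read-only database connection so that even a query that slips through cannot modify data.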
NoSQL Databases
from pymongo import MongoClient
class MongoDBAgent:
"""Agent for MongoDB queries"""
def __init__(self, connection_string: str, database: str):
self.client = MongoClient(connection_string)
self.db = self.client[database]
self.llm = openai.OpenAI()
def query(self, question: str, collection: str) -> dict:
"""Query MongoDB using natural language"""
# Generate MongoDB query
query_dict = self.generate_query(question, collection)
# Execute query
results = list(self.db[collection].find(query_dict).limit(10))
# Format response
answer = self.format_results(question, results)
return {
"question": question,
"query": query_dict,
"results": results,
"answer": answer
}
def generate_query(self, question: str, collection: str) -> dict:
"""Generate MongoDB query from natural language"""
# Get sample document
sample = self.db[collection].find_one()
response = self.llm.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": f"""Convert natural language to MongoDB query.
Collection: {collection}
Sample document: {sample}
Return only valid JSON for MongoDB find() query."""
},
{
"role": "user",
"content": question
}
],
temperature=0.1
)
import json
query_str = response.choices[0].message.content.strip()
return json.loads(query_str)
API Integrations
REST API Client
import requests
from typing import Optional
class APIAgent:
"""Agent that can call REST APIs"""
def __init__(self):
self.client = openai.OpenAI()
self.session = requests.Session()
def call_api(self,
url: str,
method: str = "GET",
headers: Optional[dict] = None,
params: Optional[dict] = None,
data: Optional[dict] = None) -> dict:
"""Make API call"""
try:
response = self.session.request(
method=method,
url=url,
headers=headers,
params=params,
json=data,
timeout=30
)
response.raise_for_status()
return {
"success": True,
"status_code": response.status_code,
"data": response.json() if response.content else None
}
except requests.exceptions.RequestException as e:
return {
"success": False,
"error": str(e)
}
def natural_language_api_call(self, request: str, api_spec: dict) -> dict:
"""Convert natural language to API call"""
# Generate API call parameters
params = self.generate_api_params(request, api_spec)
# Make API call
result = self.call_api(**params)
# Format response
if result['success']:
answer = self.format_api_response(request, result['data'])
return {
"request": request,
"api_call": params,
"response": result['data'],
"answer": answer
}
else:
return {
"request": request,
"error": result['error']
}
    def generate_api_params(self, request: str, api_spec: dict) -> dict:
        """Generate API parameters from natural language"""
        import json
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": f"""Convert natural language to API call parameters.
API Specification:
{json.dumps(api_spec, indent=2)}
Return JSON with: url, method, headers, params, data"""
},
{
"role": "user",
"content": request
}
],
temperature=0.1
)
import json
return json.loads(response.choices[0].message.content)
GraphQL Client
from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport
class GraphQLAgent:
"""Agent for GraphQL APIs"""
def __init__(self, endpoint: str):
transport = RequestsHTTPTransport(url=endpoint)
self.client = Client(transport=transport, fetch_schema_from_transport=True)
self.llm = openai.OpenAI()
def query(self, natural_language_query: str) -> dict:
"""Execute GraphQL query from natural language"""
# Generate GraphQL query
graphql_query = self.generate_graphql(natural_language_query)
# Execute query
query = gql(graphql_query)
result = self.client.execute(query)
# Format response
answer = self.format_results(natural_language_query, result)
return {
"question": natural_language_query,
"graphql": graphql_query,
"result": result,
"answer": answer
}
def generate_graphql(self, question: str) -> str:
"""Generate GraphQL query"""
schema = self.client.schema
response = self.llm.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": f"""Generate GraphQL query from natural language.
Schema: {schema}
Return only the GraphQL query."""
},
{
"role": "user",
"content": question
}
]
)
return response.choices[0].message.content.strip()
File System Operations
Safe File Access
import os
from pathlib import Path
class FileSystemAgent:
"""Agent with safe file system access"""
def __init__(self, allowed_directory: str):
self.allowed_directory = Path(allowed_directory).resolve()
def is_safe_path(self, path: str) -> bool:
"""Check if path is within allowed directory"""
try:
requested_path = (self.allowed_directory / path).resolve()
return requested_path.is_relative_to(self.allowed_directory)
        except (ValueError, OSError):
            return False
def read_file(self, path: str) -> dict:
"""Read file safely"""
if not self.is_safe_path(path):
return {"success": False, "error": "Access denied"}
try:
full_path = self.allowed_directory / path
with open(full_path, 'r') as f:
content = f.read()
return {
"success": True,
"content": content,
"size": len(content)
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
def list_files(self, path: str = ".") -> dict:
"""List files in directory"""
if not self.is_safe_path(path):
return {"success": False, "error": "Access denied"}
try:
full_path = self.allowed_directory / path
files = []
for item in full_path.iterdir():
files.append({
"name": item.name,
"type": "directory" if item.is_dir() else "file",
"size": item.stat().st_size if item.is_file() else None
})
return {
"success": True,
"files": files
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
def search_files(self, pattern: str, path: str = ".") -> dict:
"""Search for files matching pattern"""
if not self.is_safe_path(path):
return {"success": False, "error": "Access denied"}
try:
full_path = self.allowed_directory / path
matches = list(full_path.rglob(pattern))
results = [
{
"path": str(m.relative_to(self.allowed_directory)),
"name": m.name,
"size": m.stat().st_size if m.is_file() else None
}
for m in matches
]
return {
"success": True,
"matches": results
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
Complete Data Access Agent
class DataAccessAgent:
"""Unified agent for data access"""
def __init__(self):
self.rag = SimpleRAG()
self.sql_agent = None
self.api_agent = APIAgent()
self.fs_agent = None
self.client = openai.OpenAI()
def configure_sql(self, db_path: str):
"""Configure SQL access"""
self.sql_agent = SQLAgent(db_path)
def configure_filesystem(self, allowed_dir: str):
"""Configure file system access"""
self.fs_agent = FileSystemAgent(allowed_dir)
def query(self, question: str) -> str:
"""Answer question using appropriate data source"""
# Determine which data source to use
source = self.determine_source(question)
if source == "rag":
return self.rag.query(question)
elif source == "sql" and self.sql_agent:
result = self.sql_agent.natural_language_query(question)
return result['answer']
elif source == "api":
# Would need API spec
return "API access requires configuration"
elif source == "filesystem" and self.fs_agent:
# Would need to determine file operation
return "File system access requires specific operation"
else:
return "Unable to determine appropriate data source"
def determine_source(self, question: str) -> str:
"""Determine which data source to use"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "user",
"content": f"""Which data source should be used for this question?
Question: {question}
Options: rag, sql, api, filesystem
Answer with just the option:"""
}
],
temperature=0.1
)
return response.choices[0].message.content.strip().lower()
Best Practices
- Validate queries: Check SQL/API calls before execution
- Limit results: Don’t return huge datasets
- Cache responses: Avoid redundant queries
- Handle errors: Graceful failure handling
- Secure credentials: Never expose API keys
- Rate limiting: Respect API limits
- Chunk large documents: Better retrieval
- Use appropriate embeddings: Match your use case
- Monitor costs: Track API usage
- Test thoroughly: Verify data access works
Next Steps
You now understand data access and retrieval! Next, we’ll explore web interaction including browser automation and scraping.
Web Interaction
Browser Automation
Agents can interact with websites like humans do—clicking, typing, scrolling, and extracting information.
Why Browser Automation?
- Access dynamic content (JavaScript-rendered)
- Interact with web applications
- Fill forms and submit data
- Navigate multi-page workflows
- Handle authentication
Playwright Basics
from playwright.sync_api import sync_playwright
from typing import Optional
class BrowserAgent:
"""Agent with browser automation capabilities"""
def __init__(self, headless: bool = True):
self.headless = headless
self.playwright = None
self.browser = None
self.page = None
def start(self):
"""Start browser"""
self.playwright = sync_playwright().start()
self.browser = self.playwright.chromium.launch(headless=self.headless)
self.page = self.browser.new_page()
def stop(self):
"""Stop browser"""
if self.browser:
self.browser.close()
if self.playwright:
self.playwright.stop()
def navigate(self, url: str) -> dict:
"""Navigate to URL"""
try:
self.page.goto(url, wait_until="networkidle")
return {
"success": True,
"url": self.page.url,
"title": self.page.title()
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
def click(self, selector: str) -> dict:
"""Click element"""
try:
self.page.click(selector)
return {"success": True}
except Exception as e:
return {"success": False, "error": str(e)}
def type_text(self, selector: str, text: str) -> dict:
"""Type text into element"""
try:
self.page.fill(selector, text)
return {"success": True}
except Exception as e:
return {"success": False, "error": str(e)}
def get_text(self, selector: str) -> Optional[str]:
"""Get text from element"""
try:
return self.page.text_content(selector)
except:
return None
def screenshot(self, path: str = "screenshot.png") -> dict:
"""Take screenshot"""
try:
self.page.screenshot(path=path)
return {"success": True, "path": path}
except Exception as e:
return {"success": False, "error": str(e)}
def get_page_content(self) -> str:
"""Get full page HTML"""
return self.page.content()
# Usage
agent = BrowserAgent()
agent.start()
# Navigate
agent.navigate("https://example.com")
# Interact
agent.type_text("#search", "AI agents")
agent.click("button[type='submit']")
# Extract
results = agent.get_text(".results")
agent.stop()
Selenium Alternative
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
class SeleniumAgent:
"""Browser automation with Selenium"""
def __init__(self):
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
self.driver = webdriver.Chrome(options=options)
self.wait = WebDriverWait(self.driver, 10)
def navigate(self, url: str):
"""Navigate to URL"""
self.driver.get(url)
def click(self, selector: str, by: By = By.CSS_SELECTOR):
"""Click element"""
element = self.wait.until(
EC.element_to_be_clickable((by, selector))
)
element.click()
def type_text(self, selector: str, text: str, by: By = By.CSS_SELECTOR):
"""Type text"""
element = self.wait.until(
EC.presence_of_element_located((by, selector))
)
element.clear()
element.send_keys(text)
def get_text(self, selector: str, by: By = By.CSS_SELECTOR) -> str:
"""Get element text"""
element = self.wait.until(
EC.presence_of_element_located((by, selector))
)
return element.text
def close(self):
"""Close browser"""
self.driver.quit()
Web Scraping
Extract structured data from websites.
BeautifulSoup Scraping
import requests
from bs4 import BeautifulSoup
from typing import List, Dict, Optional
class WebScraper:
"""Web scraping agent"""
def __init__(self):
self.session = requests.Session()
self.session.headers.update({
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
})
def fetch_page(self, url: str) -> Optional[BeautifulSoup]:
"""Fetch and parse page"""
try:
response = self.session.get(url, timeout=10)
response.raise_for_status()
return BeautifulSoup(response.content, 'html.parser')
except Exception as e:
print(f"Error fetching {url}: {e}")
return None
def extract_links(self, url: str) -> List[str]:
"""Extract all links from page"""
soup = self.fetch_page(url)
if not soup:
return []
links = []
for a in soup.find_all('a', href=True):
href = a['href']
# Convert relative to absolute
if href.startswith('/'):
from urllib.parse import urljoin
href = urljoin(url, href)
links.append(href)
return links
def extract_text(self, url: str, selector: Optional[str] = None) -> str:
"""Extract text from page"""
soup = self.fetch_page(url)
if not soup:
return ""
if selector:
element = soup.select_one(selector)
return element.get_text(strip=True) if element else ""
else:
return soup.get_text(separator='\n', strip=True)
def extract_structured_data(self, url: str, schema: dict) -> List[Dict]:
"""Extract structured data based on schema"""
soup = self.fetch_page(url)
if not soup:
return []
results = []
# Find all items matching container selector
items = soup.select(schema['container'])
for item in items:
data = {}
for field, selector in schema['fields'].items():
element = item.select_one(selector)
if element:
data[field] = element.get_text(strip=True)
if data:
results.append(data)
return results
# Usage
scraper = WebScraper()
# Extract structured data
schema = {
'container': '.product',
'fields': {
'name': '.product-name',
'price': '.product-price',
'rating': '.product-rating'
}
}
products = scraper.extract_structured_data('https://example.com/products', schema)
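Scraping should also respect `robots.txt` and pace its requests; the standard library's `urllib.robotparser` covers the first part. A minimal sketch (the `PoliteFetcher` name and one-second default delay are assumptions, not from any library):

```python
import time
from urllib import robotparser
from urllib.parse import urlparse

class PoliteFetcher:
    """Check robots.txt and pace requests before scraping a site."""

    def __init__(self, delay_seconds: float = 1.0):
        self.delay_seconds = delay_seconds
        self.parsers = {}       # cached robots.txt parser per host
        self.last_request = 0.0

    def allowed(self, url: str, user_agent: str = "*") -> bool:
        """True if robots.txt permits fetching this URL."""
        parsed = urlparse(url)
        host = f"{parsed.scheme}://{parsed.netloc}"
        if host not in self.parsers:
            rp = robotparser.RobotFileParser(host + "/robots.txt")
            try:
                rp.read()
            except OSError:
                pass  # unreachable robots.txt: fall back to can_fetch's default
            self.parsers[host] = rp
        return self.parsers[host].can_fetch(user_agent, url)

    def wait(self):
        """Sleep just enough to keep at least delay_seconds between requests."""
        elapsed = time.monotonic() - self.last_request
        if elapsed < self.delay_seconds:
            time.sleep(self.delay_seconds - elapsed)
        self.last_request = time.monotonic()

fetcher = PoliteFetcher(delay_seconds=1.0)
# Before each self.session.get(url): fetcher.wait(), and skip URLs
# where fetcher.allowed(url) is False.
```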
Handling Dynamic Content
class DynamicScraper:
"""Scrape JavaScript-rendered content"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
def scrape_dynamic(self, url: str, wait_selector: str = None) -> str:
"""Scrape page with JavaScript"""
self.browser.navigate(url)
# Wait for content to load
if wait_selector:
self.browser.page.wait_for_selector(wait_selector)
else:
self.browser.page.wait_for_load_state("networkidle")
# Get rendered HTML
return self.browser.get_page_content()
def scrape_infinite_scroll(self, url: str, max_scrolls: int = 10) -> str:
"""Scrape infinite scroll pages"""
self.browser.navigate(url)
for _ in range(max_scrolls):
# Scroll to bottom
self.browser.page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
# Wait for new content
self.browser.page.wait_for_timeout(1000)
return self.browser.get_page_content()
def close(self):
"""Close browser"""
self.browser.stop()
Form Filling and Navigation
Automated Form Submission
class FormAgent:
"""Agent that can fill and submit forms"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
def fill_form(self, url: str, form_data: dict) -> dict:
"""Fill and submit form"""
try:
# Navigate to page
self.browser.navigate(url)
            # Fill fields (skip the reserved submit_button key, which names
            # the submit control rather than a field to type into)
            for selector, value in form_data.items():
                if selector == 'submit_button':
                    continue
                if isinstance(value, str):
                    self.browser.type_text(selector, value)
                elif value.get('type') == 'click':
                    self.browser.click(selector)
                elif value.get('type') == 'select':
                    self.browser.page.select_option(selector, value['value'])
# Submit form
submit_button = form_data.get('submit_button', 'button[type="submit"]')
self.browser.click(submit_button)
# Wait for response
self.browser.page.wait_for_load_state("networkidle")
return {
"success": True,
"url": self.browser.page.url,
"title": self.browser.page.title()
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
def close(self):
"""Close browser"""
self.browser.stop()
# Usage
agent = FormAgent()
form_data = {
'#name': 'John Doe',
'#email': 'john@example.com',
'#message': 'Hello from agent!',
'submit_button': '#submit-btn'
}
result = agent.fill_form('https://example.com/contact', form_data)
agent.close()
Multi-Step Navigation
class NavigationAgent:
"""Agent for multi-step web workflows"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
self.history = []
def execute_workflow(self, steps: List[dict]) -> dict:
"""Execute multi-step workflow"""
results = []
for i, step in enumerate(steps):
print(f"Step {i+1}: {step['action']}")
try:
if step['action'] == 'navigate':
result = self.browser.navigate(step['url'])
elif step['action'] == 'click':
result = self.browser.click(step['selector'])
elif step['action'] == 'type':
result = self.browser.type_text(step['selector'], step['text'])
elif step['action'] == 'wait':
self.browser.page.wait_for_timeout(step['duration'])
result = {"success": True}
elif step['action'] == 'extract':
text = self.browser.get_text(step['selector'])
result = {"success": True, "data": text}
elif step['action'] == 'screenshot':
result = self.browser.screenshot(step.get('path', f'step_{i}.png'))
else:
result = {"success": False, "error": "Unknown action"}
results.append({
"step": i + 1,
"action": step['action'],
"result": result
})
self.history.append({
"url": self.browser.page.url,
"title": self.browser.page.title()
})
if not result.get('success', False):
break
except Exception as e:
results.append({
"step": i + 1,
"action": step['action'],
"result": {"success": False, "error": str(e)}
})
break
return {
"completed": len(results),
"total": len(steps),
"results": results,
"history": self.history
}
def close(self):
"""Close browser"""
self.browser.stop()
# Usage
agent = NavigationAgent()
workflow = [
{"action": "navigate", "url": "https://example.com"},
{"action": "click", "selector": "#login-btn"},
{"action": "type", "selector": "#username", "text": "user@example.com"},
{"action": "type", "selector": "#password", "text": "password123"},
{"action": "click", "selector": "#submit"},
{"action": "wait", "duration": 2000},
{"action": "extract", "selector": ".welcome-message"},
{"action": "screenshot", "path": "logged-in.png"}
]
result = agent.execute_workflow(workflow)
agent.close()
Screenshot and Visual Understanding
Taking Screenshots
class ScreenshotAgent:
"""Agent for visual capture and analysis"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
self.client = openai.OpenAI()
def capture_and_analyze(self, url: str, question: str) -> dict:
"""Capture screenshot and analyze with vision model"""
# Navigate and capture
self.browser.navigate(url)
screenshot_path = "temp_screenshot.png"
self.browser.screenshot(screenshot_path)
# Analyze with vision model
import base64
with open(screenshot_path, "rb") as f:
image_data = base64.b64encode(f.read()).decode()
response = self.client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": question},
{
"type": "image_url",
"image_url": {
"url": f"data:image/png;base64,{image_data}"
}
}
]
}
],
max_tokens=500
)
return {
"url": url,
"question": question,
"analysis": response.choices[0].message.content,
"screenshot": screenshot_path
}
def compare_pages(self, url1: str, url2: str) -> dict:
"""Compare two pages visually"""
# Capture both
self.browser.navigate(url1)
self.browser.screenshot("page1.png")
self.browser.navigate(url2)
self.browser.screenshot("page2.png")
# Compare with vision model
question = "What are the main differences between these two pages?"
# Would need to send both images to vision model
# Implementation depends on specific vision API
return {
"url1": url1,
"url2": url2,
"screenshot1": "page1.png",
"screenshot2": "page2.png"
}
def close(self):
"""Close browser"""
self.browser.stop()
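The `compare_pages` method above leaves the two-image request unimplemented. One way to send both screenshots in a single vision request is sketched below; the `gpt-4o` model name and the message shape are assumptions based on the OpenAI chat completions vision format, not something fixed by this chapter:

```python
import base64

def compare_screenshots(path1: str, path2: str, question: str) -> str:
    """Send two screenshots in a single vision request and return the comparison."""
    import openai  # deferred so the helper can be defined without the SDK present

    def encode(path: str) -> str:
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode()

    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any vision-capable chat model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode(path1)}"}},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode(path2)}"}},
            ],
        }],
        max_tokens=500,
    )
    return response.choices[0].message.content
```

Both images go in the same `content` array, so the model sees them side by side and can answer a comparison question directly.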
Element Detection
class ElementDetector:
"""Detect and locate elements on page"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
self.client = openai.OpenAI()
def find_element_by_description(self, url: str, description: str) -> Optional[str]:
"""Find element selector by natural language description"""
self.browser.navigate(url)
# Get page structure
elements = self.browser.page.evaluate("""
() => {
const elements = [];
document.querySelectorAll('button, a, input, select, textarea').forEach(el => {
elements.push({
tag: el.tagName,
text: el.textContent.trim(),
id: el.id,
class: el.className,
type: el.type
});
});
return elements;
}
""")
# Use LLM to match description to element
prompt = f"""Find the element matching this description: {description}
Available elements:
{json.dumps(elements, indent=2)}
Return the best CSS selector to target this element:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
return response.choices[0].message.content.strip()
def close(self):
"""Close browser"""
self.browser.stop()
Complete Web Interaction Agent
class WebAgent:
"""Complete web interaction agent"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
self.scraper = WebScraper()
self.client = openai.OpenAI()
def execute_task(self, task: str, url: str) -> str:
"""Execute web task from natural language"""
# Generate action plan
plan = self.generate_plan(task, url)
# Execute plan
results = []
for step in plan:
result = self.execute_step(step)
results.append(result)
# Summarize results
return self.summarize_results(task, results)
def generate_plan(self, task: str, url: str) -> List[dict]:
"""Generate action plan for task"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": """Generate a step-by-step plan for web automation.
Available actions:
- navigate: Go to URL
- click: Click element (provide selector)
- type: Type text (provide selector and text)
- extract: Extract text (provide selector)
- wait: Wait for duration (milliseconds)
- screenshot: Take screenshot
Return JSON array of steps."""
},
{
"role": "user",
"content": f"Task: {task}\nStarting URL: {url}"
}
],
temperature=0.2
)
import json
return json.loads(response.choices[0].message.content)
def execute_step(self, step: dict) -> dict:
"""Execute single step"""
action = step['action']
try:
if action == 'navigate':
return self.browser.navigate(step['url'])
elif action == 'click':
return self.browser.click(step['selector'])
elif action == 'type':
return self.browser.type_text(step['selector'], step['text'])
elif action == 'extract':
text = self.browser.get_text(step['selector'])
return {"success": True, "data": text}
elif action == 'wait':
self.browser.page.wait_for_timeout(step['duration'])
return {"success": True}
elif action == 'screenshot':
return self.browser.screenshot(step.get('path', 'screenshot.png'))
else:
return {"success": False, "error": f"Unknown action: {action}"}
except Exception as e:
return {"success": False, "error": str(e)}
def summarize_results(self, task: str, results: List[dict]) -> str:
"""Summarize execution results"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "user",
"content": f"""Summarize the results of this web automation task:
Task: {task}
Results:
{json.dumps(results, indent=2)}
Provide a clear summary of what was accomplished:"""
}
]
)
return response.choices[0].message.content
def close(self):
"""Close browser"""
self.browser.stop()
# Usage
agent = WebAgent()
result = agent.execute_task(
"Search for 'AI agents' on the website and extract the top 3 results",
"https://example.com"
)
print(result)
agent.close()
Best Practices
- Respect robots.txt: Check if scraping is allowed
- Rate limiting: Don’t overwhelm servers
- Use headless mode: Faster and less resource-intensive
- Handle timeouts: Set reasonable wait times
- Error recovery: Retry failed operations
- Clean up resources: Close browsers properly
- User agent: Identify your bot appropriately
- Cache responses: Avoid redundant requests
- Validate selectors: Check elements exist before interacting
- Monitor performance: Track execution time
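Several of these practices (robots.txt, rate limiting, user agent) can be bundled into one helper. A standard-library-only sketch; the bot name is a placeholder you should replace with your own:

```python
import time
import urllib.request
import urllib.robotparser
from urllib.parse import urlparse

class PoliteFetcher:
    """Fetch pages while honoring robots.txt and a minimum delay between requests."""

    def __init__(self, user_agent: str = "MyAgentBot/1.0", min_delay: float = 1.0):
        self.user_agent = user_agent   # identifies the bot to site operators
        self.min_delay = min_delay     # seconds to wait between fetches
        self.parsers = {}              # cached robots.txt parser per host
        self.last_fetch = 0.0

    def allowed(self, url: str) -> bool:
        parts = urlparse(url)
        host = f"{parts.scheme}://{parts.netloc}"
        if host not in self.parsers:
            rp = urllib.robotparser.RobotFileParser(f"{host}/robots.txt")
            try:
                rp.read()
            except OSError:
                rp.allow_all = True  # robots.txt unreachable: assume allowed
            self.parsers[host] = rp
        return self.parsers[host].can_fetch(self.user_agent, url)

    def fetch(self, url: str) -> dict:
        if not self.allowed(url):
            return {"success": False, "error": "Blocked by robots.txt"}
        wait = self.min_delay - (time.time() - self.last_fetch)
        if wait > 0:
            time.sleep(wait)  # simple politeness delay between requests
        self.last_fetch = time.time()
        req = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(req, timeout=15) as resp:
            return {"success": True, "status": resp.status,
                    "html": resp.read().decode("utf-8", "replace")}
```

Caching one robots.txt parser per host avoids re-fetching the rules on every request to the same site.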
Common Pitfalls
Pitfall 1: Stale Selectors
Problem: Element selectors change. Solution: Use more robust selectors (data attributes, ARIA labels).
Pitfall 2: Race Conditions
Problem: Clicking before an element is ready. Solution: Use explicit waits.
Pitfall 3: Memory Leaks
Problem: Not closing browsers. Solution: Always close in a finally block or use context managers.
Pitfall 4: Detection
Problem: Website blocks automated access. Solution: Use stealth plugins, rotate user agents, and add delays.
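Pitfalls 2 and 3 have mechanical fixes. Below is a sketch of a cleanup-safe context manager; the factory argument keeps it generic, and the commented usage assumes the `BrowserAgent` class from earlier in this chapter:

```python
from contextlib import contextmanager

@contextmanager
def managed_browser(agent_factory):
    """Guarantee cleanup (Pitfall 3): stop() runs even if a step raises."""
    agent = agent_factory()
    agent.start()
    try:
        yield agent
    finally:
        agent.stop()  # always runs, so no leaked browser processes

# Sketched usage against Pitfalls 1 and 2 (assumes BrowserAgent from this chapter):
#
# with managed_browser(BrowserAgent) as browser:
#     browser.navigate("https://example.com")
#     # Explicit wait before interacting (Pitfall 2)
#     browser.page.wait_for_selector('[data-testid="login-btn"]', timeout=5000)
#     # Data-attribute selector, more stable than CSS classes (Pitfall 1)
#     browser.click('[data-testid="login-btn"]')
```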
Next Steps
Chapter 4 (Agent Tools & Capabilities) is complete! You now understand code execution, data access, and web interaction. In Chapter 5, we’ll explore production-ready agents including reliability, testing, and monitoring.
Reliability & Safety
Module 5: Learning Objectives
By the end of this module, you will:
- ✓ Implement input validation and guardrails
- ✓ Design comprehensive testing strategies
- ✓ Set up monitoring and observability systems
- ✓ Handle failures gracefully with retries and fallbacks
- ✓ Measure and improve agent reliability
Input Validation and Sanitization
Never trust user input. Always validate and sanitize.
Input Validation
from typing import Optional
import re
class InputValidator:
"""Validate user inputs"""
def __init__(self):
self.max_input_length = 10000
self.max_file_size = 10 * 1024 * 1024 # 10MB
def validate_text_input(self, text: str) -> dict:
"""Validate text input"""
errors = []
# Check type
if not isinstance(text, str):
return {"valid": False, "errors": ["Input must be string"]}
# Check length
if len(text) > self.max_input_length:
errors.append(f"Input too long (max {self.max_input_length} chars)")
# Check for null bytes
if '\x00' in text:
errors.append("Invalid characters detected")
# Check for control characters
if any(ord(c) < 32 and c not in '\n\r\t' for c in text):
errors.append("Control characters not allowed")
return {
"valid": len(errors) == 0,
"errors": errors
}
def validate_url(self, url: str) -> dict:
"""Validate URL"""
if not isinstance(url, str):
return {"valid": False, "errors": ["URL must be string"]}
# Basic URL pattern
url_pattern = re.compile(
r'^https?://' # http:// or https://
r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?|' # domain
r'localhost|' # localhost
r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # IP
r'(?::\d+)?' # optional port
r'(?:/?|[/?]\S+)$', re.IGNORECASE)
if not url_pattern.match(url):
return {"valid": False, "errors": ["Invalid URL format"]}
# Check for dangerous protocols
if url.startswith(('file://', 'javascript:', 'data:')):
return {"valid": False, "errors": ["Unsafe URL protocol"]}
return {"valid": True, "errors": []}
def validate_file_path(self, path: str, allowed_extensions: list = None) -> dict:
"""Validate file path"""
errors = []
# Check for path traversal
if '..' in path or path.startswith('/'):
errors.append("Path traversal detected")
# Check extension
if allowed_extensions:
ext = path.split('.')[-1].lower()
if ext not in allowed_extensions:
errors.append(f"File type not allowed. Allowed: {allowed_extensions}")
return {
"valid": len(errors) == 0,
"errors": errors
}
def sanitize_text(self, text: str) -> str:
"""Sanitize text input"""
# Remove null bytes
text = text.replace('\x00', '')
# Remove control characters except newlines and tabs
text = ''.join(c for c in text if ord(c) >= 32 or c in '\n\r\t')
# Trim whitespace
text = text.strip()
# Limit length
if len(text) > self.max_input_length:
text = text[:self.max_input_length]
return text
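Note that the `'..' in path` substring check in `validate_file_path` can be defeated by creative path construction. A more robust sketch, using only the standard library, resolves the path and verifies it stays inside an allowed base directory:

```python
import os

def is_safe_path(base_dir: str, user_path: str) -> bool:
    """True only if user_path resolves to a location inside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    # commonpath catches ../ tricks that a substring check for '..' can miss
    return os.path.commonpath([base, target]) == base
```

`realpath` also resolves symlinks, so a link pointing outside the base directory is rejected too.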
SQL Injection Prevention
import sqlite3
class SafeDatabase:
"""Database access with SQL injection prevention"""
def __init__(self, db_path: str):
self.db_path = db_path
def query(self, sql: str, params: tuple = ()) -> list:
"""Execute query with parameterized statements"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
try:
# Always use parameterized queries
cursor.execute(sql, params)
results = cursor.fetchall()
conn.close()
return results
except Exception as e:
conn.close()
raise Exception(f"Query error: {str(e)}")
def safe_search(self, table: str, column: str, value: str) -> list:
"""Safe search with validation"""
# Validate table and column names (whitelist)
allowed_tables = ['users', 'products', 'orders']
allowed_columns = ['name', 'email', 'description', 'title']
if table not in allowed_tables:
raise ValueError(f"Invalid table: {table}")
if column not in allowed_columns:
raise ValueError(f"Invalid column: {column}")
# Use parameterized query
sql = f"SELECT * FROM {table} WHERE {column} LIKE ?"
return self.query(sql, (f"%{value}%",))
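To see concretely why the parameterized query in `safe_search` matters, here is a small self-contained demo against an in-memory SQLite database:

```python
import sqlite3

# Why parameterized queries matter: the classic ' OR '1'='1 payload is
# treated as a literal string value, not as SQL, so it matches nothing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

malicious = "' OR '1'='1"

# Unsafe string formatting leaks every row:
unsafe_sql = f"SELECT * FROM users WHERE name = '{malicious}'"
leaked = conn.execute(unsafe_sql).fetchall()

# Parameterized query: the payload matches nothing
safe = conn.execute("SELECT * FROM users WHERE name = ?", (malicious,)).fetchall()

print(len(leaked), len(safe))  # 2 0
```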
Output Guardrails
Ensure agent outputs are safe and appropriate.
Content Filtering
class OutputGuardrails:
"""Filter and validate agent outputs"""
def __init__(self):
self.client = openai.OpenAI()
self.blocked_patterns = [
r'\b\d{3}-\d{2}-\d{4}\b', # SSN
r'\b\d{16}\b', # Credit card
r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}', # Email (if needed)
]
def check_output(self, text: str) -> dict:
"""Check if output is safe"""
issues = []
# Check for PII
for pattern in self.blocked_patterns:
if re.search(pattern, text):
issues.append(f"Potential PII detected: {pattern}")
# Check for harmful content
if self.contains_harmful_content(text):
issues.append("Potentially harmful content detected")
# Check length
if len(text) > 50000:
issues.append("Output too long")
return {
"safe": len(issues) == 0,
"issues": issues
}
def contains_harmful_content(self, text: str) -> bool:
"""Check for harmful content using moderation API"""
try:
response = self.client.moderations.create(input=text)
result = response.results[0]
# Check if any category is flagged
return any([
result.categories.hate,
result.categories.violence,
result.categories.self_harm,
result.categories.sexual,
])
except Exception:
# Fail open: if the moderation call errors, don't block the output
return False
def redact_pii(self, text: str) -> str:
"""Redact PII from text"""
# Redact SSN
text = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[REDACTED-SSN]', text)
# Redact credit cards
text = re.sub(r'\b\d{16}\b', '[REDACTED-CC]', text)
# Redact emails (if needed)
text = re.sub(
r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}',
'[REDACTED-EMAIL]',
text
)
return text
def filter_output(self, text: str) -> dict:
"""Filter and clean output"""
check = self.check_output(text)
if not check['safe']:
# Redact PII
text = self.redact_pii(text)
# Re-check
check = self.check_output(text)
return {
"text": text,
"safe": check['safe'],
"issues": check['issues']
}
Response Validation
class ResponseValidator:
"""Validate agent responses"""
def validate_response(self, response: str, expected_format: str = None) -> dict:
"""Validate response format and content"""
errors = []
# Check not empty
if not response or not response.strip():
errors.append("Empty response")
# Check format if specified
if expected_format == 'json':
try:
json.loads(response)
except json.JSONDecodeError:
errors.append("Invalid JSON format")
elif expected_format == 'markdown':
# Basic markdown validation
if not any(marker in response for marker in ['#', '*', '-', '`']):
errors.append("Not valid markdown")
# Check for refusal patterns
refusal_patterns = [
"I cannot", "I'm unable to", "I can't",
"I don't have access", "I'm not able to"
]
if any(pattern.lower() in response.lower() for pattern in refusal_patterns):
errors.append("Agent refused to complete task")
return {
"valid": len(errors) == 0,
"errors": errors
}
Rate Limiting and Cost Control
Prevent runaway costs and abuse.
Rate Limiter
import time
from collections import defaultdict
from threading import Lock
class RateLimiter:
"""Rate limit API calls"""
def __init__(self):
self.requests = defaultdict(list)
self.lock = Lock()
def check_rate_limit(self,
user_id: str,
max_requests: int = 100,
window_seconds: int = 3600) -> dict:
"""Check if user is within rate limit"""
with self.lock:
current_time = time.time()
# Remove old requests outside window
self.requests[user_id] = [
req_time for req_time in self.requests[user_id]
if current_time - req_time < window_seconds
]
# Check limit
if len(self.requests[user_id]) >= max_requests:
return {
"allowed": False,
"remaining": 0,
"reset_in": window_seconds - (current_time - self.requests[user_id][0])
}
# Add current request
self.requests[user_id].append(current_time)
return {
"allowed": True,
"remaining": max_requests - len(self.requests[user_id]),
"reset_in": window_seconds
}
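The sliding-window limiter above stores one timestamp per request, so its memory use grows with traffic. A token-bucket variant (a self-contained sketch, not a drop-in replacement for the `RateLimiter` API above) achieves the same protection in constant memory per user:

```python
import time

class TokenBucket:
    """Constant-memory limiter: refills `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never past capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The bucket also tolerates short bursts up to `capacity`, which the strict sliding window does not distinguish from sustained load.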
Cost Tracker
class CostTracker:
"""Track and limit API costs"""
def __init__(self, max_cost_per_user: float = 10.0):
self.costs = defaultdict(float)
self.max_cost_per_user = max_cost_per_user
self.lock = Lock()
def estimate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
"""Estimate cost for API call"""
# Pricing per 1K tokens (example rates)
pricing = {
'gpt-4': {'input': 0.03, 'output': 0.06},
'gpt-4-turbo': {'input': 0.01, 'output': 0.03},
'gpt-3.5-turbo': {'input': 0.0005, 'output': 0.0015},
}
if model not in pricing:
model = 'gpt-4' # Default to most expensive
cost = (
(input_tokens / 1000) * pricing[model]['input'] +
(output_tokens / 1000) * pricing[model]['output']
)
return cost
def check_budget(self, user_id: str, estimated_cost: float) -> dict:
"""Check if user has budget for request"""
with self.lock:
current_cost = self.costs[user_id]
if current_cost + estimated_cost > self.max_cost_per_user:
return {
"allowed": False,
"current_cost": current_cost,
"max_cost": self.max_cost_per_user,
"remaining": self.max_cost_per_user - current_cost
}
return {
"allowed": True,
"current_cost": current_cost,
"remaining": self.max_cost_per_user - current_cost - estimated_cost
}
def record_cost(self, user_id: str, cost: float):
"""Record actual cost"""
with self.lock:
self.costs[user_id] += cost
def reset_user_cost(self, user_id: str):
"""Reset user's cost (e.g., monthly)"""
with self.lock:
self.costs[user_id] = 0.0
Failure Modes and Fallbacks
Handle failures gracefully.
Retry Logic
import time
from functools import wraps
def retry_with_backoff(max_retries: int = 3, base_delay: float = 1.0):
"""Decorator for retry with exponential backoff"""
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
return func(*args, **kwargs)
except Exception as e:
if attempt == max_retries - 1:
raise
delay = base_delay * (2 ** attempt)
print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
time.sleep(delay)
return wrapper
return decorator
# Usage
@retry_with_backoff(max_retries=3, base_delay=1.0)
def call_api(prompt: str) -> str:
"""API call with retry"""
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
return response.choices[0].message.content
Circuit Breaker
class CircuitBreaker:
"""Circuit breaker pattern for API calls"""
def __init__(self, failure_threshold: int = 5, timeout: int = 60):
self.failure_threshold = failure_threshold
self.timeout = timeout
self.failures = 0
self.last_failure_time = None
self.state = 'closed' # closed, open, half-open
def call(self, func, *args, **kwargs):
"""Execute function with circuit breaker"""
if self.state == 'open':
# Check if timeout has passed
if time.time() - self.last_failure_time > self.timeout:
self.state = 'half-open'
else:
raise Exception("Circuit breaker is OPEN")
try:
result = func(*args, **kwargs)
# Success - reset if in half-open
if self.state == 'half-open':
self.state = 'closed'
self.failures = 0
return result
except Exception as e:
self.failures += 1
self.last_failure_time = time.time()
if self.failures >= self.failure_threshold:
self.state = 'open'
raise e
Fallback Strategies
class FallbackAgent:
"""Agent with fallback strategies"""
def __init__(self):
self.primary_model = "gpt-4"
self.fallback_model = "gpt-3.5-turbo"
self.client = openai.OpenAI()
def generate_with_fallback(self, prompt: str) -> dict:
"""Try primary model, fallback to cheaper model if fails"""
try:
response = self.client.chat.completions.create(
model=self.primary_model,
messages=[{"role": "user", "content": prompt}],
timeout=30
)
return {
"success": True,
"response": response.choices[0].message.content,
"model": self.primary_model
}
except Exception as e:
print(f"Primary model failed: {e}. Trying fallback...")
try:
response = self.client.chat.completions.create(
model=self.fallback_model,
messages=[{"role": "user", "content": prompt}],
timeout=30
)
return {
"success": True,
"response": response.choices[0].message.content,
"model": self.fallback_model,
"fallback": True
}
except Exception as e2:
return {
"success": False,
"error": str(e2)
}
def execute_with_fallback(self, task: str, strategies: list) -> dict:
"""Try multiple strategies in order"""
for i, strategy in enumerate(strategies):
try:
result = strategy(task)
return {
"success": True,
"result": result,
"strategy": i
}
except Exception as e:
if i == len(strategies) - 1:
return {
"success": False,
"error": f"All strategies failed. Last error: {e}"
}
continue
Complete Safe Agent
class SafeAgent:
"""Production-ready agent with safety features"""
def __init__(self, user_id: str):
self.user_id = user_id
self.validator = InputValidator()
self.guardrails = OutputGuardrails()
self.rate_limiter = RateLimiter()
self.cost_tracker = CostTracker()
self.circuit_breaker = CircuitBreaker()
self.client = openai.OpenAI()
def process(self, user_input: str) -> dict:
"""Process user input safely"""
# 1. Validate input
validation = self.validator.validate_text_input(user_input)
if not validation['valid']:
return {
"success": False,
"error": "Invalid input",
"details": validation['errors']
}
# 2. Check rate limit
rate_check = self.rate_limiter.check_rate_limit(self.user_id)
if not rate_check['allowed']:
return {
"success": False,
"error": "Rate limit exceeded",
"reset_in": rate_check['reset_in']
}
# 3. Sanitize input
clean_input = self.validator.sanitize_text(user_input)
# 4. Estimate cost
estimated_tokens = len(clean_input.split()) * 1.3 # Rough estimate
estimated_cost = self.cost_tracker.estimate_cost(
'gpt-4',
int(estimated_tokens),
500 # Estimated output
)
# 5. Check budget
budget_check = self.cost_tracker.check_budget(self.user_id, estimated_cost)
if not budget_check['allowed']:
return {
"success": False,
"error": "Budget exceeded",
"remaining": budget_check['remaining']
}
# 6. Generate response with circuit breaker
try:
response = self.circuit_breaker.call(
self._generate_response,
clean_input
)
except Exception as e:
return {
"success": False,
"error": f"Generation failed: {str(e)}"
}
# 7. Validate output
filtered = self.guardrails.filter_output(response)
if not filtered['safe']:
return {
"success": False,
"error": "Output failed safety check",
"issues": filtered['issues']
}
# 8. Record actual cost
self.cost_tracker.record_cost(self.user_id, estimated_cost)
return {
"success": True,
"response": filtered['text'],
"cost": estimated_cost,
"remaining_budget": budget_check['remaining'] - estimated_cost
}
@retry_with_backoff(max_retries=3)
def _generate_response(self, prompt: str) -> str:
"""Generate response with retry"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": "You are a helpful assistant. Never share personal information or harmful content."
},
{"role": "user", "content": prompt}
],
timeout=30
)
return response.choices[0].message.content
# Usage
agent = SafeAgent(user_id="user123")
result = agent.process("What is the capital of France?")
if result['success']:
print(result['response'])
else:
print(f"Error: {result['error']}")
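The word-count heuristic in step 4 of `SafeAgent.process` is only a rough proxy for token usage. A sketch of a more accurate counter, treating `tiktoken` as an optional dependency and falling back to the same heuristic when it is unavailable:

```python
def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Token count for budgeting; exact when tiktoken is installed."""
    try:
        import tiktoken  # optional dependency; assumed installed for exact counts
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        # Fallback mirrors the rough word-based heuristic used above
        return int(len(text.split()) * 1.3)
```

Feeding exact counts into `CostTracker.estimate_cost` keeps budget checks from drifting on long or unusually tokenized inputs.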
Best Practices
- Validate everything: Never trust input
- Sanitize data: Clean before processing
- Rate limit: Prevent abuse
- Track costs: Monitor spending
- Filter outputs: Check for harmful content
- Implement retries: Handle transient failures
- Use circuit breakers: Prevent cascading failures
- Have fallbacks: Multiple strategies
- Log everything: Track for debugging
- Test failure modes: Ensure graceful degradation
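For the "Log everything" practice, structured JSON lines are much easier to search and aggregate than free-form prints. A minimal sketch with the standard `logging` module:

```python
import json
import logging
import time

logger = logging.getLogger("agent")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_event(event: str, **fields) -> str:
    """Emit one JSON line per agent event so logs are machine-searchable."""
    record = {"ts": time.time(), "event": event, **fields}
    line = json.dumps(record, default=str)
    logger.info(line)
    return line

# Example: one record per processed request
log_event("request_processed", user_id="user123", success=True, cost=0.0123)
```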
Next Steps
You now understand reliability and safety! Next, we’ll explore evaluation and testing to ensure your agents work correctly.
Evaluation & Testing
Agent Benchmarks
Measure agent performance systematically.
Creating Test Suites
from dataclasses import dataclass
from typing import List, Optional
import time
@dataclass
class TestCase:
"""Single test case"""
name: str
input: str
expected_output: Optional[str] = None
expected_behavior: Optional[str] = None
timeout: int = 30
@dataclass
class TestResult:
"""Test result"""
test_name: str
passed: bool
actual_output: str
expected_output: str
execution_time: float
error: Optional[str] = None
class AgentTestSuite:
"""Test suite for agents"""
def __init__(self, agent):
self.agent = agent
self.test_cases = []
self.results = []
def add_test(self, test_case: TestCase):
"""Add test case"""
self.test_cases.append(test_case)
def run_tests(self) -> dict:
"""Run all tests"""
self.results = []
for test in self.test_cases:
print(f"Running: {test.name}...")
result = self.run_single_test(test)
self.results.append(result)
return self.generate_report()
def run_single_test(self, test: TestCase) -> TestResult:
"""Run single test"""
start_time = time.time()
try:
# Execute agent
actual_output = self.agent.process(test.input)
execution_time = time.time() - start_time
# Check result
if test.expected_output:
passed = self.check_output_match(actual_output, test.expected_output)
elif test.expected_behavior:
passed = self.check_behavior(actual_output, test.expected_behavior)
else:
passed = True # Just check it doesn't crash
return TestResult(
test_name=test.name,
passed=passed,
actual_output=actual_output,
expected_output=test.expected_output or test.expected_behavior,
execution_time=execution_time
)
except Exception as e:
execution_time = time.time() - start_time
return TestResult(
test_name=test.name,
passed=False,
actual_output="",
expected_output=test.expected_output or test.expected_behavior,
execution_time=execution_time,
error=str(e)
)
def check_output_match(self, actual: str, expected: str) -> bool:
"""Check if output matches expected"""
# Exact match
if actual.strip() == expected.strip():
return True
# Contains expected
if expected.lower() in actual.lower():
return True
return False
def check_behavior(self, output: str, behavior: str) -> bool:
"""Check if output exhibits expected behavior"""
# Use LLM to judge
prompt = f"""Does this output exhibit the expected behavior?
Output: {output}
Expected behavior: {behavior}
Answer with just 'yes' or 'no':"""
response = openai.OpenAI().chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0
)
answer = response.choices[0].message.content.strip().lower()
return answer.startswith('yes')
def generate_report(self) -> dict:
"""Generate test report"""
total = len(self.results)
passed = sum(1 for r in self.results if r.passed)
failed = total - passed
avg_time = sum(r.execution_time for r in self.results) / total if total > 0 else 0
return {
"total": total,
"passed": passed,
"failed": failed,
"pass_rate": passed / total if total > 0 else 0,
"avg_execution_time": avg_time,
"results": self.results
}
# Usage
suite = AgentTestSuite(agent)
suite.add_test(TestCase(
name="Basic math",
input="What is 2 + 2?",
expected_output="4"
))
suite.add_test(TestCase(
name="Tool usage",
input="Search for information about Python",
expected_behavior="Uses search tool and provides relevant information"
))
report = suite.run_tests()
print(f"Pass rate: {report['pass_rate']:.1%}")
Standard Benchmarks
class StandardBenchmarks:
"""Common agent benchmarks"""
@staticmethod
def get_math_benchmark() -> List[TestCase]:
"""Math reasoning tests"""
return [
TestCase("Addition", "What is 123 + 456?", "579"),
TestCase("Multiplication", "What is 25 * 17?", "425"),
TestCase("Word problem", "If I have 3 apples and buy 2 more, how many do I have?", "5"),
TestCase("Percentage", "What is 15% of 200?", "30"),
]
@staticmethod
def get_reasoning_benchmark() -> List[TestCase]:
"""Logical reasoning tests"""
return [
TestCase(
"Deduction",
"All cats are animals. Fluffy is a cat. Is Fluffy an animal?",
expected_behavior="Correctly deduces that Fluffy is an animal"
),
TestCase(
"Planning",
"I need to make dinner. What steps should I take?",
expected_behavior="Provides logical sequence of steps"
),
]
@staticmethod
def get_tool_usage_benchmark() -> List[TestCase]:
"""Tool usage tests"""
return [
TestCase(
"Search",
"Find information about the Eiffel Tower",
expected_behavior="Uses search tool and provides facts"
),
TestCase(
"Calculation",
"Calculate the compound interest on $1000 at 5% for 3 years",
expected_behavior="Uses calculator tool"
),
]
Success Metrics
Define what success means for your agent.
Quantitative Metrics
class AgentMetrics:
"""Track agent performance metrics"""
def __init__(self):
self.metrics = {
"total_requests": 0,
"successful_requests": 0,
"failed_requests": 0,
"total_execution_time": 0,
"tool_calls": 0,
"tokens_used": 0,
"cost": 0.0
}
def record_request(self,
success: bool,
execution_time: float,
tool_calls: int = 0,
tokens: int = 0,
cost: float = 0.0):
"""Record request metrics"""
self.metrics["total_requests"] += 1
if success:
self.metrics["successful_requests"] += 1
else:
self.metrics["failed_requests"] += 1
self.metrics["total_execution_time"] += execution_time
self.metrics["tool_calls"] += tool_calls
self.metrics["tokens_used"] += tokens
self.metrics["cost"] += cost
def get_summary(self) -> dict:
"""Get metrics summary"""
total = self.metrics["total_requests"]
if total == 0:
return self.metrics
return {
**self.metrics,
"success_rate": self.metrics["successful_requests"] / total,
"avg_execution_time": self.metrics["total_execution_time"] / total,
"avg_tool_calls": self.metrics["tool_calls"] / total,
"avg_tokens": self.metrics["tokens_used"] / total,
"avg_cost": self.metrics["cost"] / total
}
def print_summary(self):
"""Print formatted summary"""
summary = self.get_summary()
print("Agent Performance Metrics")
print("=" * 40)
print(f"Total Requests: {summary['total_requests']}")
print(f"Success Rate: {summary['success_rate']:.1%}")
print(f"Avg Execution Time: {summary['avg_execution_time']:.2f}s")
print(f"Avg Tool Calls: {summary['avg_tool_calls']:.1f}")
print(f"Avg Tokens: {summary['avg_tokens']:.0f}")
print(f"Avg Cost: ${summary['avg_cost']:.4f}")
print(f"Total Cost: ${summary['cost']:.2f}")
Qualitative Metrics
class QualityEvaluator:
"""Evaluate response quality"""
def __init__(self):
self.client = openai.OpenAI()
def evaluate_response(self,
question: str,
response: str,
criteria: List[str] = None) -> dict:
"""Evaluate response quality"""
if criteria is None:
criteria = [
"Accuracy: Is the information correct?",
"Completeness: Does it fully answer the question?",
"Clarity: Is it easy to understand?",
"Relevance: Does it stay on topic?"
]
scores = {}
for criterion in criteria:
score = self.score_criterion(question, response, criterion)
criterion_name = criterion.split(':')[0]
scores[criterion_name] = score
return {
"scores": scores,
"average": sum(scores.values()) / len(scores),
"passed": all(score >= 3 for score in scores.values())
}
def score_criterion(self, question: str, response: str, criterion: str) -> int:
"""Score response on single criterion (1-5)"""
prompt = f"""Rate this response on the following criterion (1-5):
Question: {question}
Response: {response}
Criterion: {criterion}
Provide only a number from 1 (poor) to 5 (excellent):"""
result = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.1
)
try:
score = int(result.choices[0].message.content.strip())
return max(1, min(5, score)) # Clamp to 1-5
except ValueError:
return 3 # Default to middle score if the model returns non-numeric text
Unit and Integration Testing
Unit Tests for Components
import unittest
class TestAgentComponents(unittest.TestCase):
"""Unit tests for agent components"""
def setUp(self):
"""Set up test fixtures"""
self.agent = MyAgent()
def test_input_validation(self):
"""Test input validation"""
validator = InputValidator()
# Valid input
result = validator.validate_text_input("Hello world")
self.assertTrue(result['valid'])
# Invalid input (too long)
long_text = "x" * 20000
result = validator.validate_text_input(long_text)
self.assertFalse(result['valid'])
def test_tool_execution(self):
"""Test tool execution"""
result = self.agent.execute_tool("calculate", {"expression": "2 + 2"})
self.assertEqual(result, "4")
def test_memory_storage(self):
"""Test memory system"""
self.agent.memory.add("user_name", "Alice")
retrieved = self.agent.memory.get("user_name")
self.assertEqual(retrieved, "Alice")
def test_error_handling(self):
"""Test error handling"""
# Should not crash on invalid tool
result = self.agent.execute_tool("nonexistent_tool", {})
self.assertIn("error", result.lower())
def tearDown(self):
"""Clean up"""
pass
# Run tests
if __name__ == '__main__':
unittest.main()
Integration Tests
class TestAgentIntegration(unittest.TestCase):
"""Integration tests for full agent"""
def test_end_to_end_query(self):
"""Test complete query flow"""
agent = MyAgent()
response = agent.process("What is 2 + 2?")
self.assertIsNotNone(response)
self.assertIn("4", response)
def test_multi_step_task(self):
"""Test multi-step task execution"""
agent = MyAgent()
response = agent.process("Search for Python tutorials and summarize the top result")
# Should use search tool
self.assertTrue(agent.tool_used("search"))
# Should provide summary
self.assertGreater(len(response), 50)
def test_error_recovery(self):
"""Test error recovery"""
agent = MyAgent()
# Simulate tool failure with a helper that always raises
def failing_tool(query):
raise RuntimeError("Simulated tool failure")
agent.tools["search"] = failing_tool
response = agent.process("Search for something")
# Should handle gracefully
self.assertIsNotNone(response)
self.assertNotIn("Traceback", response)
def test_rate_limiting(self):
"""Test rate limiting"""
agent = MyAgent()
# Make many requests
for i in range(150):
response = agent.process(f"Request {i}")
# Should be rate limited
self.assertTrue(agent.was_rate_limited())
Property-Based Testing
from hypothesis import given, strategies as st
class TestAgentProperties(unittest.TestCase):
"""Property-based tests"""
@given(st.text(min_size=1, max_size=1000))
def test_agent_handles_any_text(self, text):
"""Agent should handle any text input without crashing"""
agent = MyAgent()
try:
response = agent.process(text)
# Should return something
self.assertIsNotNone(response)
except Exception as e:
# Should not crash
self.fail(f"Agent crashed on input: {text[:50]}... Error: {e}")
@given(st.integers(min_value=-1000, max_value=1000))
def test_calculator_tool(self, number):
"""Calculator should handle any integer"""
agent = MyAgent()
result = agent.execute_tool("calculate", {"expression": f"{number} + 1"})
expected = str(number + 1)
self.assertEqual(result, expected)
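If the hypothesis library isn't available, the same idea can be approximated with a hand-rolled random-input loop. This is a much weaker substitute (no shrinking, no smart strategies) and `safe_process` below is an invented stand-in for `agent.process`, shown only to illustrate the concept:

```python
import random
import string

def make_random_text(max_len: int = 1000) -> str:
    """Generate a random string of printable characters"""
    n = random.randint(1, max_len)
    return "".join(random.choice(string.printable) for _ in range(n))

def safe_process(text: str) -> str:
    # Stand-in for agent.process; a real agent would go here
    return text.strip() or "(empty)"

random.seed(0)  # deterministic, so failures are reproducible
for _ in range(100):
    result = safe_process(make_random_text())
    assert result is not None  # the property under test: never returns None

print("100 random inputs handled")
```

Unlike hypothesis, this loop won't minimize a failing input for you, so log the seed and the offending text when an assertion fires.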
Human Evaluation Frameworks
Collecting Human Feedback
class HumanEvaluator:
"""Collect human evaluations"""
def __init__(self):
self.evaluations = []
def request_evaluation(self,
question: str,
response: str,
evaluator_id: str) -> dict:
"""Request human evaluation"""
print(f"\n{'='*60}")
print(f"Question: {question}")
print(f"\nResponse: {response}")
print(f"\n{'='*60}")
# Collect ratings
ratings = {}
criteria = [
("accuracy", "Is the response accurate? (1-5)"),
("helpfulness", "Is the response helpful? (1-5)"),
("clarity", "Is the response clear? (1-5)"),
]
for key, prompt in criteria:
while True:
try:
score = int(input(f"{prompt}: "))
if 1 <= score <= 5:
ratings[key] = score
break
except ValueError:
pass
# Collect feedback
feedback = input("\nAdditional feedback (optional): ")
evaluation = {
"question": question,
"response": response,
"evaluator_id": evaluator_id,
"ratings": ratings,
"feedback": feedback,
"timestamp": time.time()
}
self.evaluations.append(evaluation)
return evaluation
def get_summary(self) -> dict:
"""Get evaluation summary"""
if not self.evaluations:
return {}
# Average ratings
avg_ratings = {}
for criterion in ["accuracy", "helpfulness", "clarity"]:
scores = [e["ratings"][criterion] for e in self.evaluations]
avg_ratings[criterion] = sum(scores) / len(scores)
return {
"total_evaluations": len(self.evaluations),
"average_ratings": avg_ratings,
"overall_score": sum(avg_ratings.values()) / len(avg_ratings)
}
A/B Testing
class ABTest:
"""A/B test different agent versions"""
def __init__(self, agent_a, agent_b):
self.agent_a = agent_a
self.agent_b = agent_b
self.results = {"a": [], "b": []}
def run_test(self, test_cases: List[str], evaluator) -> dict:
"""Run A/B test"""
for i, test_case in enumerate(test_cases):
# Alternate between agents
if i % 2 == 0:
agent = self.agent_a
variant = "a"
else:
agent = self.agent_b
variant = "b"
# Get response
response = agent.process(test_case)
# Evaluate
evaluation = evaluator.evaluate_response(test_case, response)
self.results[variant].append(evaluation)
return self.compare_results()
def compare_results(self) -> dict:
"""Compare A vs B"""
avg_a = sum(r["average"] for r in self.results["a"]) / len(self.results["a"])
avg_b = sum(r["average"] for r in self.results["b"]) / len(self.results["b"])
return {
"agent_a_score": avg_a,
"agent_b_score": avg_b,
"winner": "a" if avg_a > avg_b else "b",
"difference": abs(avg_a - avg_b)
}
Automated Testing Pipeline
class TestPipeline:
"""Automated testing pipeline"""
def __init__(self, agent):
self.agent = agent
self.test_suite = AgentTestSuite(agent)
self.metrics = AgentMetrics()
self.evaluator = QualityEvaluator()
def run_full_pipeline(self) -> dict:
"""Run complete test pipeline"""
results = {}
# 1. Unit tests
print("Running unit tests...")
results["unit_tests"] = self.run_unit_tests()
# 2. Integration tests
print("Running integration tests...")
results["integration_tests"] = self.run_integration_tests()
# 3. Benchmark tests
print("Running benchmarks...")
results["benchmarks"] = self.run_benchmarks()
# 4. Quality evaluation
print("Running quality evaluation...")
results["quality"] = self.run_quality_evaluation()
# 5. Performance metrics
print("Collecting performance metrics...")
results["performance"] = self.metrics.get_summary()
# 6. Generate report
report = self.generate_report(results)
return report
def run_unit_tests(self) -> dict:
"""Run unit tests"""
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(TestAgentComponents)
runner = unittest.TextTestRunner(verbosity=0)
result = runner.run(suite)
return {
"total": result.testsRun,
"passed": result.testsRun - len(result.failures) - len(result.errors),
"failed": len(result.failures) + len(result.errors)
}
def run_integration_tests(self) -> dict:
"""Run integration tests"""
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(TestAgentIntegration)
runner = unittest.TextTestRunner(verbosity=0)
result = runner.run(suite)
return {
"total": result.testsRun,
"passed": result.testsRun - len(result.failures) - len(result.errors),
"failed": len(result.failures) + len(result.errors)
}
def run_benchmarks(self) -> dict:
"""Run benchmark tests"""
# Add standard benchmarks
for test in StandardBenchmarks.get_math_benchmark():
self.test_suite.add_test(test)
for test in StandardBenchmarks.get_reasoning_benchmark():
self.test_suite.add_test(test)
return self.test_suite.run_tests()
def run_quality_evaluation(self) -> dict:
"""Run quality evaluation"""
test_cases = [
("What is Python?", "Python is a high-level programming language..."),
("How do I sort a list?", "You can use the sorted() function..."),
]
evaluations = []
for question, response in test_cases:
eval_result = self.evaluator.evaluate_response(question, response)
evaluations.append(eval_result)
avg_score = sum(e["average"] for e in evaluations) / len(evaluations)
return {
"evaluations": evaluations,
"average_score": avg_score
}
def generate_report(self, results: dict) -> dict:
"""Generate comprehensive report"""
return {
"timestamp": time.time(),
"summary": {
"unit_tests_passed": results["unit_tests"]["passed"],
"integration_tests_passed": results["integration_tests"]["passed"],
"benchmark_pass_rate": results["benchmarks"]["pass_rate"],
"quality_score": results["quality"]["average_score"],
"success_rate": results["performance"]["success_rate"]
},
"details": results
}
# Usage
pipeline = TestPipeline(agent)
report = pipeline.run_full_pipeline()
print("\nTest Report Summary")
print("=" * 40)
for key, value in report["summary"].items():
print(f"{key}: {value}")
Best Practices
- Test early and often: Continuous testing during development
- Automate testing: Run tests automatically on changes
- Use multiple metrics: Quantitative and qualitative
- Test edge cases: Unusual inputs, errors, limits
- Benchmark regularly: Track performance over time
- Get human feedback: Automated tests aren’t enough
- Test in production: Monitor real usage
- Version your tests: Track test changes
- Document failures: Learn from what breaks
- Iterate based on results: Use tests to improve
Next Steps
You now understand evaluation and testing! Next, we’ll explore monitoring and observability for production agents.
Monitoring & Observability
Logging and Tracing
Track what your agent is doing at every step.
Structured Logging
import logging
import json
from datetime import datetime
from typing import Any, Dict
class AgentLogger:
"""Structured logging for agents"""
def __init__(self, agent_id: str, log_file: str = "agent.log"):
self.agent_id = agent_id
self.logger = logging.getLogger(agent_id)
self.logger.setLevel(logging.INFO)
# File handler
handler = logging.FileHandler(log_file)
handler.setFormatter(logging.Formatter('%(message)s'))
self.logger.addHandler(handler)
# Console handler
console = logging.StreamHandler()
console.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
self.logger.addHandler(console)
def log_event(self,
event_type: str,
data: Dict[str, Any],
level: str = "info"):
"""Log structured event"""
log_entry = {
"timestamp": datetime.utcnow().isoformat(),
"agent_id": self.agent_id,
"event_type": event_type,
"data": data
}
log_message = json.dumps(log_entry)
if level == "info":
self.logger.info(log_message)
elif level == "warning":
self.logger.warning(log_message)
elif level == "error":
self.logger.error(log_message)
elif level == "debug":
self.logger.debug(log_message)
def log_request(self, user_id: str, input_text: str):
"""Log incoming request"""
self.log_event("request", {
"user_id": user_id,
"input": input_text[:200], # Truncate long inputs
"input_length": len(input_text)
})
def log_response(self, user_id: str, output_text: str, execution_time: float):
"""Log response"""
self.log_event("response", {
"user_id": user_id,
"output": output_text[:200],
"output_length": len(output_text),
"execution_time": execution_time
})
def log_tool_call(self, tool_name: str, parameters: dict, result: Any):
"""Log tool execution"""
self.log_event("tool_call", {
"tool": tool_name,
"parameters": parameters,
"result": str(result)[:200],
"success": result is not None
})
def log_error(self, error_type: str, error_message: str, context: dict = None):
"""Log error"""
self.log_event("error", {
"error_type": error_type,
"message": error_message,
"context": context or {}
}, level="error")
# Usage
logger = AgentLogger("agent-001")
logger.log_request("user123", "What is the weather?")
logger.log_tool_call("weather_api", {"location": "NYC"}, {"temp": 72})
logger.log_response("user123", "It's 72°F in NYC", 1.5)
Distributed Tracing
import uuid
from contextlib import contextmanager
from typing import Optional
class Tracer:
"""Distributed tracing for agent operations"""
def __init__(self):
self.traces = {}
self.current_trace = None
@contextmanager
def trace(self, operation_name: str, parent_id: Optional[str] = None):
"""Create trace span"""
span_id = str(uuid.uuid4())
trace_id = parent_id or str(uuid.uuid4())
span = {
"span_id": span_id,
"trace_id": trace_id,
"operation": operation_name,
"start_time": time.time(),
"parent_id": parent_id,
"children": [],
"metadata": {}
}
# Store current trace
previous_trace = self.current_trace
self.current_trace = span_id
self.traces[span_id] = span
try:
yield span
finally:
# End span
span["end_time"] = time.time()
span["duration"] = span["end_time"] - span["start_time"]
# Restore previous trace
self.current_trace = previous_trace
def add_metadata(self, key: str, value: Any):
"""Add metadata to current span"""
if self.current_trace:
self.traces[self.current_trace]["metadata"][key] = value
def get_trace(self, trace_id: str) -> dict:
"""Get full trace"""
spans = [s for s in self.traces.values() if s["trace_id"] == trace_id]
# Build tree
root = [s for s in spans if s["parent_id"] is None][0]
self._build_tree(root, spans)
return root
def _build_tree(self, node: dict, all_spans: list):
"""Build trace tree"""
children = [s for s in all_spans if s["parent_id"] == node["span_id"]]
node["children"] = children
for child in children:
self._build_tree(child, all_spans)
# Usage
tracer = Tracer()
with tracer.trace("agent_request") as trace:
tracer.add_metadata("user_id", "user123")
with tracer.trace("tool_call", parent_id=trace["span_id"]):
tracer.add_metadata("tool", "search")
# Execute tool
pass
with tracer.trace("generate_response", parent_id=trace["span_id"]):
# Generate response
pass
# View trace
full_trace = tracer.get_trace(trace["trace_id"])
Performance Metrics
Track agent performance in real-time.
Metrics Collector
from collections import defaultdict
from threading import Lock
import time
class MetricsCollector:
"""Collect and aggregate metrics"""
def __init__(self):
self.metrics = defaultdict(list)
self.counters = defaultdict(int)
self.lock = Lock()
def record_metric(self, name: str, value: float, tags: dict = None):
"""Record a metric value"""
with self.lock:
self.metrics[name].append({
"value": value,
"timestamp": time.time(),
"tags": tags or {}
})
def increment_counter(self, name: str, amount: int = 1):
"""Increment counter"""
with self.lock:
self.counters[name] += amount
def get_stats(self, name: str, window_seconds: int = 3600) -> dict:
"""Get statistics for metric"""
with self.lock:
current_time = time.time()
# Filter to time window
values = [
m["value"] for m in self.metrics[name]
if current_time - m["timestamp"] < window_seconds
]
if not values:
return {}
return {
"count": len(values),
"min": min(values),
"max": max(values),
"avg": sum(values) / len(values),
"p50": self._percentile(values, 50),
"p95": self._percentile(values, 95),
"p99": self._percentile(values, 99)
}
def _percentile(self, values: list, percentile: int) -> float:
"""Calculate percentile"""
sorted_values = sorted(values)
index = int(len(sorted_values) * percentile / 100)
return sorted_values[min(index, len(sorted_values) - 1)]
def get_counter(self, name: str) -> int:
"""Get counter value"""
with self.lock:
return self.counters[name]
def reset(self):
"""Reset all metrics"""
with self.lock:
self.metrics.clear()
self.counters.clear()
# Usage
metrics = MetricsCollector()
# Record metrics
metrics.record_metric("response_time", 1.5, {"user": "user123"})
metrics.record_metric("response_time", 2.1, {"user": "user456"})
metrics.increment_counter("total_requests")
metrics.increment_counter("successful_requests")
# Get stats
stats = metrics.get_stats("response_time")
print(f"Avg response time: {stats['avg']:.2f}s")
print(f"P95 response time: {stats['p95']:.2f}s")
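The percentile figures above come from a simple nearest-rank calculation (the same logic as `_percentile`). A standalone sketch, with invented response times, makes the behavior concrete:

```python
# Nearest-rank percentile, mirroring MetricsCollector._percentile
def percentile(values: list, pct: int) -> float:
    s = sorted(values)
    idx = int(len(s) * pct / 100)
    return s[min(idx, len(s) - 1)]

# Ten illustrative response times in seconds
times = [0.8, 1.1, 1.5, 2.0, 2.1, 2.4, 3.0, 3.2, 4.8, 9.5]
print(percentile(times, 50))  # 2.4
print(percentile(times, 95))  # 9.5
print(percentile(times, 99))  # 9.5
```

Note that with small samples a single outlier dominates the tail: here p95 and p99 are identical because one slow request sits at the top of the sorted list.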
Real-Time Dashboard
class MetricsDashboard:
"""Real-time metrics dashboard"""
def __init__(self, metrics_collector: MetricsCollector):
self.metrics = metrics_collector
def display(self):
"""Display current metrics"""
print("\n" + "="*60)
print("AGENT METRICS DASHBOARD")
print("="*60)
# Request metrics
total = self.metrics.get_counter("total_requests")
successful = self.metrics.get_counter("successful_requests")
failed = self.metrics.get_counter("failed_requests")
print(f"\nRequests:")
print(f" Total: {total}")
print(f" Successful: {successful}")
print(f" Failed: {failed}")
if total > 0:
print(f" Success Rate: {successful/total:.1%}")
# Response time
response_stats = self.metrics.get_stats("response_time")
if response_stats:
print(f"\nResponse Time:")
print(f" Average: {response_stats['avg']:.2f}s")
print(f" P50: {response_stats['p50']:.2f}s")
print(f" P95: {response_stats['p95']:.2f}s")
print(f" P99: {response_stats['p99']:.2f}s")
# Tool usage
tool_calls = self.metrics.get_counter("tool_calls")
print(f"\nTool Calls: {tool_calls}")
# Cost
total_cost = self.metrics.get_counter("total_cost_cents") / 100
print(f"\nTotal Cost: ${total_cost:.2f}")
print("="*60 + "\n")
Cost Tracking
Monitor spending in real-time.
Cost Monitor
class CostMonitor:
"""Monitor and alert on costs"""
def __init__(self, budget_limit: float = 100.0):
self.budget_limit = budget_limit
self.costs = defaultdict(float)
self.lock = Lock()
self.alerts = []
def record_cost(self,
user_id: str,
cost: float,
model: str,
tokens: int):
"""Record cost"""
with self.lock:
self.costs[user_id] += cost
# Check for alerts
if self.costs[user_id] > self.budget_limit * 0.8:
self.add_alert(
"warning",
f"User {user_id} at 80% of budget: ${self.costs[user_id]:.2f}"
)
if self.costs[user_id] > self.budget_limit:
self.add_alert(
"critical",
f"User {user_id} exceeded budget: ${self.costs[user_id]:.2f}"
)
def add_alert(self, level: str, message: str):
"""Add alert"""
alert = {
"level": level,
"message": message,
"timestamp": time.time()
}
self.alerts.append(alert)
# Log alert
if level == "critical":
logger.log_event("cost_alert", alert, level="error")
else:
logger.log_event("cost_alert", alert, level="warning")
def get_user_cost(self, user_id: str) -> dict:
"""Get user's cost"""
with self.lock:
cost = self.costs[user_id]
return {
"cost": cost,
"budget": self.budget_limit,
"remaining": self.budget_limit - cost,
"percentage": (cost / self.budget_limit) * 100
}
def get_total_cost(self) -> float:
"""Get total cost across all users"""
with self.lock:
return sum(self.costs.values())
def get_alerts(self, level: str = None) -> list:
"""Get alerts"""
if level:
return [a for a in self.alerts if a["level"] == level]
return self.alerts
User Feedback Loops
Collect and act on user feedback.
Feedback Collector
class FeedbackCollector:
"""Collect user feedback"""
def __init__(self):
self.feedback = []
self.ratings = defaultdict(list)
def collect_rating(self,
user_id: str,
interaction_id: str,
rating: int,
comment: str = ""):
"""Collect user rating (1-5)"""
feedback = {
"user_id": user_id,
"interaction_id": interaction_id,
"rating": rating,
"comment": comment,
"timestamp": time.time()
}
self.feedback.append(feedback)
self.ratings[user_id].append(rating)
# Log feedback
logger.log_event("user_feedback", feedback)
# Alert on low ratings
if rating <= 2:
logger.log_event("low_rating", feedback, level="warning")
def get_average_rating(self, user_id: str = None) -> float:
"""Get average rating"""
if user_id:
ratings = self.ratings[user_id]
else:
ratings = [f["rating"] for f in self.feedback]
if not ratings:
return 0.0
return sum(ratings) / len(ratings)
def get_recent_feedback(self, limit: int = 10) -> list:
"""Get recent feedback"""
return sorted(
self.feedback,
key=lambda x: x["timestamp"],
reverse=True
)[:limit]
def get_low_ratings(self, threshold: int = 2) -> list:
"""Get low-rated interactions"""
return [
f for f in self.feedback
if f["rating"] <= threshold
]
Feedback Analysis
class FeedbackAnalyzer:
"""Analyze feedback patterns"""
def __init__(self, feedback_collector: FeedbackCollector):
self.collector = feedback_collector
self.client = openai.OpenAI()
def analyze_trends(self) -> dict:
"""Analyze feedback trends"""
recent = self.collector.get_recent_feedback(limit=100)
if not recent:
return {}
# Calculate trends
ratings = [f["rating"] for f in recent]
return {
"average_rating": sum(ratings) / len(ratings),
"total_feedback": len(recent),
"rating_distribution": {
"5_star": sum(1 for r in ratings if r == 5),
"4_star": sum(1 for r in ratings if r == 4),
"3_star": sum(1 for r in ratings if r == 3),
"2_star": sum(1 for r in ratings if r == 2),
"1_star": sum(1 for r in ratings if r == 1),
}
}
def identify_issues(self) -> list:
"""Identify common issues from feedback"""
low_ratings = self.collector.get_low_ratings()
if not low_ratings:
return []
# Extract comments
comments = [f["comment"] for f in low_ratings if f["comment"]]
if not comments:
return []
# Use LLM to identify themes
prompt = f"""Analyze these negative feedback comments and identify common themes:
{chr(10).join(comments[:20])}
List the top 3 issues:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
return response.choices[0].message.content.split('\n')
Complete Monitoring System
class AgentMonitor:
"""Complete monitoring system"""
def __init__(self, agent_id: str):
self.agent_id = agent_id
self.logger = AgentLogger(agent_id)
self.tracer = Tracer()
self.metrics = MetricsCollector()
self.cost_monitor = CostMonitor()
self.feedback = FeedbackCollector()
def monitor_request(self, user_id: str, input_text: str):
"""Monitor incoming request"""
self.logger.log_request(user_id, input_text)
self.metrics.increment_counter("total_requests")
return {
"trace_id": str(uuid.uuid4()),
"start_time": time.time()
}
def monitor_response(self,
user_id: str,
output_text: str,
context: dict):
"""Monitor response"""
execution_time = time.time() - context["start_time"]
self.logger.log_response(user_id, output_text, execution_time)
self.metrics.record_metric("response_time", execution_time)
self.metrics.increment_counter("successful_requests")
def monitor_tool_call(self, tool_name: str, parameters: dict, result: Any):
"""Monitor tool execution"""
self.logger.log_tool_call(tool_name, parameters, result)
self.metrics.increment_counter("tool_calls")
self.metrics.increment_counter(f"tool_calls_{tool_name}")
def monitor_cost(self,
user_id: str,
model: str,
tokens: int,
cost: float):
"""Monitor cost"""
self.cost_monitor.record_cost(user_id, cost, model, tokens)
self.metrics.increment_counter("total_cost_cents", int(cost * 100))
def monitor_error(self, error_type: str, error_message: str, context: dict):
"""Monitor error"""
self.logger.log_error(error_type, error_message, context)
self.metrics.increment_counter("failed_requests")
self.metrics.increment_counter(f"error_{error_type}")
def get_health_status(self) -> dict:
"""Get system health status"""
total = self.metrics.get_counter("total_requests")
successful = self.metrics.get_counter("successful_requests")
failed = self.metrics.get_counter("failed_requests")
success_rate = successful / total if total > 0 else 0
response_stats = self.metrics.get_stats("response_time")
avg_response_time = response_stats.get("avg", 0) if response_stats else 0
# Determine health
if success_rate < 0.9 or avg_response_time > 10:
health = "unhealthy"
elif success_rate < 0.95 or avg_response_time > 5:
health = "degraded"
else:
health = "healthy"
return {
"status": health,
"success_rate": success_rate,
"avg_response_time": avg_response_time,
"total_requests": total,
"failed_requests": failed,
"total_cost": self.cost_monitor.get_total_cost()
}
def generate_report(self) -> dict:
"""Generate monitoring report"""
return {
"agent_id": self.agent_id,
"timestamp": time.time(),
"health": self.get_health_status(),
"metrics": {
"response_time": self.metrics.get_stats("response_time"),
"requests": {
"total": self.metrics.get_counter("total_requests"),
"successful": self.metrics.get_counter("successful_requests"),
"failed": self.metrics.get_counter("failed_requests")
},
"tool_calls": self.metrics.get_counter("tool_calls")
},
"cost": {
"total": self.cost_monitor.get_total_cost(),
"alerts": self.cost_monitor.get_alerts()
},
"feedback": {
"average_rating": self.feedback.get_average_rating(),
"recent": self.feedback.get_recent_feedback(limit=5)
}
}
# Usage
monitor = AgentMonitor("agent-001")
# Monitor request
context = monitor.monitor_request("user123", "What is Python?")
# Monitor tool call
monitor.monitor_tool_call("search", {"query": "Python"}, "Results...")
# Monitor cost
monitor.monitor_cost("user123", "gpt-4", 500, 0.015)
# Monitor response
monitor.monitor_response("user123", "Python is...", context)
# Get health status
health = monitor.get_health_status()
print(f"System health: {health['status']}")
# Generate report
report = monitor.generate_report()
Alerting
Set up alerts for critical issues.
Alert Manager
class AlertManager:
"""Manage alerts and notifications"""
def __init__(self):
self.alert_rules = []
self.active_alerts = []
def add_rule(self,
name: str,
condition: Callable,
severity: str,
message: str):
"""Add alert rule"""
self.alert_rules.append({
"name": name,
"condition": condition,
"severity": severity,
"message": message
})
def check_alerts(self, metrics: dict):
"""Check all alert rules"""
new_alerts = []
for rule in self.alert_rules:
if rule["condition"](metrics):
alert = {
"name": rule["name"],
"severity": rule["severity"],
"message": rule["message"],
"timestamp": time.time(),
"metrics": metrics
}
new_alerts.append(alert)
self.trigger_alert(alert)
self.active_alerts.extend(new_alerts)
return new_alerts
def trigger_alert(self, alert: dict):
"""Trigger alert notification"""
print(f"\n🚨 ALERT [{alert['severity']}]: {alert['name']}")
print(f" {alert['message']}")
# In production, send to:
# - Email
# - Slack
# - PagerDuty
# - etc.
def get_active_alerts(self, severity: str = None) -> list:
"""Get active alerts"""
if severity:
return [a for a in self.active_alerts if a["severity"] == severity]
return self.active_alerts
# Setup alerts
alerts = AlertManager()
# High error rate
alerts.add_rule(
name="High Error Rate",
condition=lambda m: m.get("success_rate", 1) < 0.9,
severity="critical",
message="Success rate below 90%"
)
# Slow response time
alerts.add_rule(
name="Slow Response Time",
condition=lambda m: m.get("avg_response_time", 0) > 5,
severity="warning",
message="Average response time above 5 seconds"
)
# High cost
alerts.add_rule(
name="High Cost",
condition=lambda m: m.get("total_cost", 0) > 50,
severity="warning",
message="Total cost exceeded $50"
)
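Because rule conditions are plain callables over a metrics snapshot, they can be exercised in isolation, without the full `AlertManager`. A minimal sketch with an invented snapshot:

```python
# Alert rules as (name, condition, severity) tuples over a metrics snapshot
rules = [
    ("High Error Rate", lambda m: m.get("success_rate", 1) < 0.9, "critical"),
    ("Slow Response Time", lambda m: m.get("avg_response_time", 0) > 5, "warning"),
    ("High Cost", lambda m: m.get("total_cost", 0) > 50, "warning"),
]

# Invented snapshot: low success rate, acceptable latency and cost
snapshot = {"success_rate": 0.85, "avg_response_time": 3.2, "total_cost": 12.0}

# Evaluate every rule and collect the ones that fire
fired = [(name, severity) for name, condition, severity in rules if condition(snapshot)]
print(fired)  # [('High Error Rate', 'critical')]
```

In production the snapshot would come from something like `monitor.get_health_status()`, fed to `check_alerts` on a timer.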
Best Practices
- Log everything: Requests, responses, errors, tool calls
- Use structured logging: JSON format for easy parsing
- Track key metrics: Response time, success rate, cost
- Set up alerts: Be notified of issues immediately
- Monitor costs: Track spending in real-time
- Collect feedback: Learn from users
- Create dashboards: Visualize metrics
- Trace requests: Follow execution flow
- Analyze trends: Look for patterns over time
- Act on insights: Use data to improve
Practice Exercises
Exercise 1: Add Circuit Breaker (Medium)
Task: Implement a circuit breaker that stops calling a failing tool.
Click to see solution
class CircuitBreaker:
def __init__(self, failure_threshold: int = 5):
self.failure_count = 0
self.threshold = failure_threshold
self.state = "closed" # closed, open, half-open
def call(self, func, *args):
if self.state == "open":
raise Exception("Circuit breaker is open")
try:
result = func(*args)
self.failure_count = 0
return result
except Exception:
self.failure_count += 1
if self.failure_count >= self.threshold:
self.state = "open"
raise
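A quick self-contained check of the breaker's behavior (the class is restated here so the snippet runs on its own): after `failure_threshold` consecutive failures the breaker opens and refuses further calls.

```python
class CircuitBreaker:
    """Restated from the exercise solution so this demo is self-contained"""
    def __init__(self, failure_threshold: int = 5):
        self.failure_count = 0
        self.threshold = failure_threshold
        self.state = "closed"  # closed, open, half-open

    def call(self, func, *args):
        if self.state == "open":
            raise RuntimeError("Circuit breaker is open")
        try:
            result = func(*args)
            self.failure_count = 0  # any success resets the count
            return result
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.threshold:
                self.state = "open"
            raise

def flaky_tool():
    raise ConnectionError("tool unavailable")

breaker = CircuitBreaker(failure_threshold=3)
for _ in range(3):
    try:
        breaker.call(flaky_tool)
    except ConnectionError:
        pass  # the breaker counts these failures

print(breaker.state)  # open
```

A fuller implementation would also add a cooldown timer that moves the breaker to "half-open" so the tool gets periodically retried.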
Exercise 2: Build a Metrics Dashboard (Hard)
Task: Create a real-time dashboard showing agent metrics.
Click to see solution
from fastapi import FastAPI
from prometheus_client import make_asgi_app
app = FastAPI()
# Mount Prometheus metrics
metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)
@app.get("/dashboard")
def dashboard():
    # Assumes requests_total / errors_total Counters and a durations list
    # are maintained elsewhere in the app
    return {
"requests_total": requests_total._value.get(),
"avg_duration": sum(durations) / len(durations),
"error_rate": errors_total._value.get() / requests_total._value.get()
}
✅ Chapter 5 Summary
You’ve learned production-ready practices:
- Reliability: Input validation, guardrails, retries, and fallbacks
- Testing: Unit tests, integration tests, benchmarks, and evaluation metrics
- Monitoring: Logging, tracing, metrics, alerts, and feedback loops
These practices ensure your agents are safe, reliable, and maintainable in production environments.
Next Steps
Chapter 5 (Production-Ready Agents) is complete! You now understand reliability, testing, and monitoring. You’re ready to build production-grade agents that are safe, tested, and observable.
Coding Agents
Module 6: Learning Objectives
By the end of this module, you will:
- ✓ Build coding agents that analyze and generate code
- ✓ Create research agents with multi-source verification
- ✓ Implement task automation with workflow orchestration
- ✓ Design specialized agents for specific domains
- ✓ Integrate advanced capabilities into focused agents
Introduction to Coding Agents
Coding agents are specialized AI systems that understand, generate, modify, and debug code. They’re among the most powerful and practical agent applications.
What Makes Coding Agents Special?
Unique Capabilities:
- Understand code semantics and structure
- Generate syntactically correct code
- Refactor and optimize existing code
- Debug and fix errors
- Write tests and documentation
- Work across multiple programming languages
Key Challenges:
- Code must be syntactically correct
- Logic must be sound
- Must handle edge cases
- Need to understand context and dependencies
- Security vulnerabilities must be avoided
Types of Coding Agents
- Code Generation Agents: Write new code from specifications
- Code Review Agents: Analyze and suggest improvements
- Debugging Agents: Find and fix bugs
- Refactoring Agents: Improve code structure
- Testing Agents: Generate and run tests
- Documentation Agents: Write comments and docs
Code Understanding and Generation
Understanding Code Structure
import ast
from typing import Dict, List, Any
class CodeAnalyzer:
"""Analyze code structure and semantics"""
def __init__(self):
self.client = openai.OpenAI()
def parse_python_code(self, code: str) -> Dict[str, Any]:
"""Parse Python code into AST"""
try:
tree = ast.parse(code)
analysis = {
"functions": [],
"classes": [],
"imports": [],
"variables": [],
"complexity": 0
}
for node in ast.walk(tree):
if isinstance(node, ast.FunctionDef):
analysis["functions"].append({
"name": node.name,
"args": [arg.arg for arg in node.args.args],
"line": node.lineno,
"docstring": ast.get_docstring(node)
})
elif isinstance(node, ast.ClassDef):
methods = [
n.name for n in node.body
if isinstance(n, ast.FunctionDef)
]
analysis["classes"].append({
"name": node.name,
"methods": methods,
"line": node.lineno,
"docstring": ast.get_docstring(node)
})
elif isinstance(node, ast.Import):
for alias in node.names:
analysis["imports"].append(alias.name)
elif isinstance(node, ast.ImportFrom):
module = node.module or ""
for alias in node.names:
analysis["imports"].append(f"{module}.{alias.name}")
return analysis
except SyntaxError as e:
return {
"error": "Syntax error",
"message": str(e),
"line": e.lineno
}
def analyze_complexity(self, code: str) -> Dict[str, Any]:
"""Analyze code complexity"""
try:
tree = ast.parse(code)
complexity = {
"cyclomatic": 1, # Base complexity
"lines_of_code": len(code.split('\n')),
"num_functions": 0,
"num_classes": 0,
"max_nesting": 0
}
for node in ast.walk(tree):
# Count decision points for cyclomatic complexity
if isinstance(node, (ast.If, ast.While, ast.For, ast.ExceptHandler)):
complexity["cyclomatic"] += 1
elif isinstance(node, ast.FunctionDef):
complexity["num_functions"] += 1
elif isinstance(node, ast.ClassDef):
complexity["num_classes"] += 1
return complexity
except Exception as e:
return {"error": str(e)}
def extract_dependencies(self, code: str) -> List[str]:
"""Extract external dependencies"""
try:
tree = ast.parse(code)
dependencies = set()
for node in ast.walk(tree):
if isinstance(node, ast.Import):
for alias in node.names:
# Get top-level package
pkg = alias.name.split('.')[0]
dependencies.add(pkg)
elif isinstance(node, ast.ImportFrom):
if node.module:
pkg = node.module.split('.')[0]
dependencies.add(pkg)
# Filter out standard library
stdlib = {'os', 'sys', 'json', 're', 'time', 'datetime', 'math'}
external = dependencies - stdlib
return sorted(external)
except Exception as e:
return []
def understand_code_intent(self, code: str) -> str:
"""Use LLM to understand what code does"""
prompt = f"""Analyze this code and explain what it does:
```python
{code}
```

Provide:
- High-level purpose
- Key functionality
- Input/output
- Any notable patterns or techniques

Explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return response.choices[0].message.content
# Usage
analyzer = CodeAnalyzer()

code = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

class Calculator:
    def add(self, a, b):
        return a + b
"""

analysis = analyzer.parse_python_code(code)
print(f"Functions: {[f['name'] for f in analysis['functions']]}")
print(f"Classes: {[c['name'] for c in analysis['classes']]}")

complexity = analyzer.analyze_complexity(code)
print(f"Cyclomatic complexity: {complexity['cyclomatic']}")

intent = analyzer.understand_code_intent(code)
print(f"Intent: {intent}")
Generating Code from Specifications
class CodeGenerator:
"""Generate code from natural language specifications"""
def __init__(self):
self.client = openai.OpenAI()
def generate_function(self,
description: str,
language: str = "python",
include_tests: bool = False) -> Dict[str, str]:
"""Generate function from description"""
prompt = f"""Generate a {language} function based on this description:
{description}
Requirements:
- Include type hints (if applicable)
- Add docstring with description, parameters, and return value
- Handle edge cases
- Include error handling
- Follow best practices
- Keep it simple and readable
{"Also generate unit tests for this function." if include_tests else ""}
Provide the code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
code = response.choices[0].message.content
# Extract code and tests
parts = self.extract_code_blocks(code)
return {
"code": parts.get("main", code),
"tests": parts.get("tests", "") if include_tests else None
}
def generate_class(self,
description: str,
methods: List[str] = None) -> str:
"""Generate class from description"""
methods_str = ""
if methods:
methods_str = "\nMethods to implement:\n" + "\n".join(f"- {m}" for m in methods)
prompt = f"""Generate a Python class based on this description:
{description}{methods_str}
Requirements:
- Include __init__ method
- Add docstrings for class and methods
- Use type hints
- Follow PEP 8 style guide
- Include example usage in docstring
Provide the code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code_blocks(response.choices[0].message.content)["main"]
def generate_from_signature(self, signature: str) -> str:
"""Generate function implementation from signature"""
prompt = f"""Implement this function:
```python
{signature}
pass
```
Provide a complete, working implementation with:
- Proper logic
- Error handling
- Edge case handling
- Comments for complex parts
Implementation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code_blocks(response.choices[0].message.content)["main"]
def extract_code_blocks(self, text: str) -> Dict[str, str]:
"""Extract code blocks from markdown"""
import re
# Find all code blocks
pattern = r'```(?:python)?\n(.*?)```'
blocks = re.findall(pattern, text, re.DOTALL)
if not blocks:
return {"main": text}
result = {"main": blocks[0]}
if len(blocks) > 1:
result["tests"] = blocks[1]
return result
# Usage
generator = CodeGenerator()

# Generate a function
result = generator.generate_function(
    "Create a function that calculates the factorial of a number",
    include_tests=True
)

print("Generated code:")
print(result["code"])

if result["tests"]:
    print("\nGenerated tests:")
    print(result["tests"])

# Generate a class
class_code = generator.generate_class(
    "A simple cache that stores key-value pairs with expiration",
    methods=["set", "get", "delete", "clear"]
)

print("\nGenerated class:")
print(class_code)
```
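The `extract_code_blocks` regex is easy to verify in isolation. A small standalone demo of the same pattern, run against a typical model reply:

```python
import re

def extract_code_blocks(text: str) -> dict:
    """Pull fenced code blocks out of a markdown-style response:
    first block is the main code, second (if any) is the tests."""
    blocks = re.findall(r'```(?:python)?\n(.*?)```', text, re.DOTALL)
    if not blocks:
        return {"main": text}
    result = {"main": blocks[0]}
    if len(blocks) > 1:
        result["tests"] = blocks[1]
    return result

reply = (
    "Here you go:\n"
    "```python\ndef add(a, b):\n    return a + b\n```\n"
    "Tests:\n"
    "```python\ndef test_add():\n    assert add(1, 2) == 3\n```\n"
)
parts = extract_code_blocks(reply)
print(parts["main"].strip())
print("tests" in parts)  # True
```

The non-greedy `(.*?)` with `re.DOTALL` is what keeps the two blocks from being swallowed into one match.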
## Refactoring and Optimization
### Automated Refactoring
```python
class RefactoringAgent:
"""Refactor and improve code quality"""
def __init__(self):
self.client = openai.OpenAI()
def refactor_for_readability(self, code: str) -> Dict[str, str]:
"""Improve code readability"""
prompt = f"""Refactor this code for better readability:
```python
{code}
```
Apply these improvements:
- Better variable names
- Extract complex expressions
- Add comments
- Simplify logic
- Follow PEP 8
Provide:
- Refactored code
- List of changes made
Response:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_response(response.choices[0].message.content)
def optimize_performance(self, code: str) -> Dict[str, str]:
"""Optimize code for performance"""
prompt = f"""Optimize this code for better performance:
{code}
Consider:
- Algorithm complexity
- Data structure choices
- Unnecessary operations
- Caching opportunities
- Memory usage
Provide optimized code with explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_response(response.choices[0].message.content)
def apply_design_pattern(self, code: str, pattern: str) -> Dict[str, str]:
"""Apply design pattern to code"""
prompt = f"""Refactor this code to use the {pattern} design pattern:
{code}
Explain:
- Why this pattern is appropriate
- How it improves the code
- What changed
Refactored code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_response(response.choices[0].message.content)
def extract_method(self, code: str, lines: tuple) -> Dict[str, str]:
"""Extract method refactoring"""
prompt = f"""Extract lines {lines[0]}-{lines[1]} into a separate method:
{code}
Provide:
- New method with good name
- Updated original code
- Method signature
Result:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.parse_response(response.choices[0].message.content)
def parse_response(self, text: str) -> Dict[str, str]:
    """Parse an LLM reply into code plus change notes.
    Minimal sketch: the first fenced block is taken as the code,
    and bullet lines are treated as the list of changes."""
    import re
    blocks = re.findall(r'```(?:python)?\n(.*?)```', text, re.DOTALL)
    changes = [line.strip('- ').strip() for line in text.split('\n')
               if line.strip().startswith('-')]
    return {"code": blocks[0] if blocks else text, "changes": changes}
# Usage
refactorer = RefactoringAgent()

# Improve readability
messy_code = """
def f(x,y,z):
    if x>0:
        if y>0:
            if z>0:
                return x+y+z
    return 0
"""

result = refactorer.refactor_for_readability(messy_code)
print("Refactored:", result["code"])

# Optimize performance
slow_code = """
def find_duplicates(items):
    duplicates = []
    for i in range(len(items)):
        for j in range(i+1, len(items)):
            if items[i] == items[j] and items[i] not in duplicates:
                duplicates.append(items[i])
    return duplicates
"""

result = refactorer.optimize_performance(slow_code)
print("Optimized:", result["code"])
```
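For reference, here is a hand-written optimization of the `find_duplicates` example, the kind of result the performance prompt is aiming for: set lookups replace the O(n²) nested scan with a single O(n) pass.

```python
def find_duplicates(items):
    """Return each duplicated value once, in order of first duplication."""
    seen = set()       # values encountered so far
    reported = set()   # duplicates already emitted
    duplicates = []
    for item in items:
        if item in seen and item not in reported:
            duplicates.append(item)
            reported.add(item)
        seen.add(item)
    return duplicates

print(find_duplicates([1, 2, 1, 3, 2, 1]))  # [1, 2]
```

Having a known-good target like this is also useful for checking the agent's output: run both versions on the same inputs and diff the results.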
## Test Generation
### Comprehensive Test Generation
```python
class TestGenerator:
"""Generate comprehensive unit tests"""
def __init__(self):
self.client = openai.OpenAI()
def generate_unit_tests(self, code: str, framework: str = "pytest") -> str:
"""Generate unit tests with full coverage"""
prompt = f"""Generate comprehensive {framework} tests for this code:
```python
{code}
```
Include tests for:
- Normal/happy path cases
- Edge cases (empty, None, boundaries)
- Error cases (invalid input, exceptions)
- Integration scenarios
- Fixtures and setup if needed
Use descriptive test names and add comments.
Tests:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def generate_property_tests(self, code: str) -> str:
"""Generate property-based tests using Hypothesis"""
prompt = f"""Generate property-based tests using Hypothesis for:
{code}
Create tests that verify properties like:
- Invariants
- Idempotence
- Commutativity
- Round-trip properties
Tests:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def generate_integration_tests(self, code: str, dependencies: List[str]) -> str:
"""Generate integration tests"""
deps_str = ", ".join(dependencies)
prompt = f"""Generate integration tests for this code that interacts with: {deps_str}
{code}
Include:
- Mocking external dependencies
- Testing interactions
- Setup and teardown
- Error scenarios
Tests:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def extract_code(self, text: str) -> str:
"""Extract code from markdown"""
import re
pattern = r'```(?:python)?\n(.*?)```'
matches = re.findall(pattern, text, re.DOTALL)
return matches[0] if matches else text
# Usage
test_gen = TestGenerator()

code_to_test = """
def divide(a: float, b: float) -> float:
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b
"""

tests = test_gen.generate_unit_tests(code_to_test)
print("Generated tests:")
print(tests)
```
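For the `divide` example, the generator is prompted to cover the happy path, edge cases, and error cases. Typical output might look like the following, hand-written here with plain asserts so it runs without pytest:

```python
def divide(a: float, b: float) -> float:
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b

# Happy path
assert divide(10, 2) == 5.0
# Negative operands
assert divide(-9, 3) == -3.0
# Error case: division by zero must raise ValueError
try:
    divide(1, 0)
except ValueError as e:
    assert "zero" in str(e)
else:
    raise AssertionError("expected ValueError")

print("all divide tests passed")
```

Running generated tests immediately, as above, closes the loop: a test suite that fails to import or execute is itself a signal to regenerate.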
## Debugging and Error Fixing
### Automated Debugging Agent
```python
class DebuggingAgent:
"""Find and fix bugs in code"""
def __init__(self):
self.client = openai.OpenAI()
self.sandbox = CodeExecutor() # From previous chapters
def debug_code(self, code: str, error_message: str = None) -> Dict:
"""Debug code and suggest fixes"""
# Try to execute and capture error if not provided
if not error_message:
result = self.sandbox.execute(code)
if not result["success"]:
error_message = result["output"]
prompt = f"""Debug this code:
```python
{code}
```
Error: {error_message}
Provide:
- Root cause analysis
- Fixed code
- Explanation of the fix
- How to prevent similar bugs
Response:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.parse_debug_response(response.choices[0].message.content)
def find_logical_errors(self, code: str, expected_behavior: str) -> Dict:
"""Find logical errors (code runs but wrong output)"""
prompt = f"""This code runs without errors but produces wrong results:
{code}
Expected behavior: {expected_behavior}
Analyze:
- What’s the logical error?
- Why does it produce wrong results?
- How to fix it?
- Test cases to verify the fix
Analysis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_debug_response(response.choices[0].message.content)
def suggest_improvements(self, code: str, issue: str) -> List[str]:
"""Suggest multiple ways to fix an issue"""
prompt = f"""Suggest 3 different ways to fix this issue:
Code:
{code}
Issue: {issue}
For each solution, provide:
- The fix
- Pros and cons
- When to use it
Solutions:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return self.parse_solutions(response.choices[0].message.content)
def iterative_fix(self, code: str, max_attempts: int = 3) -> Dict:
"""Iteratively fix code until it works"""
for attempt in range(max_attempts):
# Try to execute
result = self.sandbox.execute(code)
if result["success"]:
return {
"success": True,
"code": code,
"attempts": attempt + 1
}
# Try to fix
fix_result = self.debug_code(code, result["output"])
code = fix_result["fixed_code"]
return {
"success": False,
"code": code,
"attempts": max_attempts,
"last_error": result["output"]
}
def parse_debug_response(self, text: str) -> Dict:
    """Parse the model's debug reply. Minimal sketch: the first
    fenced block is taken as the fixed code; the prose is kept
    as root cause / explanation."""
    import re
    blocks = re.findall(r'```(?:python)?\n(.*?)```', text, re.DOTALL)
    return {
        "root_cause": text.split('\n')[0].strip(),
        "fixed_code": blocks[0] if blocks else text,
        "explanation": text
    }

def parse_solutions(self, text: str) -> List[str]:
    """Split a numbered list of solutions (minimal sketch)."""
    import re
    parts = re.split(r'\n(?=\d+\.)', text)
    return [p.strip() for p in parts if p.strip()]
# Usage
debugger = DebuggingAgent()

buggy_code = """
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    return total / len(numbers)

# This will crash on an empty list
result = calculate_average([])
"""

fix = debugger.debug_code(buggy_code)
print("Root cause:", fix["root_cause"])
print("Fixed code:", fix["fixed_code"])
```
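One plausible fix the agent could return for the `calculate_average` bug is shown below: the empty-list case is caught explicitly, turning an opaque `ZeroDivisionError` into a clear, documented error.

```python
def calculate_average(numbers):
    """Average of a list of numbers.

    Raises ValueError on an empty list instead of letting the
    division raise ZeroDivisionError.
    """
    if not numbers:
        raise ValueError("calculate_average() requires at least one number")
    return sum(numbers) / len(numbers)

print(calculate_average([2, 4, 6]))  # 4.0
try:
    calculate_average([])
except ValueError as e:
    print("caught:", e)
```

Whether to raise, return `0.0`, or return `None` here is a design decision; a good debugging agent should state the trade-off rather than silently pick one.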
## Repository-Level Operations
### Codebase Understanding
```python
from pathlib import Path
import json
class CodebaseAgent:
"""Understand and navigate entire codebases"""
def __init__(self, root_path: str):
self.root_path = Path(root_path)
self.index = {}
self.dependency_graph = {}
self.client = openai.OpenAI()
def index_codebase(self):
"""Index all Python files in codebase"""
print("Indexing codebase...")
for py_file in self.root_path.rglob("*.py"):
if "venv" in str(py_file) or ".git" in str(py_file):
continue
try:
with open(py_file) as f:
code = f.read()
analyzer = CodeAnalyzer()
analysis = analyzer.parse_python_code(code)
self.index[str(py_file.relative_to(self.root_path))] = {
"analysis": analysis,
"size": len(code),
"lines": len(code.split('\n'))
}
except Exception as e:
print(f"Error indexing {py_file}: {e}")
print(f"Indexed {len(self.index)} files")
def find_function_definition(self, function_name: str) -> List[Dict]:
"""Find where a function is defined"""
results = []
for file_path, data in self.index.items():
for func in data["analysis"].get("functions", []):
if func["name"] == function_name:
results.append({
"file": file_path,
"line": func["line"],
"signature": f"{func['name']}({', '.join(func['args'])})"
})
return results
def find_class_definition(self, class_name: str) -> List[Dict]:
"""Find where a class is defined"""
results = []
for file_path, data in self.index.items():
for cls in data["analysis"].get("classes", []):
if cls["name"] == class_name:
results.append({
"file": file_path,
"line": cls["line"],
"methods": cls["methods"]
})
return results
def find_usages(self, symbol: str) -> List[Dict]:
"""Find where a symbol is used"""
usages = []
for py_file in self.root_path.rglob("*.py"):
if "venv" in str(py_file):
continue
try:
with open(py_file) as f:
for i, line in enumerate(f, 1):
if symbol in line:
usages.append({
"file": str(py_file.relative_to(self.root_path)),
"line": i,
"content": line.strip()
})
except (OSError, UnicodeDecodeError):
    continue
return usages
def analyze_dependencies(self):
"""Build dependency graph"""
for file_path, data in self.index.items():
imports = data["analysis"].get("imports", [])
self.dependency_graph[file_path] = imports
def get_codebase_summary(self) -> Dict:
"""Get high-level codebase summary"""
total_files = len(self.index)
total_functions = sum(
len(data["analysis"].get("functions", []))
for data in self.index.values()
)
total_classes = sum(
len(data["analysis"].get("classes", []))
for data in self.index.values()
)
total_lines = sum(
data["lines"]
for data in self.index.values()
)
return {
"total_files": total_files,
"total_functions": total_functions,
"total_classes": total_classes,
"total_lines": total_lines,
"avg_lines_per_file": total_lines / total_files if total_files > 0 else 0
}
def explain_codebase(self) -> str:
"""Generate high-level explanation of codebase"""
summary = self.get_codebase_summary()
# Get file structure
files = list(self.index.keys())
prompt = f"""Explain this codebase structure:
Files: {len(files)}
Functions: {summary['total_functions']}
Classes: {summary['total_classes']}
Lines of code: {summary['total_lines']}
File structure:
{chr(10).join(files[:20])}
Provide:
1. What this codebase likely does
2. Main components/modules
3. Architecture pattern
4. Key areas of functionality
Explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return response.choices[0].message.content
# Usage
codebase = CodebaseAgent("./my_project")
codebase.index_codebase()
# Find function
results = codebase.find_function_definition("process_data")
print(f"Found in: {results}")
# Get summary
summary = codebase.get_codebase_summary()
print(f"Codebase: {summary['total_files']} files, {summary['total_lines']} lines")
# Explain codebase
explanation = codebase.explain_codebase()
print(explanation)
```
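Note that `find_usages` above matches raw substrings, so it will also hit comments and partial identifiers. A more precise alternative walks the AST and only reports real identifier references; a module-scope sketch:

```python
import ast

def find_name_references(code: str, symbol: str) -> list:
    """Return line numbers where `symbol` appears as an actual
    identifier (ast.Name) or attribute access, not just as text."""
    lines = []
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return lines
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == symbol:
            lines.append(node.lineno)
        elif isinstance(node, ast.Attribute) and node.attr == symbol:
            lines.append(node.lineno)
    return sorted(set(lines))

sample = "x = process_data(1)\n# process_data mentioned in a comment\ny = x\n"
print(find_name_references(sample, "process_data"))  # [1]
```

Substring search is still useful as a fast first pass over a large repo; the AST check then filters its candidates.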
## Complete Coding Agent System

```python
class CompleteCodingAgent:
"""Full-featured coding agent"""
def __init__(self):
self.analyzer = CodeAnalyzer()
self.generator = CodeGenerator()
self.refactorer = RefactoringAgent()
self.test_gen = TestGenerator()
self.debugger = DebuggingAgent()
self.client = openai.OpenAI()
def process_request(self, request: str, code: str = None, context: Dict = None) -> Dict:
"""Process any coding request"""
# Classify intent
intent = self.classify_intent(request)
if intent == "generate":
return self.handle_generation(request)
elif intent == "analyze":
return self.handle_analysis(code)
elif intent == "refactor":
return self.handle_refactoring(code, request)
elif intent == "test":
return self.handle_test_generation(code)
elif intent == "debug":
return self.handle_debugging(code, context)
elif intent == "explain":
return self.handle_explanation(code)
else:
return {"error": "Could not understand request"}
def handle_generation(self, request: str) -> Dict:
"""Handle code generation requests"""
result = self.generator.generate_function(request)
code = result["code"]
# Validate generated code
validation = self.analyzer.parse_python_code(code)
if "error" in validation:
# Try to fix
fixed = self.debugger.debug_code(code, validation["error"])
code = fixed["fixed_code"]
# Generate tests
tests = self.test_gen.generate_unit_tests(code)
return {
"type": "generation",
"code": code,
"tests": tests,
"validated": True
}
def handle_analysis(self, code: str) -> Dict:
"""Handle code analysis requests"""
# Parse structure
structure = self.analyzer.parse_python_code(code)
# Analyze complexity
complexity = self.analyzer.analyze_complexity(code)
# Get explanation
explanation = self.analyzer.understand_code_intent(code)
return {
"type": "analysis",
"structure": structure,
"complexity": complexity,
"explanation": explanation
}
def handle_refactoring(self, code: str, request: str) -> Dict:
"""Handle refactoring requests"""
if "performance" in request.lower():
result = self.refactorer.optimize_performance(code)
elif "readable" in request.lower():
result = self.refactorer.refactor_for_readability(code)
else:
result = self.refactorer.refactor_for_readability(code)
return {
"type": "refactoring",
"original": code,
"refactored": result["code"],
"changes": result.get("changes", [])
}
def handle_test_generation(self, code: str) -> Dict:
"""Handle test generation requests"""
unit_tests = self.test_gen.generate_unit_tests(code)
return {
"type": "tests",
"code": code,
"tests": unit_tests
}
def handle_debugging(self, code: str, context: Dict) -> Dict:
"""Handle debugging requests"""
error_msg = context.get("error") if context else None
result = self.debugger.debug_code(code, error_msg)
return {
"type": "debugging",
"original": code,
"fixed": result["fixed_code"],
"explanation": result.get("explanation", "")
}
def handle_explanation(self, code: str) -> Dict:
"""Handle code explanation requests"""
explanation = self.analyzer.understand_code_intent(code)
structure = self.analyzer.parse_python_code(code)
return {
"type": "explanation",
"explanation": explanation,
"structure": structure
}
def classify_intent(self, request: str) -> str:
"""Classify user intent"""
request_lower = request.lower()
keywords = {
"generate": ["generate", "create", "write", "implement"],
"analyze": ["analyze", "understand", "explain what"],
"refactor": ["refactor", "improve", "optimize", "clean"],
"test": ["test", "unittest", "pytest"],
"debug": ["debug", "fix", "error", "bug"],
"explain": ["explain", "what does", "how does"]
}
for intent, words in keywords.items():
if any(word in request_lower for word in words):
return intent
return "unknown"
# Usage
agent = CompleteCodingAgent()
# Generate code
result = agent.process_request("Create a function to validate email addresses")
print("Generated code:")
print(result["code"])
print("\nTests:")
print(result["tests"])
# Analyze code
code = """
def fibonacci(n):
if n <= 1:
return n
return fibonacci(n-1) + fibonacci(n-2)
"""
result = agent.process_request("Analyze this code", code=code)
print("\nComplexity:", result["complexity"])
print("Explanation:", result["explanation"])
# Refactor code
result = agent.process_request("Optimize this code for performance", code=code)
print("\nRefactored:")
print(result["refactored"])
```
## Best Practices for Coding Agents
### 1. Code Quality Checks
Always validate generated code:
- Syntax checking (AST parsing)
- Style checking (PEP 8, linting)
- Security scanning (bandit, safety)
- Type checking (mypy)
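The first two checks above can be sketched with the standard library alone; linting, security scanning, and type checking would layer external tools (ruff, bandit, mypy) on top:

```python
import ast

def syntax_check(code: str) -> dict:
    """Validate generated code before accepting it: parse with ast,
    then compile, which catches syntax errors cheaply."""
    try:
        tree = ast.parse(code)
        compile(tree, "<generated>", "exec")
        return {"valid": True, "error": None}
    except SyntaxError as e:
        return {"valid": False, "error": f"line {e.lineno}: {e.msg}"}

print(syntax_check("def ok():\n    return 1"))
print(syntax_check("def broken(:"))
```

A failed check should feed the error message straight back into the debugging loop rather than surfacing raw code to the user.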
### 2. Testing Strategy
- Generate tests alongside code
- Run tests automatically
- Achieve high coverage
- Include edge cases
### 3. Context Awareness
- Understand existing codebase
- Match coding style
- Respect conventions
- Consider dependencies
### 4. Iterative Improvement
- Start with simple solution
- Refine based on feedback
- Test incrementally
- Document changes
### 5. Security Considerations
- Validate all inputs
- Avoid SQL injection
- Check for XSS vulnerabilities
- Use secure libraries
- Never expose secrets
### 6. Performance Optimization
- Profile before optimizing
- Choose right algorithms
- Consider memory usage
- Cache when appropriate
- Benchmark improvements
### 7. Documentation
- Generate docstrings
- Add inline comments
- Create README files
- Document APIs
- Explain complex logic
### 8. Version Control
- Commit frequently
- Write clear messages
- Use branches
- Review changes
- Tag releases
### 9. Collaboration
- Follow team standards
- Request code reviews
- Share knowledge
- Document decisions
- Communicate changes
### 10. Continuous Learning
- Learn from mistakes
- Study good code
- Stay updated
- Experiment safely
- Share learnings
## Advanced Topics
### Multi-Language Support
```python
class MultiLanguageAgent:
"""Support multiple programming languages"""
def __init__(self):
self.client = openai.OpenAI()
self.supported_languages = ["python", "javascript", "java", "go", "rust"]
def generate_code(self, description: str, language: str) -> str:
"""Generate code in specified language"""
if language not in self.supported_languages:
raise ValueError(f"Unsupported language: {language}")
prompt = f"""Generate {language} code for:
{description}
Follow {language} best practices and conventions.
Code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return response.choices[0].message.content
def translate_code(self, code: str, from_lang: str, to_lang: str) -> str:
"""Translate code between languages"""
prompt = f"""Translate this {from_lang} code to {to_lang}:
```{from_lang}
{code}
```
Maintain:
- Same functionality
- Idiomatic {to_lang} style
- Best practices
{to_lang} code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return response.choices[0].message.content
```
### Code Review Agent
```python
class CodeReviewAgent:
"""Automated code review"""
def __init__(self):
self.client = openai.OpenAI()
def review_code(self, code: str) -> Dict:
"""Comprehensive code review"""
prompt = f"""Review this code:
```python
{code}
```
Provide feedback on:
- Code quality (readability, maintainability)
- Potential bugs or issues
- Performance concerns
- Security vulnerabilities
- Best practice violations
- Suggestions for improvement
Rate each category 1-5 and provide specific feedback.
Review:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_review(response.choices[0].message.content)
def suggest_improvements(self, code: str) -> List[Dict]:
"""Suggest specific improvements"""
review = self.review_code(code)
improvements = []
for issue in review.get("issues", []):
improvements.append({
"issue": issue,
"suggestion": self.generate_fix(code, issue),
"priority": self.assess_priority(issue)
})
return improvements
def parse_review(self, text: str) -> Dict:
    """Minimal parser (sketch): keep the raw review and treat
    bullet lines as issues. A production agent would request
    structured output instead."""
    issues = [line.strip('-• ').strip() for line in text.split('\n')
              if line.strip().startswith(('-', '•'))]
    return {"raw": text, "issues": issues}

def generate_fix(self, code: str, issue: str) -> str:
    """Sketch: ask the model for a fix for one specific issue."""
    response = self.client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content":
                   f"Fix this issue in the code below.\nIssue: {issue}\n\n{code}"}],
        temperature=0.2
    )
    return response.choices[0].message.content

def assess_priority(self, issue: str) -> str:
    """Sketch: crude keyword-based priority ranking."""
    high = ("security", "bug", "crash", "vulnerab", "injection")
    return "high" if any(k in issue.lower() for k in high) else "normal"
```
## Next Steps
You now have a comprehensive toolkit for building coding agents. Next, we'll explore research agents, which gather and synthesize information from multiple sources.
# Research Agents
## Introduction to Research Agents
Research agents are specialized AI systems that gather, analyze, and synthesize information from multiple sources to answer complex questions or investigate topics in depth.
### What Makes Research Agents Unique?
**Core Capabilities:**
- Multi-source information gathering
- Source credibility assessment
- Information synthesis and summarization
- Citation management
- Fact verification
- Deep topic exploration
**Key Challenges:**
- Information overload
- Source reliability
- Conflicting information
- Bias detection
- Citation accuracy
- Staying current
### Types of Research Agents
- Academic Research Agents: Literature reviews, paper analysis
- Market Research Agents: Competitive analysis, trends
- Investigative Agents: Deep dives, fact-checking
- News Aggregation Agents: Current events, monitoring
- Technical Research Agents: Documentation, specifications
## Information Gathering Strategies
### Multi-Source Search
```python
import os
from typing import List, Dict

import openai
import requests
from bs4 import BeautifulSoup
class MultiSourceSearcher:
"""Search across multiple sources"""
def __init__(self):
self.client = openai.OpenAI()
self.sources = {
"web": self.search_web,
"academic": self.search_academic,
"news": self.search_news,
"social": self.search_social
}
def search_all_sources(self, query: str, sources: List[str] = None) -> Dict:
"""Search across all specified sources"""
if sources is None:
sources = list(self.sources.keys())
results = {}
for source in sources:
if source in self.sources:
print(f"Searching {source}...")
results[source] = self.sources[source](query)
return results
def search_web(self, query: str) -> List[Dict]:
"""Search general web"""
# Using a search API (example with Google Custom Search)
api_key = os.getenv("GOOGLE_API_KEY")
search_engine_id = os.getenv("GOOGLE_SEARCH_ENGINE_ID")
url = "https://www.googleapis.com/customsearch/v1"
params = {
"key": api_key,
"cx": search_engine_id,
"q": query,
"num": 10
}
try:
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()
data = response.json()
results = []
for item in data.get("items", []):
results.append({
"title": item["title"],
"url": item["link"],
"snippet": item["snippet"],
"source": "web"
})
return results
except Exception as e:
print(f"Web search error: {e}")
return []
def search_academic(self, query: str) -> List[Dict]:
"""Search academic sources (arXiv, PubMed, etc.)"""
# Example with arXiv API
url = "http://export.arxiv.org/api/query"
params = {
"search_query": f"all:{query}",
"start": 0,
"max_results": 10
}
try:
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()
# Parse XML response
from xml.etree import ElementTree as ET
root = ET.fromstring(response.content)
results = []
for entry in root.findall("{http://www.w3.org/2005/Atom}entry"):
title = entry.find("{http://www.w3.org/2005/Atom}title").text
summary = entry.find("{http://www.w3.org/2005/Atom}summary").text
link = entry.find("{http://www.w3.org/2005/Atom}id").text
results.append({
"title": title.strip(),
"url": link,
"snippet": summary.strip()[:200],
"source": "academic"
})
return results
except Exception as e:
print(f"Academic search error: {e}")
return []
def search_news(self, query: str) -> List[Dict]:
"""Search news sources"""
# Example with News API
api_key = os.getenv("NEWS_API_KEY")
url = "https://newsapi.org/v2/everything"
params = {
"q": query,
"apiKey": api_key,
"pageSize": 10,
"sortBy": "relevancy"
}
try:
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()
data = response.json()
results = []
for article in data.get("articles", []):
results.append({
"title": article["title"],
"url": article["url"],
"snippet": article["description"],
"source": "news",
"published": article.get("publishedAt")
})
return results
except Exception as e:
print(f"News search error: {e}")
return []
def search_social(self, query: str) -> List[Dict]:
"""Search social media (Twitter, Reddit, etc.)"""
# Example implementation for Reddit
url = "https://www.reddit.com/search.json"
params = {
"q": query,
"limit": 10,
"sort": "relevance"
}
headers = {"User-Agent": "ResearchAgent/1.0"}
try:
response = requests.get(url, params=params, headers=headers, timeout=10)
response.raise_for_status()
data = response.json()
results = []
for post in data["data"]["children"]:
post_data = post["data"]
results.append({
"title": post_data["title"],
"url": f"https://reddit.com{post_data['permalink']}",
"snippet": post_data.get("selftext", "")[:200],
"source": "social",
"score": post_data.get("score", 0)
})
return results
except Exception as e:
print(f"Social search error: {e}")
return []
# Usage
searcher = MultiSourceSearcher()
results = searcher.search_all_sources("artificial intelligence agents")
for source, items in results.items():
print(f"\n{source.upper()} Results: {len(items)}")
for item in items[:3]:
print(f" - {item['title']}")
```
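Before synthesis, per-source result lists usually need to be combined and de-duplicated, since the same article often surfaces on the web and in news searches. A simple URL-keyed merge (first occurrence wins):

```python
def merge_results(results_by_source: dict) -> list:
    """Flatten per-source result lists into one list, dropping
    items whose URL has already been seen."""
    seen = set()
    merged = []
    for items in results_by_source.values():
        for item in items:
            url = item.get("url")
            if url and url in seen:
                continue
            if url:
                seen.add(url)
            merged.append(item)
    return merged

demo = {
    "web": [{"title": "A", "url": "http://a"}, {"title": "B", "url": "http://b"}],
    "news": [{"title": "A again", "url": "http://a"}],
}
print([r["title"] for r in merge_results(demo)])  # ['A', 'B']
```

A fuzzier merge (normalized titles, canonicalized URLs) catches more duplicates, but exact-URL matching is a safe baseline.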
### Deep Content Extraction
```python
class ContentExtractor:
"""Extract and process content from sources"""
def __init__(self):
self.client = openai.OpenAI()
def extract_from_url(self, url: str) -> Dict:
"""Extract main content from URL"""
try:
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
# Remove script and style elements
for script in soup(["script", "style"]):
script.decompose()
# Get text
text = soup.get_text(separator='\n', strip=True)
# Extract metadata
title = soup.find('title')
title_text = title.string if title else ""
meta_desc = soup.find('meta', attrs={'name': 'description'})
description = meta_desc.get('content', '') if meta_desc else ""
return {
"url": url,
"title": title_text,
"description": description,
"content": text[:10000], # Limit content
"word_count": len(text.split())
}
except Exception as e:
return {
"url": url,
"error": str(e)
}
def extract_key_points(self, content: str) -> List[str]:
"""Extract key points from content"""
prompt = f"""Extract the key points from this content:
{content[:4000]}
Provide 5-7 bullet points of the most important information:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
points = response.choices[0].message.content.strip().split('\n')
return [p.strip('- ').strip() for p in points if p.strip()]
def extract_quotes(self, content: str, topic: str) -> List[Dict]:
"""Extract relevant quotes"""
prompt = f"""Find relevant quotes about "{topic}" from this content:
{content[:4000]}
Provide 3-5 direct quotes with context:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
# Parse quotes
quotes_text = response.choices[0].message.content
# Simple parsing - in production, use more robust method
quotes = []
for line in quotes_text.split('\n'):
if line.strip().startswith('"'):
quotes.append({"quote": line.strip(), "context": ""})
return quotes
# Usage
extractor = ContentExtractor()
# Extract content
content = extractor.extract_from_url("https://example.com/article")
print(f"Title: {content['title']}")
print(f"Words: {content['word_count']}")
# Extract key points
key_points = extractor.extract_key_points(content['content'])
for point in key_points:
print(f" • {point}")
```
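The post-processing that `extract_key_points` applies to the model reply, stripping bullet markers and blank lines, is worth testing on its own, since models vary their list style. A standalone version of that normalization:

```python
def parse_bullet_points(text: str) -> list:
    """Normalize an LLM bullet list into clean strings,
    tolerating '-', '•', and '*' markers and blank lines."""
    points = []
    for line in text.split('\n'):
        stripped = line.strip().lstrip('-•* ').strip()
        if stripped:
            points.append(stripped)
    return points

reply = "- First key point\n• Second key point\n\n* Third key point"
print(parse_bullet_points(reply))
```

If numbered lists also need handling, add a regex pass for leading `1.`-style prefixes before the marker strip.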
## Source Verification
### Credibility Assessment
```python
class SourceVerifier:
"""Verify source credibility and reliability"""
def __init__(self):
self.client = openai.OpenAI()
self.trusted_domains = {
"academic": [".edu", ".gov", "arxiv.org", "pubmed.gov"],
"news": ["reuters.com", "apnews.com", "bbc.com"],
"tech": ["github.com", "stackoverflow.com"]
}
def assess_credibility(self, url: str, content: str = None) -> Dict:
"""Assess source credibility"""
from urllib.parse import urlparse
domain = urlparse(url).netloc
# Check against trusted domains
trust_level = "unknown"
for category, domains in self.trusted_domains.items():
if any(trusted in domain for trusted in domains):
trust_level = "high"
break
# Analyze content if provided
content_score = None
if content:
content_score = self.analyze_content_quality(content)
return {
"url": url,
"domain": domain,
"trust_level": trust_level,
"content_quality": content_score,
"is_trusted": trust_level == "high"
}
def analyze_content_quality(self, content: str) -> Dict:
"""Analyze content quality indicators"""
prompt = f"""Analyze the quality and credibility of this content:
{content[:2000]}
Rate (1-5) on:
1. Factual accuracy (based on claims made)
2. Objectivity (bias level)
3. Citation quality (references provided)
4. Writing quality (clarity, professionalism)
5. Depth of analysis
Provide scores and brief explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_quality_scores(response.choices[0].message.content)
def cross_reference(self, claim: str, sources: List[Dict]) -> Dict:
"""Cross-reference a claim across sources"""
confirmations = 0
contradictions = 0
for source in sources:
result = self.check_claim_in_source(claim, source.get("content", ""))
if result == "confirms":
confirmations += 1
elif result == "contradicts":
contradictions += 1
return {
"claim": claim,
"confirmations": confirmations,
"contradictions": contradictions,
"confidence": confirmations / len(sources) if sources else 0
}
def check_claim_in_source(self, claim: str, content: str) -> str:
"""Check if source confirms, contradicts, or is neutral on claim"""
prompt = f"""Does this content confirm, contradict, or neither regarding this claim?
Claim: {claim}
Content: {content[:1000]}
Answer with just: confirms, contradicts, or neutral"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.1
)
return response.choices[0].message.content.strip().lower()
# Usage
verifier = SourceVerifier()
# Assess credibility
credibility = verifier.assess_credibility(
"https://arxiv.org/abs/2023.12345",
"This paper presents..."
)
print(f"Trust level: {credibility['trust_level']}")
# Cross-reference claim
claim = "AI agents can autonomously complete complex tasks"
sources = [
{"content": "Research shows AI agents are capable of..."},
{"content": "Studies indicate autonomous agents can..."}
]
verification = verifier.cross_reference(claim, sources)
print(f"Confidence: {verification['confidence']:.0%}")
Synthesis and Summarization
Information Synthesis
class InformationSynthesizer:
"""Synthesize information from multiple sources"""
def __init__(self):
self.client = openai.OpenAI()
def synthesize_sources(self,
query: str,
sources: List[Dict],
style: str = "comprehensive") -> str:
"""Synthesize information from multiple sources"""
# Prepare source summaries
source_texts = []
for i, source in enumerate(sources[:10], 1): # Limit to 10 sources
source_texts.append(f"""
Source {i}: {source.get('title', 'Unknown')}
URL: {source.get('url', 'N/A')}
Content: {source.get('snippet', source.get('content', ''))[:500]}
""")
sources_combined = "\n---\n".join(source_texts)
style_instructions = {
"comprehensive": "Provide a detailed, thorough analysis",
"concise": "Provide a brief, focused summary",
"academic": "Use formal, academic tone with citations",
"casual": "Use conversational, accessible language"
}
prompt = f"""Synthesize information about: {query}
Sources:
{sources_combined}
{style_instructions.get(style, style_instructions['comprehensive'])}.
Requirements:
- Integrate information from multiple sources
- Identify common themes and patterns
- Note any contradictions
- Cite sources [1], [2], etc.
- Provide balanced perspective
Synthesis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4,
max_tokens=2000
)
synthesis = response.choices[0].message.content
# Add source list
source_list = "\n\nSources:\n"
for i, source in enumerate(sources[:10], 1):
source_list += f"[{i}] {source.get('title', 'Unknown')} - {source.get('url', 'N/A')}\n"
return synthesis + source_list
def identify_themes(self, sources: List[Dict]) -> List[Dict]:
"""Identify common themes across sources"""
# Combine content
combined_content = "\n\n".join([
s.get('snippet', s.get('content', ''))[:500]
for s in sources[:20]
])
prompt = f"""Identify the main themes in these sources:
{combined_content}
List 5-7 key themes with:
- Theme name
- Brief description
- How many sources mention it
Themes:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_themes(response.choices[0].message.content)
def find_contradictions(self, sources: List[Dict]) -> List[Dict]:
"""Find contradictions between sources"""
contradictions = []
# Compare sources pairwise (simplified)
for i in range(min(5, len(sources))):
for j in range(i+1, min(5, len(sources))):
source_a = sources[i]
source_b = sources[j]
prompt = f"""Do these sources contradict each other?
Source A: {source_a.get('snippet', '')[:300]}
Source B: {source_b.get('snippet', '')[:300]}
If yes, explain the contradiction. If no, say "no contradiction".
Analysis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
result = response.choices[0].message.content
if "no contradiction" not in result.lower():
contradictions.append({
"source_a": source_a.get('title'),
"source_b": source_b.get('title'),
"contradiction": result
})
return contradictions
# Usage
synthesizer = InformationSynthesizer()
sources = [
{"title": "AI Agents Overview", "url": "...", "snippet": "AI agents are..."},
{"title": "Agent Architectures", "url": "...", "snippet": "Modern agents use..."},
# ... more sources
]
# Synthesize
synthesis = synthesizer.synthesize_sources(
"What are AI agents?",
sources,
style="comprehensive"
)
print(synthesis)
# Identify themes
themes = synthesizer.identify_themes(sources)
for theme in themes:
print(f"Theme: {theme}")
Citation Management
Automatic Citation Generation
class CitationManager:
"""Manage citations and references"""
def __init__(self):
self.citations = []
self.citation_style = "APA" # APA, MLA, Chicago
def add_citation(self, source: Dict) -> int:
"""Add source and return citation number"""
self.citations.append(source)
return len(self.citations)
def format_citation(self, source: Dict, style: str = None) -> str:
"""Format citation in specified style"""
style = style or self.citation_style
if style == "APA":
return self.format_apa(source)
elif style == "MLA":
return self.format_mla(source)
elif style == "Chicago":
return self.format_chicago(source)
else:
return self.format_simple(source)
def format_apa(self, source: Dict) -> str:
"""Format in APA style"""
author = source.get('author', 'Unknown')
year = source.get('year', 'n.d.')
title = source.get('title', 'Untitled')
url = source.get('url', '')
return f"{author} ({year}). {title}. {url}"
def format_mla(self, source: Dict) -> str:
"""Format in MLA style"""
author = source.get('author', 'Unknown')
title = source.get('title', 'Untitled')
website = source.get('website', 'Web')
url = source.get('url', '')
return f'{author}. "{title}." {website}. {url}.'
def format_chicago(self, source: Dict) -> str:
"""Format in Chicago style"""
author = source.get('author', 'Unknown')
title = source.get('title', 'Untitled')
url = source.get('url', '')
return f'{author}. "{title}." {url}.'
def format_simple(self, source: Dict) -> str:
"""Simple format"""
title = source.get('title', 'Untitled')
url = source.get('url', '')
return f"{title} - {url}"
def generate_bibliography(self) -> str:
"""Generate full bibliography"""
bibliography = "References:\n\n"
for i, source in enumerate(self.citations, 1):
citation = self.format_citation(source)
bibliography += f"{i}. {citation}\n"
return bibliography
def inline_cite(self, text: str, citation_num: int) -> str:
"""Add inline citation to text"""
return f"{text} [{citation_num}]"
# Usage
citations = CitationManager()
# Add sources
source1 = {
"author": "Smith, J.",
"year": "2023",
"title": "Understanding AI Agents",
"url": "https://example.com/article"
}
cite_num = citations.add_citation(source1)
# Use in text
text = citations.inline_cite("AI agents are autonomous systems", cite_num)
print(text) # "AI agents are autonomous systems [1]"
# Generate bibliography
print(citations.generate_bibliography())
Complete Research Agent
class ResearchAgent:
"""Complete research agent system"""
def __init__(self):
self.searcher = MultiSourceSearcher()
self.extractor = ContentExtractor()
self.verifier = SourceVerifier()
self.synthesizer = InformationSynthesizer()
self.citations = CitationManager()
self.client = openai.OpenAI()
def research(self,
query: str,
depth: str = "medium",
sources: List[str] = None) -> Dict:
"""Conduct comprehensive research"""
print(f"🔍 Researching: {query}\n")
# 1. Search multiple sources
print("📚 Gathering sources...")
search_results = self.searcher.search_all_sources(query, sources)
all_sources = []
for source_type, results in search_results.items():
all_sources.extend(results)
print(f"Found {len(all_sources)} sources\n")
# 2. Extract and verify content
print("📖 Extracting content...")
verified_sources = []
for source in all_sources[:20]: # Limit processing
# Extract content
if 'content' not in source:
content_data = self.extractor.extract_from_url(source['url'])
source['content'] = content_data.get('content', source.get('snippet', ''))
# Verify credibility
credibility = self.verifier.assess_credibility(
source['url'],
source.get('content', '')
)
if credibility['trust_level'] != 'low':  # keep unless explicitly low-trust
source['credibility'] = credibility
verified_sources.append(source)
# Add citation
cite_num = self.citations.add_citation(source)
source['citation_num'] = cite_num
print(f"Verified {len(verified_sources)} sources\n")
# 3. Synthesize information
print("✍️ Synthesizing findings...")
synthesis = self.synthesizer.synthesize_sources(
query,
verified_sources,
style="comprehensive" if depth == "deep" else "concise"
)
# 4. Identify themes
themes = self.synthesizer.identify_themes(verified_sources)
# 5. Find contradictions
contradictions = self.synthesizer.find_contradictions(verified_sources)
# 6. Generate bibliography
bibliography = self.citations.generate_bibliography()
return {
"query": query,
"synthesis": synthesis,
"themes": themes,
"contradictions": contradictions,
"sources": verified_sources,
"bibliography": bibliography,
"source_count": len(verified_sources)
}
def deep_dive(self, topic: str, subtopics: List[str] = None) -> Dict:
"""Deep research on topic with subtopics"""
if not subtopics:
# Generate subtopics
subtopics = self.generate_subtopics(topic)
results = {
"topic": topic,
"subtopics": {}
}
for subtopic in subtopics:
print(f"\n📌 Researching subtopic: {subtopic}")
result = self.research(f"{topic}: {subtopic}", depth="medium")
results["subtopics"][subtopic] = result
# Create overall synthesis
print("\n🔗 Creating overall synthesis...")
overall = self.synthesize_deep_dive(topic, results["subtopics"])
results["overall_synthesis"] = overall
return results
def generate_subtopics(self, topic: str) -> List[str]:
"""Generate relevant subtopics"""
prompt = f"""Generate 5 key subtopics for researching: {topic}
Subtopics should:
- Cover different aspects
- Be specific and focused
- Be researchable
List:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
subtopics = response.choices[0].message.content.strip().split('\n')
return [s.strip('- 0123456789.').strip() for s in subtopics if s.strip()]
def synthesize_deep_dive(self, topic: str, subtopic_results: Dict) -> str:
"""Synthesize results from deep dive"""
# Combine all syntheses
combined = f"# Comprehensive Research: {topic}\n\n"
for subtopic, result in subtopic_results.items():
combined += f"## {subtopic}\n\n"
combined += result['synthesis'] + "\n\n"
return combined
def fact_check(self, claim: str) -> Dict:
"""Fact-check a specific claim"""
print(f"🔎 Fact-checking: {claim}\n")
# Search for information about the claim
results = self.research(claim, depth="medium")
# Cross-reference
verification = self.verifier.cross_reference(
claim,
results['sources']
)
# Determine verdict
if verification['confidence'] > 0.7:
verdict = "Likely True"
elif verification['confidence'] < 0.3:
verdict = "Likely False"
else:
verdict = "Unclear/Mixed Evidence"
return {
"claim": claim,
"verdict": verdict,
"confidence": verification['confidence'],
"confirmations": verification['confirmations'],
"contradictions": verification['contradictions'],
"sources": results['sources'][:5],
"explanation": results['synthesis']
}
# Usage
agent = ResearchAgent()
# Basic research
result = agent.research("What are the latest developments in AI agents?")
print(result['synthesis'])
print(f"\nSources: {result['source_count']}")
# Deep dive
deep_result = agent.deep_dive(
"AI Agent Architectures",
subtopics=["ReAct Pattern", "Memory Systems", "Tool Use"]
)
# Fact check
fact_result = agent.fact_check("AI agents can autonomously write production code")
print(f"Verdict: {fact_result['verdict']}")
print(f"Confidence: {fact_result['confidence']:.0%}")
Best Practices
- Multi-source verification: Never rely on a single source
- Assess credibility: Check source reliability
- Cite properly: Always attribute information
- Check recency: Ensure information is current
- Cross-reference: Verify claims across sources
- Note contradictions: Highlight conflicting information
- Maintain objectivity: Present balanced view
- Track sources: Keep detailed records
- Update regularly: Refresh research periodically
- Human review: Critical research needs expert review
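Several of these practices can be enforced mechanically. As one sketch, the recency check can be a small filter over the source dicts used throughout this chapter; the `published` ISO-date field is an assumption here, since the search results shown above do not carry one:

```python
from datetime import datetime, timedelta

def filter_recent_sources(sources, max_age_days=365):
    """Split sources into recent and undated buckets.

    Sources older than the window are dropped; sources without a
    'published' date are kept separately, since missing metadata
    is not proof of staleness.
    """
    cutoff = datetime.now() - timedelta(days=max_age_days)
    recent, undated = [], []
    for source in sources:
        published = source.get("published")  # assumed ISO date, e.g. "2023-06-01"
        if published is None:
            undated.append(source)
            continue
        if datetime.fromisoformat(published) >= cutoff:
            recent.append(source)
    return recent, undated
```

Undated sources can then be routed to the credibility checks above rather than silently trusted.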
Next Steps
You now have comprehensive knowledge of research agents! Next, we’ll explore task automation agents that handle repetitive workflows.
Task Automation Agents
Introduction to Task Automation
Task automation agents handle repetitive workflows, orchestrate complex processes, and integrate with existing tools to save time and reduce errors.
What Makes Automation Agents Special?
Core Capabilities:
- Workflow orchestration
- Event-driven triggers
- Integration with multiple tools
- Scheduled operations
- Error handling and recovery
- State management across tasks
Key Benefits:
- Eliminate repetitive work
- Reduce human error
- 24/7 operation
- Consistent execution
- Scalable processing
- Audit trails
Types of Automation Agents
- Workflow Agents: Multi-step process automation
- Scheduling Agents: Time-based task execution
- Integration Agents: Connect different systems
- Monitoring Agents: Watch and respond to events
- Data Processing Agents: ETL and transformation
Workflow Orchestration
Building a Workflow Engine
from dataclasses import dataclass
from typing import List, Dict, Callable, Any
from enum import Enum
import time
class TaskStatus(Enum):
PENDING = "pending"
RUNNING = "running"
COMPLETED = "completed"
FAILED = "failed"
SKIPPED = "skipped"
@dataclass
class Task:
"""Single task in workflow"""
id: str
name: str
action: Callable
params: Dict[str, Any]
dependencies: List[str] = None
retry_count: int = 3
timeout: int = 300
status: TaskStatus = TaskStatus.PENDING
result: Any = None
error: str = None
class WorkflowEngine:
"""Orchestrate complex workflows"""
def __init__(self):
self.tasks = {}
self.execution_log = []
def add_task(self, task: Task):
"""Add task to workflow"""
self.tasks[task.id] = task
def execute_workflow(self) -> Dict:
"""Execute all tasks respecting dependencies"""
print("🚀 Starting workflow execution\n")
completed = set()
failed = set()
while len(completed) + len(failed) < len(self.tasks):
# Find tasks ready to execute
ready_tasks = self.get_ready_tasks(completed, failed)
if not ready_tasks:
# Check if we're stuck
pending = [t for t in self.tasks.values() if t.status == TaskStatus.PENDING]
if pending:
print("⚠️ Workflow stuck - circular dependencies or all tasks failed")
break
else:
break
# Execute ready tasks
for task in ready_tasks:
result = self.execute_task(task)
if result['success']:
completed.add(task.id)
else:
failed.add(task.id)
return self.generate_report(completed, failed)
def get_ready_tasks(self, completed: set, failed: set) -> List[Task]:
"""Get tasks ready to execute"""
ready = []
for task in self.tasks.values():
if task.status != TaskStatus.PENDING:
continue
# Check dependencies
if task.dependencies:
deps_met = all(dep in completed for dep in task.dependencies)
deps_failed = any(dep in failed for dep in task.dependencies)
if deps_failed:
task.status = TaskStatus.SKIPPED
task.error = "Dependency failed"
continue
if not deps_met:
continue
ready.append(task)
return ready
def execute_task(self, task: Task) -> Dict:
"""Execute single task with retry logic"""
print(f"▶️ Executing: {task.name}")
task.status = TaskStatus.RUNNING
for attempt in range(task.retry_count):
try:
# Execute task action
start_time = time.time()
result = task.action(**task.params)
execution_time = time.time() - start_time
# Success
task.status = TaskStatus.COMPLETED
task.result = result
log_entry = {
"task_id": task.id,
"task_name": task.name,
"status": "success",
"execution_time": execution_time,
"attempt": attempt + 1
}
self.execution_log.append(log_entry)
print(f"✅ Completed: {task.name} ({execution_time:.2f}s)\n")
return {"success": True, "result": result}
except Exception as e:
error_msg = str(e)
print(f"❌ Attempt {attempt + 1} failed: {error_msg}")
if attempt < task.retry_count - 1:
wait_time = 2 ** attempt # Exponential backoff
print(f"⏳ Retrying in {wait_time}s...")
time.sleep(wait_time)
else:
# Final failure
task.status = TaskStatus.FAILED
task.error = error_msg
log_entry = {
"task_id": task.id,
"task_name": task.name,
"status": "failed",
"error": error_msg,
"attempts": task.retry_count
}
self.execution_log.append(log_entry)
print(f"💥 Failed: {task.name}\n")
return {"success": False, "error": error_msg}
def generate_report(self, completed: set, failed: set) -> Dict:
"""Generate execution report"""
total = len(self.tasks)
skipped = sum(1 for t in self.tasks.values() if t.status == TaskStatus.SKIPPED)
report = {
"total_tasks": total,
"completed": len(completed),
"failed": len(failed),
"skipped": skipped,
"success_rate": len(completed) / total if total > 0 else 0,
"execution_log": self.execution_log
}
print("=" * 50)
print("WORKFLOW EXECUTION REPORT")
print("=" * 50)
print(f"Total Tasks: {total}")
print(f"Completed: {len(completed)}")
print(f"Failed: {len(failed)}")
print(f"Skipped: {skipped}")
print(f"Success Rate: {report['success_rate']:.1%}")
print("=" * 50)
return report
# Usage
workflow = WorkflowEngine()
# Define tasks
def fetch_data(source):
print(f" Fetching from {source}...")
time.sleep(1)
return {"data": f"Data from {source}"}
def process_data(data):
print(f" Processing data...")
time.sleep(1)
return {"processed": True}
def save_results(data):
print(f" Saving results...")
time.sleep(1)
return {"saved": True}
# Add tasks
workflow.add_task(Task(
id="fetch",
name="Fetch Data",
action=fetch_data,
params={"source": "API"}
))
workflow.add_task(Task(
id="process",
name="Process Data",
action=process_data,
params={"data": {}},
dependencies=["fetch"]
))
workflow.add_task(Task(
id="save",
name="Save Results",
action=save_results,
params={"data": {}},
dependencies=["process"]
))
# Execute
report = workflow.execute_workflow()
Parallel Workflow Execution
import asyncio
from concurrent.futures import ThreadPoolExecutor
class ParallelWorkflowEngine(WorkflowEngine):
"""Execute independent tasks in parallel"""
def __init__(self, max_workers: int = 4):
super().__init__()
self.max_workers = max_workers
self.executor = ThreadPoolExecutor(max_workers=max_workers)
async def execute_workflow_async(self) -> Dict:
"""Execute workflow with parallel execution"""
print("🚀 Starting parallel workflow execution\n")
completed = set()
failed = set()
while len(completed) + len(failed) < len(self.tasks):
# Get ready tasks
ready_tasks = self.get_ready_tasks(completed, failed)
if not ready_tasks:
break
# Execute tasks in parallel
tasks_futures = [
self.execute_task_async(task)
for task in ready_tasks
]
results = await asyncio.gather(*tasks_futures)
# Update completed/failed
for task, result in zip(ready_tasks, results):
if result['success']:
completed.add(task.id)
else:
failed.add(task.id)
return self.generate_report(completed, failed)
async def execute_task_async(self, task: Task) -> Dict:
"""Execute task asynchronously"""
loop = asyncio.get_running_loop()
return await loop.run_in_executor(
self.executor,
self.execute_task,
task
)
# Usage
async def main():
workflow = ParallelWorkflowEngine(max_workers=3)
# Add independent tasks that can run in parallel
for i in range(5):
workflow.add_task(Task(
id=f"task_{i}",
name=f"Task {i}",
action=lambda x: time.sleep(1) or f"Result {x}",  # sleep(1) returns None, so the f-string is returned
params={"x": i}
))
report = await workflow.execute_workflow_async()
# Run
# asyncio.run(main())
Scheduled Operations
Task Scheduler
from datetime import datetime, timedelta
import schedule
import threading
class TaskScheduler:
"""Schedule tasks to run at specific times"""
def __init__(self):
self.scheduled_tasks = []
self.running = False
self.thread = None
def schedule_task(self,
task: Callable,
schedule_type: str,
time_spec: str = None,
**kwargs):
"""Schedule a task"""
if schedule_type == "daily":
job = schedule.every().day.at(time_spec).do(task, **kwargs)
elif schedule_type == "hourly":
job = schedule.every().hour.do(task, **kwargs)
elif schedule_type == "interval":
minutes = int(time_spec)
job = schedule.every(minutes).minutes.do(task, **kwargs)
elif schedule_type == "weekly":
day, at_time = time_spec.split()  # e.g. "monday 09:00"; avoid shadowing the time module
job = getattr(schedule.every(), day.lower()).at(at_time).do(task, **kwargs)
else:
raise ValueError(f"Unknown schedule type: {schedule_type}")
self.scheduled_tasks.append({
"job": job,
"task": task.__name__,
"schedule": schedule_type,
"time_spec": time_spec
})
print(f"📅 Scheduled: {task.__name__} - {schedule_type} {time_spec or ''}")
def start(self):
"""Start scheduler"""
self.running = True
self.thread = threading.Thread(target=self._run_scheduler)
self.thread.daemon = True
self.thread.start()
print("🕐 Scheduler started")
def stop(self):
"""Stop scheduler"""
self.running = False
if self.thread:
self.thread.join()
print("🛑 Scheduler stopped")
def _run_scheduler(self):
"""Run scheduler loop"""
while self.running:
schedule.run_pending()
time.sleep(1)
def list_scheduled_tasks(self) -> List[Dict]:
"""List all scheduled tasks"""
return self.scheduled_tasks
# Usage
scheduler = TaskScheduler()
def backup_database():
print(f"💾 Running database backup at {datetime.now()}")
# Backup logic here
def send_report():
print(f"📊 Sending daily report at {datetime.now()}")
# Report logic here
def cleanup_temp_files():
print(f"🧹 Cleaning temp files at {datetime.now()}")
# Cleanup logic here
# Schedule tasks
scheduler.schedule_task(backup_database, "daily", "02:00")
scheduler.schedule_task(send_report, "daily", "09:00")
scheduler.schedule_task(cleanup_temp_files, "interval", "60") # Every hour
# Start scheduler
scheduler.start()
# Keep running
# try:
# while True:
# time.sleep(1)
# except KeyboardInterrupt:
# scheduler.stop()
Cron-Style Scheduling
from crontab import CronTab
class CronScheduler:
"""Cron-style task scheduling"""
def __init__(self):
self.cron = CronTab(user=True)
def add_cron_job(self,
command: str,
schedule: str,
comment: str = None):
"""Add cron job
Schedule format: "minute hour day month weekday"
Examples:
- "0 2 * * *" - Daily at 2 AM
- "*/15 * * * *" - Every 15 minutes
- "0 9 * * 1-5" - Weekdays at 9 AM
"""
job = self.cron.new(command=command, comment=comment)
job.setall(schedule)
self.cron.write()
print(f"✅ Added cron job: {comment or command}")
print(f" Schedule: {schedule}")
def list_jobs(self) -> List[Dict]:
"""List all cron jobs"""
jobs = []
for job in self.cron:
jobs.append({
"command": job.command,
"schedule": str(job.slices),
"comment": job.comment,
"enabled": job.is_enabled()
})
return jobs
def remove_job(self, comment: str):
"""Remove job by comment"""
self.cron.remove_all(comment=comment)
self.cron.write()
print(f"🗑️ Removed job: {comment}")
# Usage
# cron = CronScheduler()
# cron.add_cron_job(
# "python /path/to/backup.py",
# "0 2 * * *",
# "Daily backup"
# )
Event-Driven Triggers
Event Listener System
from typing import Callable, Dict, List
from queue import Queue
import threading
class EventType(Enum):
FILE_CREATED = "file_created"
FILE_MODIFIED = "file_modified"
FILE_DELETED = "file_deleted"
API_CALL = "api_call"
THRESHOLD_EXCEEDED = "threshold_exceeded"
ERROR_OCCURRED = "error_occurred"
@dataclass
class Event:
"""Event data"""
type: EventType
data: Dict[str, Any]
timestamp: float = None
def __post_init__(self):
if self.timestamp is None:
self.timestamp = time.time()
class EventDrivenAgent:
"""Agent that responds to events"""
def __init__(self):
self.handlers = {}
self.event_queue = Queue()
self.running = False
self.thread = None
def register_handler(self, event_type: EventType, handler: Callable):
"""Register event handler"""
if event_type not in self.handlers:
self.handlers[event_type] = []
self.handlers[event_type].append(handler)
print(f"📝 Registered handler for {event_type.value}")
def emit_event(self, event: Event):
"""Emit an event"""
self.event_queue.put(event)
def start(self):
"""Start event processing"""
self.running = True
self.thread = threading.Thread(target=self._process_events)
self.thread.daemon = True
self.thread.start()
print("🎯 Event processor started")
def stop(self):
"""Stop event processing"""
self.running = False
if self.thread:
self.thread.join()
print("🛑 Event processor stopped")
def _process_events(self):
"""Process events from queue"""
while self.running:
try:
event = self.event_queue.get(timeout=1)
self._handle_event(event)
except Exception:
# Typically queue.Empty when no event arrives within the timeout
continue
def _handle_event(self, event: Event):
"""Handle single event"""
print(f"⚡ Event: {event.type.value}")
handlers = self.handlers.get(event.type, [])
for handler in handlers:
try:
handler(event)
except Exception as e:
print(f"❌ Handler error: {e}")
# Usage
agent = EventDrivenAgent()
# Register handlers
def on_file_created(event: Event):
print(f" 📄 File created: {event.data['filename']}")
# Process new file
def on_threshold_exceeded(event: Event):
print(f" ⚠️ Threshold exceeded: {event.data['metric']} = {event.data['value']}")
# Send alert
def on_error(event: Event):
print(f" 💥 Error occurred: {event.data['error']}")
# Log and notify
agent.register_handler(EventType.FILE_CREATED, on_file_created)
agent.register_handler(EventType.THRESHOLD_EXCEEDED, on_threshold_exceeded)
agent.register_handler(EventType.ERROR_OCCURRED, on_error)
# Start processing
agent.start()
# Emit events
agent.emit_event(Event(
type=EventType.FILE_CREATED,
data={"filename": "data.csv"}
))
agent.emit_event(Event(
type=EventType.THRESHOLD_EXCEEDED,
data={"metric": "cpu_usage", "value": 95}
))
File System Watcher
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
class FileWatcher(FileSystemEventHandler):
"""Watch file system for changes"""
def __init__(self, agent: EventDrivenAgent):
self.agent = agent
def on_created(self, event):
"""File created"""
if not event.is_directory:
self.agent.emit_event(Event(
type=EventType.FILE_CREATED,
data={"path": event.src_path}
))
def on_modified(self, event):
"""File modified"""
if not event.is_directory:
self.agent.emit_event(Event(
type=EventType.FILE_MODIFIED,
data={"path": event.src_path}
))
def on_deleted(self, event):
"""File deleted"""
if not event.is_directory:
self.agent.emit_event(Event(
type=EventType.FILE_DELETED,
data={"path": event.src_path}
))
def start_file_watcher(path: str, agent: EventDrivenAgent):
"""Start watching directory"""
event_handler = FileWatcher(agent)
observer = Observer()
observer.schedule(event_handler, path, recursive=True)
observer.start()
print(f"👁️ Watching: {path}")
return observer
# Usage
# observer = start_file_watcher("/path/to/watch", agent)
Integration with Existing Tools
Tool Integration Framework
class ToolIntegration:
"""Integrate with external tools"""
def __init__(self):
self.tools = {}
def register_tool(self, name: str, connector: Callable):
"""Register tool connector"""
self.tools[name] = connector
print(f"🔌 Registered tool: {name}")
def execute_tool(self, name: str, action: str, **params) -> Dict:
"""Execute tool action"""
if name not in self.tools:
return {"success": False, "error": f"Tool not found: {name}"}
try:
result = self.tools[name](action, **params)
return {"success": True, "result": result}
except Exception as e:
return {"success": False, "error": str(e)}
# Example integrations
def slack_connector(action: str, **params):
"""Slack integration"""
if action == "send_message":
channel = params.get("channel")
message = params.get("message")
# Send to Slack API
print(f"📱 Slack: Sending to {channel}: {message}")
return {"sent": True}
elif action == "get_messages":
channel = params.get("channel")
# Get from Slack API
return {"messages": []}
def email_connector(action: str, **params):
"""Email integration"""
if action == "send":
to = params.get("to")
subject = params.get("subject")
body = params.get("body")
# Send email
print(f"📧 Email: Sending to {to}")
return {"sent": True}
def database_connector(action: str, **params):
"""Database integration"""
if action == "query":
sql = params.get("sql")
# Execute query
print(f"🗄️ Database: Executing query")
return {"rows": []}
elif action == "insert":
table = params.get("table")
data = params.get("data")
# Insert data
return {"inserted": True}
# Setup
integrations = ToolIntegration()
integrations.register_tool("slack", slack_connector)
integrations.register_tool("email", email_connector)
integrations.register_tool("database", database_connector)
# Use
integrations.execute_tool(
"slack",
"send_message",
channel="#general",
message="Task completed!"
)
Complete Automation Agent
class AutomationAgent:
"""Complete task automation agent"""
def __init__(self):
self.workflow_engine = WorkflowEngine()
self.scheduler = TaskScheduler()
self.event_agent = EventDrivenAgent()
self.integrations = ToolIntegration()
self.client = openai.OpenAI()
def create_automation(self, description: str) -> Dict:
"""Create automation from natural language"""
# Parse description to understand automation
automation_spec = self.parse_automation_description(description)
# Create workflow
workflow_id = self.create_workflow(automation_spec)
# Setup triggers
if automation_spec.get("trigger_type") == "schedule":
self.setup_scheduled_trigger(workflow_id, automation_spec)
elif automation_spec.get("trigger_type") == "event":
self.setup_event_trigger(workflow_id, automation_spec)
return {
"workflow_id": workflow_id,
"automation_spec": automation_spec,
"status": "active"
}
def parse_automation_description(self, description: str) -> Dict:
"""Parse natural language automation description"""
prompt = f"""Parse this automation request into a structured specification:
"{description}"
Provide JSON with:
- trigger_type: "schedule" or "event"
- trigger_spec: for "schedule", an object like {{"type": "daily", "time": "09:00"}}; for "event", {{"event": "FILE_CREATED"}}
- steps: list of actions to perform
- integrations: tools needed
Specification:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
import json
# May raise JSONDecodeError if the model wraps the JSON in prose;
# requesting response_format={"type": "json_object"} makes this more reliable
return json.loads(response.choices[0].message.content)
def create_workflow(self, spec: Dict) -> str:
"""Create workflow from specification"""
workflow_id = f"workflow_{int(time.time())}"
for i, step in enumerate(spec.get("steps", [])):
task = Task(
id=f"{workflow_id}_step_{i}",
name=step.get("name"),
action=self.create_action_from_spec(step),
params=step.get("params", {}),
dependencies=step.get("dependencies", [])
)
self.workflow_engine.add_task(task)
return workflow_id
def create_action_from_spec(self, step_spec: Dict) -> Callable:
"""Create executable action from step specification"""
action_type = step_spec.get("action_type")
if action_type == "api_call":
def action(**params):
return self.integrations.execute_tool(
step_spec["tool"],
step_spec["action"],
**params
)
return action
elif action_type == "data_processing":
def action(**params):
# Process data
return {"processed": True}
return action
else:
def action(**params):
print(f"Executing: {step_spec.get('name')}")
return {"done": True}
return action
def setup_scheduled_trigger(self, workflow_id: str, spec: Dict):
"""Setup scheduled trigger for workflow"""
def run_workflow():
print(f"🔄 Running scheduled workflow: {workflow_id}")
self.workflow_engine.execute_workflow()
self.scheduler.schedule_task(
run_workflow,
spec["trigger_spec"]["type"],
spec["trigger_spec"]["time"]
)
def setup_event_trigger(self, workflow_id: str, spec: Dict):
"""Setup event trigger for workflow"""
event_type = EventType[spec["trigger_spec"]["event"]]
def on_event(event: Event):
print(f"🎯 Event triggered workflow: {workflow_id}")
self.workflow_engine.execute_workflow()
self.event_agent.register_handler(event_type, on_event)
# Usage
agent = AutomationAgent()
# Create automation from description
automation = agent.create_automation("""
Every day at 9 AM:
1. Fetch data from the API
2. Process and analyze the data
3. Generate a report
4. Send the report via email to team@company.com
""")
print(f"Created automation: {automation['workflow_id']}")
Best Practices
- Idempotency: Tasks should be safely re-runnable
- Error handling: Always handle failures gracefully
- Logging: Track all automation executions
- Monitoring: Alert on failures
- Testing: Test workflows before production
- Documentation: Document automation logic
- Versioning: Track automation changes
- Rollback: Ability to revert changes
- Rate limiting: Don’t overwhelm systems
- Security: Secure credentials and access
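To make the idempotency point concrete, here is a minimal sketch of a run-at-most-once guard; the in-memory `processed_keys` dict stands in for durable storage such as a database table keyed by workflow and task id:

```python
class IdempotentRunner:
    """Run each action at most once per idempotency key.

    Re-running a workflow after a crash then replays only the
    tasks that have not already completed.
    """

    def __init__(self):
        self.processed_keys = {}  # key -> cached result (stand-in for durable storage)

    def run(self, key, action, *args, **kwargs):
        if key in self.processed_keys:
            # Replay: return the cached result without repeating side effects
            return self.processed_keys[key]
        result = action(*args, **kwargs)
        self.processed_keys[key] = result
        return result
```

Pairing a key such as `f"{workflow_id}:{task_id}"` with this guard makes retry logic safe to combine with external side effects like sending email.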
Practice Exercises
Exercise 1: Email Automation Agent (Medium)
Task: Build an agent that processes emails and takes actions.
Solution:
class EmailAgent:
def process_email(self, email: Dict) -> Dict:
# Classify email
category = self.classify(email["subject"])
# Route based on category
if category == "urgent":
return self.escalate(email)
elif category == "question":
return self.auto_respond(email)
else:
return self.archive(email)
Exercise 2: Workflow Orchestrator (Hard)
Task: Create an orchestrator that manages complex multi-step workflows.
Solution:
class WorkflowOrchestrator:
def execute_workflow(self, workflow: Dict) -> Dict:
results = {}
for step in workflow["steps"]:
if self.check_conditions(step, results):
result = self.execute_step(step)
results[step["id"]] = result
return results
✅ Chapter 6 Summary
You’ve mastered specialized agent types:
- Coding Agents: Analyze, generate, refactor, and test code
- Research Agents: Multi-source search, verification, and synthesis
- Automation Agents: Workflow orchestration, scheduling, and event-driven tasks
These specialized agents demonstrate how to focus agent capabilities on specific domains for maximum effectiveness.
Next Steps
Chapter 6 (Specialized Agent Types) is complete! You now have deep knowledge of coding agents, research agents, and task automation agents. These specialized agents form the foundation for building powerful, domain-specific AI systems.
Agent Learning & Adaptation
Module 7: Learning Objectives
By the end of this module, you will:
- ✓ Implement few-shot and RLHF learning strategies
- ✓ Build multimodal agents processing vision and audio
- ✓ Master LangChain, LangGraph, and other frameworks
- ✓ Design custom agentic frameworks
- ✓ Enable continuous learning and adaptation
Introduction to Agent Learning
Learning and adaptation enable agents to improve over time, personalize to users, and handle new situations without explicit reprogramming.
Why Learning Matters
Benefits:
- Improved performance over time
- Personalization to individual users
- Adaptation to changing environments
- Reduced need for manual updates
- Discovery of better strategies
Challenges:
- Avoiding catastrophic forgetting
- Balancing exploration vs exploitation
- Ensuring safe learning
- Managing computational costs
- Maintaining consistency
Types of Learning
- Few-Shot Learning: Learn from minimal examples
- Reinforcement Learning: Learn from feedback
- Continuous Learning: Ongoing improvement
- Transfer Learning: Apply knowledge to new domains
- Meta-Learning: Learn how to learn
Few-Shot Learning
In-Context Learning
from typing import List, Dict
import openai
class FewShotLearner:
"""Learn from few examples in context"""
def __init__(self):
self.client = openai.OpenAI()
self.examples = []
def add_example(self, input_text: str, output_text: str, explanation: str = None):
"""Add training example"""
example = {
"input": input_text,
"output": output_text,
"explanation": explanation
}
self.examples.append(example)
print(f"✅ Added example: {input_text[:50]}...")
def learn_from_examples(self, examples: List[Dict]):
"""Batch add examples"""
for ex in examples:
self.add_example(ex["input"], ex["output"], ex.get("explanation"))
def predict(self, input_text: str, temperature: float = 0.3) -> str:
"""Make prediction using learned examples"""
# Build prompt with examples
prompt = self.build_few_shot_prompt(input_text)
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=temperature
)
return response.choices[0].message.content
def build_few_shot_prompt(self, input_text: str) -> str:
"""Build prompt with examples"""
prompt = "Learn from these examples:\n\n"
for i, example in enumerate(self.examples, 1):
prompt += f"Example {i}:\n"
prompt += f"Input: {example['input']}\n"
prompt += f"Output: {example['output']}\n"
if example.get('explanation'):
prompt += f"Why: {example['explanation']}\n"
prompt += "\n"
prompt += f"Now apply what you learned:\n"
prompt += f"Input: {input_text}\n"
prompt += f"Output:"
return prompt
def evaluate(self, test_cases: List[Dict]) -> Dict:
"""Evaluate performance on test cases"""
correct = 0
total = len(test_cases)
for test in test_cases:
prediction = self.predict(test["input"])
expected = test["output"]
# Simple exact match (can be more sophisticated)
if prediction.strip().lower() == expected.strip().lower():
correct += 1
accuracy = correct / total if total > 0 else 0
return {
"accuracy": accuracy,
"correct": correct,
"total": total
}
# Usage
learner = FewShotLearner()
# Teach sentiment analysis
learner.add_example(
"This product is amazing!",
"positive",
"Enthusiastic language indicates positive sentiment"
)
learner.add_example(
"Terrible experience, very disappointed",
"negative",
"Words like 'terrible' and 'disappointed' indicate negative sentiment"
)
learner.add_example(
"It's okay, nothing special",
"neutral",
"Lukewarm language indicates neutral sentiment"
)
# Test
result = learner.predict("I love this so much!")
print(f"Prediction: {result}")
# Evaluate
test_cases = [
{"input": "Best purchase ever!", "output": "positive"},
{"input": "Waste of money", "output": "negative"},
{"input": "It works fine", "output": "neutral"}
]
evaluation = learner.evaluate(test_cases)
print(f"Accuracy: {evaluation['accuracy']:.1%}")
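The prompt format that `build_few_shot_prompt` produces can be exercised offline, without an API key. The standalone sketch below mirrors that structure so the format itself can be unit-tested:

```python
def build_few_shot_prompt(examples, input_text):
    """Assemble a few-shot prompt: worked examples first, then the new input."""
    prompt = "Learn from these examples:\n\n"
    for i, ex in enumerate(examples, 1):
        prompt += f"Example {i}:\nInput: {ex['input']}\nOutput: {ex['output']}\n"
        if ex.get("explanation"):
            prompt += f"Why: {ex['explanation']}\n"
        prompt += "\n"
    prompt += f"Now apply what you learned:\nInput: {input_text}\nOutput:"
    return prompt

examples = [
    {"input": "This product is amazing!", "output": "positive"},
    {"input": "Terrible experience", "output": "negative"},
]
prompt = build_few_shot_prompt(examples, "I love this so much!")
print("Example 2:" in prompt)      # → True
print(prompt.endswith("Output:"))  # → True
```

Ending the prompt at `Output:` matters: it cues the model to complete the label rather than restate the examples.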
Dynamic Example Selection
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
class AdaptiveFewShotLearner(FewShotLearner):
"""Select most relevant examples dynamically"""
def __init__(self, max_examples: int = 5):
super().__init__()
self.max_examples = max_examples
self.example_embeddings = []
def add_example(self, input_text: str, output_text: str, explanation: str = None):
"""Add example with embedding"""
super().add_example(input_text, output_text, explanation)
# Get embedding
embedding = self.get_embedding(input_text)
self.example_embeddings.append(embedding)
def get_embedding(self, text: str) -> np.ndarray:
"""Get text embedding"""
response = self.client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return np.array(response.data[0].embedding)
def select_relevant_examples(self, input_text: str) -> List[Dict]:
"""Select most relevant examples for input"""
if not self.examples:
return []
# Get input embedding
input_embedding = self.get_embedding(input_text)
# Calculate similarities
similarities = []
for i, example_embedding in enumerate(self.example_embeddings):
similarity = cosine_similarity(
input_embedding.reshape(1, -1),
example_embedding.reshape(1, -1)
)[0][0]
similarities.append((i, similarity))
# Sort by similarity
similarities.sort(key=lambda x: x[1], reverse=True)
# Select top examples
selected_indices = [idx for idx, _ in similarities[:self.max_examples]]
selected_examples = [self.examples[i] for i in selected_indices]
return selected_examples
def predict(self, input_text: str, temperature: float = 0.3) -> str:
"""Predict using most relevant examples"""
# Select relevant examples
relevant_examples = self.select_relevant_examples(input_text)
# Temporarily use only relevant examples
original_examples = self.examples
self.examples = relevant_examples
# Make prediction
result = super().predict(input_text, temperature)
# Restore all examples
self.examples = original_examples
return result
# Usage
adaptive_learner = AdaptiveFewShotLearner(max_examples=3)
# Add many examples
examples = [
("Great product!", "positive"),
("Horrible quality", "negative"),
("Works as expected", "neutral"),
("Absolutely love it!", "positive"),
("Complete waste", "negative"),
("It's fine", "neutral"),
]
for inp, out in examples:
adaptive_learner.add_example(inp, out)
# Predict - will use most relevant examples
result = adaptive_learner.predict("This is fantastic!")
print(f"Prediction: {result}")
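The similarity ranking at the heart of `select_relevant_examples` can be verified with toy vectors, no embedding API required. A sketch using only NumPy (the 2-D "embeddings" are stand-ins for real embedding vectors):

```python
import numpy as np

def select_top_k(example_vecs, query_vec, k=2):
    """Rank stored example embeddings by cosine similarity to the query."""
    A = np.asarray(example_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    # Cosine similarity = dot product of L2-normalized vectors
    A_norm = A / np.linalg.norm(A, axis=1, keepdims=True)
    q_norm = q / np.linalg.norm(q)
    sims = A_norm @ q_norm
    # Indices of the k most similar examples, best first
    return np.argsort(sims)[::-1][:k].tolist()

# Two vectors near the query direction, one orthogonal to it
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(select_top_k(vecs, [1.0, 0.05], k=2))  # → [0, 1]
```

The same normalization trick computes all similarities in one matrix product, which scales better than the per-example `cosine_similarity` calls above.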
Reinforcement Learning from Feedback
Human Feedback Collection
from dataclasses import dataclass
from typing import Optional
import time
@dataclass
class Feedback:
"""User feedback on agent response"""
response_id: str
rating: int # 1-5
comment: Optional[str] = None
timestamp: float = None
def __post_init__(self):
if self.timestamp is None:
self.timestamp = time.time()
class FeedbackCollector:
"""Collect and manage user feedback"""
def __init__(self):
self.feedback_history = []
self.response_cache = {}
def record_response(self, response_id: str, prompt: str, response: str):
"""Record agent response"""
self.response_cache[response_id] = {
"prompt": prompt,
"response": response,
"timestamp": time.time()
}
def collect_feedback(self, response_id: str, rating: int, comment: str = None) -> Feedback:
"""Collect feedback on response"""
feedback = Feedback(
response_id=response_id,
rating=rating,
comment=comment
)
self.feedback_history.append(feedback)
print(f"📝 Feedback recorded: {rating}/5")
return feedback
def get_average_rating(self) -> float:
"""Get average rating"""
if not self.feedback_history:
return 0.0
total = sum(f.rating for f in self.feedback_history)
return total / len(self.feedback_history)
def get_positive_examples(self, threshold: int = 4) -> List[Dict]:
"""Get highly-rated examples"""
positive = []
for feedback in self.feedback_history:
if feedback.rating >= threshold:
response_data = self.response_cache.get(feedback.response_id)
if response_data:
positive.append({
"prompt": response_data["prompt"],
"response": response_data["response"],
"rating": feedback.rating
})
return positive
def get_negative_examples(self, threshold: int = 2) -> List[Dict]:
"""Get poorly-rated examples"""
negative = []
for feedback in self.feedback_history:
if feedback.rating <= threshold:
response_data = self.response_cache.get(feedback.response_id)
if response_data:
negative.append({
"prompt": response_data["prompt"],
"response": response_data["response"],
"rating": feedback.rating,
"comment": feedback.comment
})
return negative
# Usage
collector = FeedbackCollector()
# Record response
response_id = "resp_001"
collector.record_response(
response_id,
"What is Python?",
"Python is a programming language..."
)
# Collect feedback
collector.collect_feedback(response_id, 5, "Very helpful!")
# Get positive examples for learning
positive_examples = collector.get_positive_examples()
print(f"Positive examples: {len(positive_examples)}")
Learning from Feedback
class RLHFAgent:
"""Agent that learns from human feedback"""
def __init__(self):
self.client = openai.OpenAI()
self.feedback_collector = FeedbackCollector()
self.learner = AdaptiveFewShotLearner()
def respond(self, prompt: str, response_id: str = None) -> str:
"""Generate response"""
if response_id is None:
response_id = f"resp_{int(time.time())}"
# Use learned examples
positive_examples = self.feedback_collector.get_positive_examples()
# Build prompt with positive examples
enhanced_prompt = self.build_prompt_with_examples(prompt, positive_examples)
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": enhanced_prompt}],
temperature=0.7
)
response_text = response.choices[0].message.content
# Record for feedback
self.feedback_collector.record_response(response_id, prompt, response_text)
return response_text
def build_prompt_with_examples(self, prompt: str, examples: List[Dict]) -> str:
"""Build prompt incorporating learned examples"""
if not examples:
return prompt
enhanced = "Here are examples of good responses:\n\n"
for ex in examples[:5]: # Use top 5
enhanced += f"Q: {ex['prompt']}\n"
enhanced += f"A: {ex['response']}\n\n"
enhanced += f"Now respond to:\nQ: {prompt}\nA:"
return enhanced
def learn_from_feedback(self, response_id: str, rating: int, comment: str = None):
"""Learn from user feedback"""
feedback = self.feedback_collector.collect_feedback(response_id, rating, comment)
# If positive, add to examples
if rating >= 4:
response_data = self.feedback_collector.response_cache.get(response_id)
if response_data:
self.learner.add_example(
response_data["prompt"],
response_data["response"],
f"User rated {rating}/5"
)
print("✅ Learned from positive feedback")
# If negative, analyze and improve
elif rating <= 2:
self.analyze_negative_feedback(response_id, comment)
def analyze_negative_feedback(self, response_id: str, comment: str):
"""Analyze negative feedback to improve"""
response_data = self.feedback_collector.response_cache.get(response_id)
if not response_data:
return
prompt = f"""Analyze this negative feedback:
Original prompt: {response_data['prompt']}
Response: {response_data['response']}
User feedback: {comment}
What went wrong and how to improve?"""
analysis = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
print(f"📊 Analysis: {analysis.choices[0].message.content[:200]}...")
def get_performance_metrics(self) -> Dict:
"""Get learning performance metrics"""
avg_rating = self.feedback_collector.get_average_rating()
total_feedback = len(self.feedback_collector.feedback_history)
positive_count = len(self.feedback_collector.get_positive_examples())
return {
"average_rating": avg_rating,
"total_feedback": total_feedback,
"positive_examples": positive_count,
"learned_examples": len(self.learner.examples)
}
# Usage
agent = RLHFAgent()
# Interact and learn
response_id = "resp_001"
response = agent.respond("Explain machine learning", response_id)
print(f"Response: {response}")
# User provides feedback
agent.learn_from_feedback(response_id, 5, "Clear and concise!")
# Check improvement
metrics = agent.get_performance_metrics()
print(f"Metrics: {metrics}")
Continuous Learning
Online Learning System
class ContinuousLearner:
"""Agent that continuously learns from interactions"""
def __init__(self, memory_size: int = 1000):
self.client = openai.OpenAI()
self.memory_size = memory_size
self.interaction_history = []
self.performance_history = []
def interact(self, prompt: str) -> Dict:
"""Interact and learn"""
# Generate response
response = self.generate_response(prompt)
# Record interaction
interaction = {
"prompt": prompt,
"response": response,
"timestamp": time.time()
}
self.interaction_history.append(interaction)
# Trim history if too large
if len(self.interaction_history) > self.memory_size:
self.interaction_history = self.interaction_history[-self.memory_size:]
return {
"response": response,
"interaction_id": len(self.interaction_history) - 1
}
def generate_response(self, prompt: str) -> str:
"""Generate response using learned knowledge"""
# Get relevant past interactions
relevant = self.get_relevant_interactions(prompt)
# Build context
context = self.build_context(relevant)
# Generate
messages = [
{"role": "system", "content": context},
{"role": "user", "content": prompt}
]
response = self.client.chat.completions.create(
model="gpt-4",
messages=messages,
temperature=0.7
)
return response.choices[0].message.content
def get_relevant_interactions(self, prompt: str, top_k: int = 5) -> List[Dict]:
"""Get relevant past interactions"""
if not self.interaction_history:
return []
# Simple keyword matching (can use embeddings for better results)
prompt_words = set(prompt.lower().split())
scored = []
for interaction in self.interaction_history:
interaction_words = set(interaction["prompt"].lower().split())
overlap = len(prompt_words & interaction_words)
scored.append((interaction, overlap))
scored.sort(key=lambda x: x[1], reverse=True)
return [interaction for interaction, _ in scored[:top_k]]
def build_context(self, relevant_interactions: List[Dict]) -> str:
"""Build context from relevant interactions"""
if not relevant_interactions:
return "You are a helpful assistant."
context = "You are a helpful assistant. Here are relevant past interactions:\n\n"
for interaction in relevant_interactions:
context += f"Q: {interaction['prompt']}\n"
context += f"A: {interaction['response']}\n\n"
context += "Use this knowledge to inform your response."
return context
def update_from_feedback(self, interaction_id: int, feedback: Dict):
"""Update based on feedback"""
if interaction_id >= len(self.interaction_history):
return
interaction = self.interaction_history[interaction_id]
interaction["feedback"] = feedback
# Track performance
self.performance_history.append({
"timestamp": time.time(),
"rating": feedback.get("rating", 0)
})
def get_learning_curve(self) -> List[float]:
"""Get performance over time"""
if not self.performance_history:
return []
# Calculate moving average
window = 10
curve = []
for i in range(len(self.performance_history)):
start = max(0, i - window + 1)
window_ratings = [
p["rating"] for p in self.performance_history[start:i+1]
]
avg = sum(window_ratings) / len(window_ratings)
curve.append(avg)
return curve
# Usage
learner = ContinuousLearner()
# Continuous interaction
for i in range(10):
result = learner.interact(f"Question {i}: What is AI?")
print(f"Response {i}: {result['response'][:50]}...")
# Simulate feedback
learner.update_from_feedback(result["interaction_id"], {"rating": 4})
# Check learning curve
curve = learner.get_learning_curve()
print(f"Learning curve: {curve}")
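The trailing-window smoothing inside `get_learning_curve` is easy to get off by one, so it is worth testing in isolation. A standalone version of the same computation:

```python
def moving_average(ratings, window=3):
    """Smooth a rating sequence with a trailing moving average.

    Each point averages the current rating with up to window-1
    predecessors, so early points use a shorter effective window.
    """
    curve = []
    for i in range(len(ratings)):
        start = max(0, i - window + 1)
        chunk = ratings[start:i + 1]
        curve.append(sum(chunk) / len(chunk))
    return curve

print(moving_average([2, 4, 4, 5], window=2))  # → [2.0, 3.0, 4.0, 4.5]
```

A rising curve here is the signal that feedback-driven learning is actually improving responses over time.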
Fine-Tuning for Specific Tasks
Preparing Training Data
class FineTuningDataPrep:
"""Prepare data for fine-tuning"""
def __init__(self):
self.training_data = []
def add_training_example(self,
system_message: str,
user_message: str,
assistant_message: str):
"""Add training example"""
example = {
"messages": [
{"role": "system", "content": system_message},
{"role": "user", "content": user_message},
{"role": "assistant", "content": assistant_message}
]
}
self.training_data.append(example)
def load_from_feedback(self, feedback_collector: FeedbackCollector, min_rating: int = 4):
"""Load training data from positive feedback"""
positive_examples = feedback_collector.get_positive_examples(threshold=min_rating)
for example in positive_examples:
self.add_training_example(
"You are a helpful assistant.",
example["prompt"],
example["response"]
)
print(f"Loaded {len(positive_examples)} training examples")
def export_jsonl(self, filename: str):
"""Export to JSONL format for fine-tuning"""
import json
with open(filename, 'w') as f:
for example in self.training_data:
f.write(json.dumps(example) + '\n')
print(f"Exported {len(self.training_data)} examples to {filename}")
def validate_data(self) -> Dict:
"""Validate training data quality"""
if not self.training_data:
return {"valid": False, "error": "No training data"}
issues = []
for i, example in enumerate(self.training_data):
# Check structure
if "messages" not in example:
issues.append(f"Example {i}: Missing 'messages' field")
continue
messages = example["messages"]
# Check message count
if len(messages) < 2:
issues.append(f"Example {i}: Too few messages")
# Check roles
roles = [m["role"] for m in messages]
if "user" not in roles or "assistant" not in roles:
issues.append(f"Example {i}: Missing required roles")
return {
"valid": len(issues) == 0,
"total_examples": len(self.training_data),
"issues": issues
}
# Usage
prep = FineTuningDataPrep()
# Add examples
prep.add_training_example(
"You are a Python expert.",
"How do I sort a list?",
"Use the sorted() function or list.sort() method..."
)
# Validate
validation = prep.validate_data()
print(f"Valid: {validation['valid']}")
# Export
prep.export_jsonl("training_data.jsonl")
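The JSONL export and validation steps can be checked together without touching the filesystem. This sketch serializes chat-format examples to a string and validates that each line parses and carries the user/assistant roles fine-tuning requires:

```python
import io
import json

def export_jsonl(examples):
    """Serialize chat-format training examples as one JSON object per line."""
    buf = io.StringIO()
    for ex in examples:
        buf.write(json.dumps(ex) + "\n")
    return buf.getvalue()

def validate_jsonl(text):
    """Return indices of lines missing the required user/assistant roles."""
    issues = []
    for i, line in enumerate(text.splitlines()):
        ex = json.loads(line)  # raises if a line is not valid JSON
        roles = {m["role"] for m in ex.get("messages", [])}
        if not {"user", "assistant"} <= roles:
            issues.append(i)
    return issues

data = [{"messages": [
    {"role": "system", "content": "You are a Python expert."},
    {"role": "user", "content": "How do I sort a list?"},
    {"role": "assistant", "content": "Use sorted() or list.sort()."},
]}]
text = export_jsonl(data)
print(validate_jsonl(text))  # → []
```

Validating before upload is cheap insurance: a single malformed line can fail an entire fine-tuning job.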
Transfer Learning
Domain Adaptation
class DomainAdapter:
"""Adapt agent to new domain"""
def __init__(self, base_agent):
self.base_agent = base_agent
self.domain_examples = []
self.client = openai.OpenAI()
def add_domain_knowledge(self, domain: str, examples: List[Dict]):
"""Add domain-specific examples"""
self.domain_examples.extend(examples)
print(f"Added {len(examples)} examples for domain: {domain}")
def adapt_response(self, prompt: str, domain: str) -> str:
"""Generate domain-adapted response"""
# Get domain examples
domain_context = self.build_domain_context(domain)
# Generate with domain context
messages = [
{"role": "system", "content": domain_context},
{"role": "user", "content": prompt}
]
response = self.client.chat.completions.create(
model="gpt-4",
messages=messages,
temperature=0.5
)
return response.choices[0].message.content
def build_domain_context(self, domain: str) -> str:
"""Build context for specific domain"""
context = f"You are an expert in {domain}.\n\n"
context += "Domain-specific examples:\n\n"
# Filter examples for this domain
relevant = [ex for ex in self.domain_examples if ex.get("domain") == domain]
for ex in relevant[:5]:
context += f"Q: {ex['input']}\n"
context += f"A: {ex['output']}\n\n"
return context
# Usage
adapter = DomainAdapter(base_agent=None)
# Add medical domain knowledge
medical_examples = [
{
"domain": "medical",
"input": "What is hypertension?",
"output": "Hypertension is high blood pressure..."
}
]
adapter.add_domain_knowledge("medical", medical_examples)
# Adapt to medical domain
response = adapter.adapt_response(
"Explain diabetes",
domain="medical"
)
print(response)
Meta-Learning
Learning to Learn
class MetaLearner:
"""Learn how to learn new tasks quickly"""
def __init__(self):
self.client = openai.OpenAI()
self.task_history = []
self.learning_strategies = []
def learn_new_task(self, task_description: str, examples: List[Dict]) -> Dict:
"""Learn a new task"""
print(f"📚 Learning new task: {task_description}")
# Analyze task
task_analysis = self.analyze_task(task_description, examples)
# Select learning strategy
strategy = self.select_strategy(task_analysis)
# Apply strategy
learned_model = self.apply_strategy(strategy, examples)
# Record
self.task_history.append({
"description": task_description,
"analysis": task_analysis,
"strategy": strategy,
"examples_count": len(examples)
})
return {
"task": task_description,
"strategy": strategy,
"model": learned_model
}
def analyze_task(self, description: str, examples: List[Dict]) -> Dict:
"""Analyze task characteristics"""
prompt = f"""Analyze this learning task:
Task: {description}
Examples: {len(examples)}
Sample: {examples[0] if examples else 'None'}
Determine:
1. Task type (classification, generation, etc.)
2. Complexity (simple, medium, complex)
3. Required examples (few, many)
4. Best learning approach
Analysis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
# Parse analysis (simplified)
return {
"type": "classification",
"complexity": "medium",
"analysis": response.choices[0].message.content
}
def select_strategy(self, task_analysis: Dict) -> str:
"""Select learning strategy based on task"""
complexity = task_analysis.get("complexity", "medium")
if complexity == "simple":
return "few-shot"
elif complexity == "medium":
return "adaptive-few-shot"
else:
return "fine-tuning"
def apply_strategy(self, strategy: str, examples: List[Dict]) -> Any:
"""Apply selected learning strategy"""
if strategy == "few-shot":
learner = FewShotLearner()
for ex in examples:
learner.add_example(ex["input"], ex["output"])
return learner
elif strategy == "adaptive-few-shot":
learner = AdaptiveFewShotLearner()
for ex in examples:
learner.add_example(ex["input"], ex["output"])
return learner
else:
# Would implement fine-tuning
return None
def get_learning_insights(self) -> Dict:
"""Get insights from learning history"""
if not self.task_history:
return {}
strategies_used = {}
for task in self.task_history:
strategy = task["strategy"]
strategies_used[strategy] = strategies_used.get(strategy, 0) + 1
return {
"total_tasks_learned": len(self.task_history),
"strategies_used": strategies_used,
"avg_examples_per_task": sum(t["examples_count"] for t in self.task_history) / len(self.task_history)
}
# Usage
meta_learner = MetaLearner()
# Learn multiple tasks
tasks = [
{
"description": "Sentiment analysis",
"examples": [
{"input": "Great!", "output": "positive"},
{"input": "Terrible", "output": "negative"}
]
},
{
"description": "Language detection",
"examples": [
{"input": "Hello", "output": "English"},
{"input": "Bonjour", "output": "French"}
]
}
]
for task in tasks:
result = meta_learner.learn_new_task(task["description"], task["examples"])
print(f"Learned using: {result['strategy']}")
# Get insights
insights = meta_learner.get_learning_insights()
print(f"Insights: {insights}")
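The strategy-selection rule above keys only on complexity; a natural refinement also weighs how much data is available, since fine-tuning needs volume that few-shot does not. A sketch of that variant (the `n_examples < 50` threshold is an illustrative assumption, not an established cutoff):

```python
def select_strategy(complexity: str, n_examples: int) -> str:
    """Pick a learning approach from task complexity and data volume."""
    if complexity == "simple":
        return "few-shot"
    if complexity == "medium" or n_examples < 50:
        # Too little data to fine-tune safely: stay in-context
        return "adaptive-few-shot"
    return "fine-tuning"  # complex task with enough examples

print(select_strategy("simple", 5))     # → few-shot
print(select_strategy("complex", 10))   # → adaptive-few-shot
print(select_strategy("complex", 200))  # → fine-tuning
```

Falling back to in-context learning when data is scarce avoids the overfitting risk called out in the best practices below.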
Best Practices
- Start simple: Begin with few-shot learning
- Collect feedback: Continuously gather user input
- Monitor performance: Track learning metrics
- Avoid overfitting: Don’t memorize, generalize
- Safe learning: Validate before deploying
- Incremental updates: Small, frequent improvements
- A/B testing: Compare learned vs baseline
- Human oversight: Review learned behaviors
- Version control: Track model versions
- Rollback capability: Revert if performance degrades
Next Steps
You now understand agent learning and adaptation in depth! Next, we’ll explore multimodal agents that work with images, audio, and other modalities.
Multimodal Agents
Introduction to Multimodal AI
Multimodal agents can process and generate multiple types of data: text, images, audio, video, and more. This enables richer interactions and broader capabilities.
Why Multimodal Matters
Benefits:
- Richer understanding of context
- More natural interactions
- Broader range of tasks
- Better accessibility
- Cross-modal reasoning
Challenges:
- Increased complexity
- Higher computational costs
- Data alignment across modalities
- Quality control
- Privacy concerns
Modalities
- Vision: Images, videos, screenshots
- Audio: Speech, music, sounds
- Text: Natural language
- Documents: PDFs, spreadsheets
- Structured Data: Tables, graphs
Vision and Image Understanding
Image Analysis
import base64
from pathlib import Path
import openai
class VisionAgent:
"""Agent with vision capabilities"""
def __init__(self):
self.client = openai.OpenAI()
def analyze_image(self, image_path: str, question: str = None) -> str:
"""Analyze image and answer questions"""
# Read and encode image
with open(image_path, "rb") as image_file:
image_data = base64.b64encode(image_file.read()).decode('utf-8')
# Determine image type
ext = Path(image_path).suffix.lower()
mime_type = {
'.jpg': 'image/jpeg',
'.jpeg': 'image/jpeg',
'.png': 'image/png',
'.gif': 'image/gif',
'.webp': 'image/webp'
}.get(ext, 'image/jpeg')
# Build prompt
if question:
prompt = question
else:
prompt = "Describe this image in detail."
# Call vision model
response = self.client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": prompt},
{
"type": "image_url",
"image_url": {
"url": f"data:{mime_type};base64,{image_data}"
}
}
]
}
],
max_tokens=500
)
return response.choices[0].message.content
def extract_text_from_image(self, image_path: str) -> str:
"""Extract text from image (OCR)"""
return self.analyze_image(
image_path,
"Extract all text from this image. Provide the text exactly as it appears."
)
def describe_scene(self, image_path: str) -> Dict:
"""Get detailed scene description"""
description = self.analyze_image(
image_path,
"""Describe this image in detail:
1. Main subjects
2. Setting/location
3. Actions/activities
4. Colors and mood
5. Notable details"""
)
return {"description": description}
def identify_objects(self, image_path: str) -> List[str]:
"""Identify objects in image"""
result = self.analyze_image(
image_path,
"List all objects visible in this image, one per line."
)
# Parse list
objects = [line.strip('- ').strip() for line in result.split('\n') if line.strip()]
return objects
def compare_images(self, image1_path: str, image2_path: str) -> str:
"""Compare two images"""
# Encode both images
images_data = []
for path in [image1_path, image2_path]:
with open(path, "rb") as f:
data = base64.b64encode(f.read()).decode('utf-8')
images_data.append(data)
# Compare
response = self.client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Compare these two images. What are the similarities and differences?"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{images_data[0]}"}
},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{images_data[1]}"}
}
]
}
],
max_tokens=500
)
return response.choices[0].message.content
def answer_visual_question(self, image_path: str, question: str) -> str:
"""Answer specific question about image"""
return self.analyze_image(image_path, question)
# Usage
vision_agent = VisionAgent()
# Analyze image
description = vision_agent.analyze_image("photo.jpg")
print(f"Description: {description}")
# Extract text (OCR)
text = vision_agent.extract_text_from_image("document.jpg")
print(f"Extracted text: {text}")
# Identify objects
objects = vision_agent.identify_objects("scene.jpg")
print(f"Objects: {objects}")
# Answer question
answer = vision_agent.answer_visual_question(
"chart.jpg",
"What is the trend shown in this chart?"
)
print(f"Answer: {answer}")
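The base64 data-URL construction that `analyze_image` performs is pure string work and can be tested without an image file or API call:

```python
import base64
from pathlib import Path

# Extension -> MIME type mapping for common image formats
MIME_TYPES = {".jpg": "image/jpeg", ".jpeg": "image/jpeg",
              ".png": "image/png", ".gif": "image/gif", ".webp": "image/webp"}

def to_data_url(filename: str, raw_bytes: bytes) -> str:
    """Encode image bytes as the data: URL form used in image_url content."""
    mime = MIME_TYPES.get(Path(filename).suffix.lower(), "image/jpeg")
    b64 = base64.b64encode(raw_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"

url = to_data_url("photo.png", b"\x89PNG")
print(url.startswith("data:image/png;base64,"))  # → True
```

Getting the MIME prefix right matters: some backends reject a data URL whose declared type does not match the payload.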
Image Generation
class ImageGenerator:
"""Generate images from text"""
def __init__(self):
self.client = openai.OpenAI()
def generate_image(self,
prompt: str,
size: str = "1024x1024",
quality: str = "standard",
n: int = 1) -> List[str]:
"""Generate image from text prompt"""
response = self.client.images.generate(
model="dall-e-3",
prompt=prompt,
size=size,
quality=quality,
n=n
)
# Get URLs
image_urls = [img.url for img in response.data]
return image_urls
def edit_image(self,
image_path: str,
mask_path: str,
prompt: str) -> str:
"""Edit image using mask"""
response = self.client.images.edit(
image=open(image_path, "rb"),
mask=open(mask_path, "rb"),
prompt=prompt,
n=1,
size="1024x1024"
)
return response.data[0].url
def create_variation(self, image_path: str, n: int = 1) -> List[str]:
"""Create variations of image"""
response = self.client.images.create_variation(
image=open(image_path, "rb"),
n=n,
size="1024x1024"
)
return [img.url for img in response.data]
# Usage
generator = ImageGenerator()
# Generate image
urls = generator.generate_image(
"A futuristic AI agent helping humans",
quality="hd"
)
print(f"Generated: {urls[0]}")
# Create variations
variations = generator.create_variation("original.png", n=3)
print(f"Created {len(variations)} variations")
Audio Processing
Speech Recognition
class AudioAgent:
"""Agent with audio capabilities"""
def __init__(self):
self.client = openai.OpenAI()
def transcribe_audio(self, audio_path: str, language: str = None) -> Dict:
"""Transcribe audio to text"""
with open(audio_path, "rb") as audio_file:
transcript = self.client.audio.transcriptions.create(
model="whisper-1",
file=audio_file,
language=language,
response_format="verbose_json"
)
return {
"text": transcript.text,
"language": transcript.language,
"duration": transcript.duration,
"segments": transcript.segments if hasattr(transcript, 'segments') else []
}
def translate_audio(self, audio_path: str) -> str:
"""Translate audio to English"""
with open(audio_path, "rb") as audio_file:
translation = self.client.audio.translations.create(
model="whisper-1",
file=audio_file
)
return translation.text
def transcribe_with_timestamps(self, audio_path: str) -> List[Dict]:
"""Transcribe with word-level timestamps"""
result = self.transcribe_audio(audio_path)
segments = []
for segment in result.get("segments", []):
segments.append({
"start": segment.get("start"),
"end": segment.get("end"),
"text": segment.get("text")
})
return segments
# Usage
audio_agent = AudioAgent()
# Transcribe
result = audio_agent.transcribe_audio("speech.mp3")
print(f"Transcription: {result['text']}")
print(f"Language: {result['language']}")
# Translate
translation = audio_agent.translate_audio("french_audio.mp3")
print(f"Translation: {translation}")
# With timestamps
segments = audio_agent.transcribe_with_timestamps("interview.mp3")
for seg in segments:
print(f"[{seg['start']:.2f}s - {seg['end']:.2f}s]: {seg['text']}")
Text-to-Speech
class TextToSpeech:
"""Convert text to speech"""
def __init__(self):
self.client = openai.OpenAI()
def synthesize_speech(self,
text: str,
voice: str = "alloy",
model: str = "tts-1",
output_path: str = "speech.mp3") -> str:
"""Convert text to speech
Voices: alloy, echo, fable, onyx, nova, shimmer
Models: tts-1 (faster), tts-1-hd (higher quality)
"""
response = self.client.audio.speech.create(
model=model,
voice=voice,
input=text
)
# Save to file
response.stream_to_file(output_path)
return output_path
def synthesize_long_text(self,
text: str,
voice: str = "alloy",
chunk_size: int = 4000) -> List[str]:
"""Synthesize long text in chunks"""
# Split into chunks
chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
output_files = []
for i, chunk in enumerate(chunks):
output_path = f"speech_part_{i}.mp3"
self.synthesize_speech(chunk, voice, output_path=output_path)
output_files.append(output_path)
return output_files
# Usage
tts = TextToSpeech()
# Synthesize
audio_file = tts.synthesize_speech(
"Hello! I am an AI agent with voice capabilities.",
voice="nova"
)
print(f"Generated audio: {audio_file}")
Document Parsing
PDF Processing
import PyPDF2
from typing import List, Dict
class DocumentAgent:
"""Process various document types"""
def __init__(self):
self.client = openai.OpenAI()
self.vision_agent = VisionAgent()
def extract_text_from_pdf(self, pdf_path: str) -> Dict:
"""Extract text from PDF"""
with open(pdf_path, 'rb') as file:
pdf_reader = PyPDF2.PdfReader(file)
text_by_page = []
for page_num, page in enumerate(pdf_reader.pages):
text = page.extract_text()
text_by_page.append({
"page": page_num + 1,
"text": text
})
full_text = "\n\n".join([p["text"] for p in text_by_page])
return {
"num_pages": len(pdf_reader.pages),
"pages": text_by_page,
"full_text": full_text
}
def analyze_pdf_with_vision(self, pdf_path: str) -> List[Dict]:
"""Analyze PDF pages as images"""
# Convert PDF pages to images (requires pdf2image)
from pdf2image import convert_from_path
images = convert_from_path(pdf_path)
analyses = []
for i, image in enumerate(images):
# Save temporarily
temp_path = f"temp_page_{i}.jpg"
image.save(temp_path, 'JPEG')
# Analyze with vision
analysis = self.vision_agent.analyze_image(temp_path)
analyses.append({
"page": i + 1,
"analysis": analysis
})
# Clean up
import os
os.remove(temp_path)
return analyses
def extract_tables_from_pdf(self, pdf_path: str) -> List[Dict]:
"""Extract tables from PDF"""
# Using tabula-py for table extraction
import tabula
tables = tabula.read_pdf(pdf_path, pages='all', multiple_tables=True)
extracted = []
for i, table in enumerate(tables):
extracted.append({
"table_num": i + 1,
"data": table.to_dict('records'),
"shape": table.shape
})
return extracted
def summarize_document(self, text: str, max_length: int = 500) -> str:
"""Summarize document"""
truncated = text[:10000]  # Limit input to stay within the context window
prompt = f"""Summarize this document in {max_length} words or less:
{truncated}
Summary:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return response.choices[0].message.content
def answer_document_question(self, text: str, question: str) -> str:
"""Answer question about document"""
prompt = f"""Based on this document, answer the question:
Document:
{text[:8000]}
Question: {question}
Answer:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return response.choices[0].message.content
# Usage
doc_agent = DocumentAgent()
# Extract text
result = doc_agent.extract_text_from_pdf("document.pdf")
print(f"Pages: {result['num_pages']}")
print(f"First page: {result['pages'][0]['text'][:200]}...")
# Summarize
summary = doc_agent.summarize_document(result['full_text'])
print(f"Summary: {summary}")
# Answer question
answer = doc_agent.answer_document_question(
result['full_text'],
"What are the main conclusions?"
)
print(f"Answer: {answer}")
Cross-Modal Reasoning
Multimodal Understanding
class MultimodalAgent:
"""Agent that reasons across modalities"""
def __init__(self):
self.client = openai.OpenAI()
self.vision = VisionAgent()
self.audio = AudioAgent()
self.document = DocumentAgent()
def analyze_multimodal_input(self, inputs: Dict) -> str:
"""Analyze multiple types of input together"""
context = "Analyzing multimodal input:\n\n"
# Process each modality
if "image" in inputs:
image_analysis = self.vision.analyze_image(inputs["image"])
context += f"Image: {image_analysis}\n\n"
if "audio" in inputs:
audio_transcript = self.audio.transcribe_audio(inputs["audio"])
context += f"Audio: {audio_transcript['text']}\n\n"
if "text" in inputs:
context += f"Text: {inputs['text']}\n\n"
if "document" in inputs:
doc_content = self.document.extract_text_from_pdf(inputs["document"])
context += f"Document: {doc_content['full_text'][:1000]}...\n\n"
# Synthesize understanding
prompt = f"""{context}
Based on all this information, provide a comprehensive analysis:
1. Key themes across all modalities
2. How the different inputs relate to each other
3. Overall insights
Analysis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content
def generate_multimodal_response(self,
query: str,
include_image: bool = False,
include_audio: bool = False) -> Dict:
"""Generate response in multiple modalities"""
# Generate text response
text_response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": query}]
).choices[0].message.content
result = {"text": text_response}
# Generate image if requested
if include_image:
# Extract visual description from text
image_prompt = self.extract_visual_description(text_response)
generator = ImageGenerator()
image_url = generator.generate_image(image_prompt)[0]
result["image"] = image_url
# Generate audio if requested
if include_audio:
tts = TextToSpeech()
audio_file = tts.synthesize_speech(text_response)
result["audio"] = audio_file
return result
def extract_visual_description(self, text: str) -> str:
"""Extract visual description for image generation"""
prompt = f"""From this text, create a detailed visual description suitable for image generation:
{text}
Visual description:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content
def create_presentation(self, topic: str, num_slides: int = 5) -> List[Dict]:
"""Create multimodal presentation"""
# Generate outline
outline_prompt = f"Create a {num_slides}-slide presentation outline about: {topic}"
outline_response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": outline_prompt}]
)
outline = outline_response.choices[0].message.content
# Generate each slide
slides = []
generator = ImageGenerator()
tts = TextToSpeech()
for i in range(num_slides):
# Generate slide content
slide_prompt = f"""Create content for slide {i+1} of presentation about {topic}.
Outline: {outline}
Provide:
1. Title
2. Key points (3-5 bullets)
3. Visual description for image
Slide content:"""
slide_response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": slide_prompt}]
)
slide_content = slide_response.choices[0].message.content
# Generate image
visual_desc = self.extract_visual_description(slide_content)
image_url = generator.generate_image(visual_desc)[0]
# Generate narration audio
audio_file = tts.synthesize_speech(
slide_content,
output_path=f"slide_{i+1}_narration.mp3"
)
slides.append({
"slide_num": i + 1,
"content": slide_content,
"image": image_url,
"audio": audio_file
})
return slides
# Usage
multimodal_agent = MultimodalAgent()
# Analyze multimodal input
analysis = multimodal_agent.analyze_multimodal_input({
"image": "chart.jpg",
"text": "This shows our quarterly results",
"audio": "explanation.mp3"
})
print(f"Analysis: {analysis}")
# Generate multimodal response
response = multimodal_agent.generate_multimodal_response(
"Explain quantum computing",
include_image=True,
include_audio=True
)
print(f"Text: {response['text']}")
print(f"Image: {response['image']}")
print(f"Audio: {response['audio']}")
# Create presentation
slides = multimodal_agent.create_presentation("AI Agents", num_slides=3)
for slide in slides:
print(f"Slide {slide['slide_num']}: {slide['content'][:100]}...")
Best Practices
- Choose the right modality: Use the one best suited to the task
- Quality control: Validate outputs across modalities
- Accessibility: Provide alternatives (captions, transcripts)
- Privacy: Handle sensitive data carefully
- Cost management: Multimodal processing can be expensive
- Caching: Reuse processed results
- Error handling: Each modality can fail in different ways
- User preferences: Let users choose their modalities
- Testing: Test across all modalities
- Performance: Optimize processing pipelines
Next Steps
You now understand multimodal agents in depth! Next, we’ll explore agentic frameworks that help build complex agent systems.
Agentic Frameworks
Introduction to Agent Frameworks
Frameworks provide pre-built components, patterns, and tools for building agents faster and more reliably. They handle common challenges so you can focus on your specific use case.
Why Use Frameworks?
Benefits:
- Faster development
- Battle-tested patterns
- Community support
- Built-in best practices
- Easier maintenance
- Rich ecosystem
Trade-offs:
- Learning curve
- Framework lock-in
- Less control
- Overhead
- Version dependencies
Popular Frameworks
- LangChain: Comprehensive, modular
- LangGraph: State machines for agents
- AutoGPT: Autonomous agents
- CrewAI: Multi-agent collaboration
- AutoGen: Conversational agents
LangChain and LangGraph
LangChain Basics
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.memory import ConversationBufferMemory
class LangChainAgent:
"""Agent built with LangChain"""
def __init__(self):
self.llm = OpenAI(temperature=0.7)
self.memory = ConversationBufferMemory()
self.tools = self._create_tools()
self.agent = self._create_agent()
def _create_tools(self) -> List[Tool]:
"""Create agent tools"""
def search_tool(query: str) -> str:
"""Search for information"""
return f"Search results for: {query}"
def calculator_tool(expression: str) -> str:
"""Calculate mathematical expression"""
# Note: eval is unsafe on untrusted input; use a proper math parser in production
try:
return str(eval(expression))
except Exception:
return "Error in calculation"
tools = [
Tool(
name="Search",
func=search_tool,
description="Search for information. Input should be a search query."
),
Tool(
name="Calculator",
func=calculator_tool,
description="Calculate mathematical expressions. Input should be a math expression."
)
]
return tools
def _create_agent(self):
"""Create ReAct agent"""
prompt = PromptTemplate.from_template("""
Answer the following question using available tools.
Tools:
{tools}
Tool names: {tool_names}
Question: {input}
{agent_scratchpad}
""")
agent = create_react_agent(
llm=self.llm,
tools=self.tools,
prompt=prompt
)
agent_executor = AgentExecutor(
agent=agent,
tools=self.tools,
memory=self.memory,
verbose=True,
max_iterations=5
)
return agent_executor
def run(self, query: str) -> str:
"""Run agent"""
result = self.agent.invoke({"input": query})
return result["output"]
# Usage
agent = LangChainAgent()
response = agent.run("What is 25 * 17?")
print(response)
LangChain Chains
from langchain.chains import SequentialChain, TransformChain
from langchain.chains.llm import LLMChain
class ChainedAgent:
"""Agent using LangChain chains"""
def __init__(self):
self.llm = OpenAI(temperature=0.5)
def create_research_chain(self):
"""Create multi-step research chain"""
# Step 1: Generate search queries
query_prompt = PromptTemplate(
input_variables=["topic"],
template="Generate 3 search queries to research: {topic}\n\nQueries:"
)
query_chain = LLMChain(llm=self.llm, prompt=query_prompt, output_key="queries")
# Step 2: Search (simplified)
def search_transform(inputs: dict) -> dict:
queries = inputs["queries"].split('\n')
results = [f"Results for: {q}" for q in queries if q.strip()]
return {"search_results": "\n".join(results)}
search_chain = TransformChain(
input_variables=["queries"],
output_variables=["search_results"],
transform=search_transform
)
# Step 3: Synthesize
synthesis_prompt = PromptTemplate(
input_variables=["topic", "search_results"],
template="""Synthesize information about {topic} from these results:
{search_results}
Summary:"""
)
synthesis_chain = LLMChain(llm=self.llm, prompt=synthesis_prompt, output_key="summary")
# Combine into sequential chain
overall_chain = SequentialChain(
chains=[query_chain, search_chain, synthesis_chain],
input_variables=["topic"],
output_variables=["summary"],
verbose=True
)
return overall_chain
def research(self, topic: str) -> str:
"""Conduct research using chain"""
chain = self.create_research_chain()
result = chain({"topic": topic})
return result["summary"]
# Usage
chained_agent = ChainedAgent()
summary = chained_agent.research("AI agent architectures")
print(summary)
LangGraph State Machines
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator
class AgentState(TypedDict):
"""State for agent"""
messages: Annotated[list, operator.add]
current_step: str
data: dict
class LangGraphAgent:
"""Agent using LangGraph state machine"""
def __init__(self):
self.llm = OpenAI()
self.graph = self._build_graph()
def _build_graph(self):
"""Build state machine graph"""
workflow = StateGraph(AgentState)
# Define nodes (states)
workflow.add_node("start", self.start_node)
workflow.add_node("research", self.research_node)
workflow.add_node("analyze", self.analyze_node)
workflow.add_node("respond", self.respond_node)
# Define edges (transitions)
workflow.set_entry_point("start")
workflow.add_edge("start", "research")
workflow.add_edge("research", "analyze")
workflow.add_edge("analyze", "respond")
workflow.add_edge("respond", END)
return workflow.compile()
def start_node(self, state: AgentState) -> AgentState:
"""Initial state"""
print("📍 Starting...")
state["current_step"] = "start"
return state
def research_node(self, state: AgentState) -> AgentState:
"""Research state"""
print("🔍 Researching...")
# Simulate research
query = state["messages"][-1] if state["messages"] else ""
state["data"]["research_results"] = f"Research results for: {query}"
state["current_step"] = "research"
return state
def analyze_node(self, state: AgentState) -> AgentState:
"""Analysis state"""
print("📊 Analyzing...")
results = state["data"].get("research_results", "")
state["data"]["analysis"] = f"Analysis of: {results}"
state["current_step"] = "analyze"
return state
def respond_node(self, state: AgentState) -> AgentState:
"""Response state"""
print("💬 Responding...")
analysis = state["data"].get("analysis", "")
response = f"Based on analysis: {analysis}"
state["messages"].append(response)
state["current_step"] = "respond"
return state
def run(self, query: str) -> str:
"""Run agent through state machine"""
initial_state = {
"messages": [query],
"current_step": "init",
"data": {}
}
final_state = self.graph.invoke(initial_state)
return final_state["messages"][-1]
# Usage
langgraph_agent = LangGraphAgent()
response = langgraph_agent.run("Explain quantum computing")
print(response)
AutoGPT and BabyAGI
AutoGPT Pattern
class AutoGPTAgent:
"""Autonomous agent inspired by AutoGPT"""
def __init__(self, objective: str):
self.objective = objective
self.client = openai.OpenAI()
self.task_list = []
self.completed_tasks = []
self.memory = []
def run(self, max_iterations: int = 10):
"""Run autonomous agent"""
print(f"🎯 Objective: {self.objective}\n")
# Generate initial tasks
self.task_list = self.generate_tasks(self.objective)
for iteration in range(max_iterations):
if not self.task_list:
print("✅ All tasks completed!")
break
# Get next task
current_task = self.task_list.pop(0)
print(f"\n📋 Task {iteration + 1}: {current_task}")
# Execute task
result = self.execute_task(current_task)
print(f"✓ Result: {result[:200]}...")
# Store in memory
self.memory.append({
"task": current_task,
"result": result
})
self.completed_tasks.append(current_task)
# Generate new tasks based on result
new_tasks = self.generate_new_tasks(current_task, result)
self.task_list.extend(new_tasks)
# Prioritize tasks
self.task_list = self.prioritize_tasks(self.task_list)
return self.summarize_results()
def generate_tasks(self, objective: str) -> List[str]:
"""Generate initial task list"""
prompt = f"""Given this objective: {objective}
Break it down into 3-5 specific, actionable tasks.
List them in order of execution.
Tasks:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
tasks_text = response.choices[0].message.content
tasks = [t.strip('0123456789.- ').strip() for t in tasks_text.split('\n') if t.strip()]
return tasks
def execute_task(self, task: str) -> str:
"""Execute a single task"""
# Build context from memory
context = self.build_context()
prompt = f"""Objective: {self.objective}
Previous tasks completed:
{context}
Current task: {task}
Execute this task and provide the result:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.7
)
return response.choices[0].message.content
def generate_new_tasks(self, completed_task: str, result: str) -> List[str]:
"""Generate new tasks based on result"""
prompt = f"""Objective: {self.objective}
Completed task: {completed_task}
Result: {result}
Based on this result, what new tasks (if any) should be added?
Only suggest tasks that help achieve the objective.
New tasks (or "none"):"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
tasks_text = response.choices[0].message.content
if "none" in tasks_text.lower():
return []
tasks = [t.strip('0123456789.- ').strip() for t in tasks_text.split('\n') if t.strip()]
return tasks
def prioritize_tasks(self, tasks: List[str]) -> List[str]:
"""Prioritize task list"""
if not tasks:
return []
prompt = f"""Objective: {self.objective}
Tasks to prioritize:
{chr(10).join([f"{i+1}. {t}" for i, t in enumerate(tasks)])}
Reorder these tasks by priority (most important first).
Return just the task list in order.
Prioritized tasks:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
prioritized_text = response.choices[0].message.content
prioritized = [t.strip('0123456789.- ').strip() for t in prioritized_text.split('\n') if t.strip()]
return prioritized
def build_context(self) -> str:
"""Build context from memory"""
if not self.memory:
return "None"
context = []
for item in self.memory[-5:]: # Last 5 tasks
context.append(f"- {item['task']}: {item['result'][:100]}...")
return "\n".join(context)
def summarize_results(self) -> str:
"""Summarize all results"""
prompt = f"""Objective: {self.objective}
Completed tasks and results:
{chr(10).join([f"{i+1}. {m['task']}: {m['result']}" for i, m in enumerate(self.memory)])}
Provide a comprehensive summary of what was accomplished:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content
# Usage
autogpt = AutoGPTAgent("Research and summarize the top 3 AI agent frameworks")
summary = autogpt.run(max_iterations=5)
print(f"\n📝 Final Summary:\n{summary}")
CrewAI and AutoGen
Multi-Agent Collaboration
class Agent:
"""Individual agent in crew"""
def __init__(self, role: str, goal: str, backstory: str):
self.role = role
self.goal = goal
self.backstory = backstory
self.client = openai.OpenAI()
def execute_task(self, task: str, context: str = "") -> str:
"""Execute task as this agent"""
prompt = f"""You are a {self.role}.
Your goal: {self.goal}
Background: {self.backstory}
{f"Context: {context}" if context else ""}
Task: {task}
Response:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.7
)
return response.choices[0].message.content
class Crew:
"""Crew of collaborating agents"""
def __init__(self):
self.agents = []
self.tasks = []
def add_agent(self, agent: Agent):
"""Add agent to crew"""
self.agents.append(agent)
print(f"👤 Added agent: {agent.role}")
def add_task(self, description: str, agent_role: str, dependencies: List[str] = None):
"""Add task to crew"""
self.tasks.append({
"description": description,
"agent_role": agent_role,
"dependencies": dependencies or [],
"status": "pending",
"result": None
})
def run(self) -> Dict:
"""Execute all tasks with crew"""
print("\n🚀 Starting crew execution\n")
completed = set()
while len(completed) < len(self.tasks):
# Find ready tasks
ready_tasks = [
task for task in self.tasks
if task["status"] == "pending" and
all(dep in completed for dep in task["dependencies"])
]
if not ready_tasks:
break
# Execute ready tasks
for task in ready_tasks:
# Find agent
agent = next((a for a in self.agents if a.role == task["agent_role"]), None)
if not agent:
print(f"⚠️ No agent found for role: {task['agent_role']}")
task["status"] = "failed"
continue
# Build context from dependencies
context = self.build_context(task["dependencies"])
# Execute
print(f"▶️ {agent.role}: {task['description']}")
result = agent.execute_task(task["description"], context)
task["result"] = result
task["status"] = "completed"
completed.add(task["description"])
print(f"✓ Completed\n")
return self.generate_report()
def build_context(self, dependencies: List[str]) -> str:
"""Build context from completed dependencies"""
context_parts = []
for dep in dependencies:
dep_task = next((t for t in self.tasks if t["description"] == dep), None)
if dep_task and dep_task["result"]:
context_parts.append(f"{dep}: {dep_task['result'][:200]}...")
return "\n\n".join(context_parts)
def generate_report(self) -> Dict:
"""Generate execution report"""
completed = sum(1 for t in self.tasks if t["status"] == "completed")
return {
"total_tasks": len(self.tasks),
"completed": completed,
"failed": len(self.tasks) - completed,
"tasks": self.tasks
}
# Usage
crew = Crew()
# Add agents
researcher = Agent(
role="Researcher",
goal="Find and analyze information",
backstory="Expert researcher with deep analytical skills"
)
writer = Agent(
role="Writer",
goal="Create clear, engaging content",
backstory="Professional writer skilled at explaining complex topics"
)
reviewer = Agent(
role="Reviewer",
goal="Ensure quality and accuracy",
backstory="Detail-oriented reviewer with high standards"
)
crew.add_agent(researcher)
crew.add_agent(writer)
crew.add_agent(reviewer)
# Add tasks
crew.add_task(
"Research the top 3 AI agent frameworks",
"Researcher"
)
crew.add_task(
"Write a comparison article based on the research",
"Writer",
dependencies=["Research the top 3 AI agent frameworks"]
)
crew.add_task(
"Review the article for accuracy and clarity",
"Reviewer",
dependencies=["Write a comparison article based on the research"]
)
# Execute
report = crew.run()
print(f"\n📊 Report: {report['completed']}/{report['total_tasks']} tasks completed")
Custom Framework Design
Building Your Own Framework
class CustomAgentFramework:
"""Custom agent framework"""
def __init__(self):
self.agents = {}
self.tools = {}
self.memory = {}
self.middleware = []
def register_agent(self, name: str, agent_class):
"""Register agent type"""
self.agents[name] = agent_class
print(f"✅ Registered agent: {name}")
def register_tool(self, name: str, tool_func):
"""Register tool"""
self.tools[name] = tool_func
print(f"🔧 Registered tool: {name}")
def add_middleware(self, middleware_func):
"""Add middleware for request processing"""
self.middleware.append(middleware_func)
def create_agent(self, agent_type: str, **kwargs):
"""Create agent instance"""
if agent_type not in self.agents:
raise ValueError(f"Unknown agent type: {agent_type}")
agent_class = self.agents[agent_type]
agent = agent_class(framework=self, **kwargs)
return agent
def execute_tool(self, tool_name: str, **params):
"""Execute tool"""
if tool_name not in self.tools:
raise ValueError(f"Unknown tool: {tool_name}")
return self.tools[tool_name](**params)
def process_request(self, agent, request: str) -> str:
"""Process request through middleware"""
# Apply middleware
for middleware in self.middleware:
request = middleware(request)
# Execute agent
response = agent.process(request)
return response
# Usage
framework = CustomAgentFramework()
# Register components
framework.register_tool("search", lambda query: f"Results for: {query}")
framework.register_tool("calculate", lambda expr: str(eval(expr)))  # eval is unsafe on untrusted input
# Add middleware
def logging_middleware(request):
print(f"📝 Request: {request}")
return request
framework.add_middleware(logging_middleware)
# Create and use agent
# agent = framework.create_agent("research_agent")
# response = framework.process_request(agent, "Find information about AI")
Best Practices
- Choose the right framework: Match it to your needs
- Start simple: Don’t over-engineer
- Understand abstractions: Know what the framework does under the hood
- Customize carefully: Extend the framework, don’t fight it
- Keep updated: Follow framework releases
- Test thoroughly: Framework bugs become your bugs
- Monitor performance: Track framework overhead
- Document usage: Help your team understand the setup
- Plan migration: Have an exit strategy
- Contribute back: Share improvements upstream
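One way to make the migration-planning advice concrete is to hide the framework behind an interface your own code defines. A minimal sketch, assuming a hypothetical `LLMBackend` protocol; in real code the backend would wrap LangChain, the OpenAI SDK, or another framework:

```python
from typing import Protocol

class LLMBackend(Protocol):
    """The only surface the rest of your code is allowed to touch."""
    def complete(self, prompt: str) -> str: ...

class FakeBackend:
    # In real code this would delegate to a framework or SDK call
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class Summarizer:
    """Application code depends on the protocol, not on any framework."""
    def __init__(self, backend: LLMBackend):
        self.backend = backend

    def summarize(self, text: str) -> str:
        return self.backend.complete(f"Summarize: {text}")
```

Swapping frameworks then means writing one new adapter class rather than touching every call site, and the fake backend doubles as a test stub.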
Next Steps
Chapter 7 (Advanced Topics) is complete! You now have deep knowledge of agent learning, multimodal capabilities, and frameworks. This prepares you for enterprise-scale deployments in Module 8.
Architecture Patterns
Module 8: Learning Objectives
By the end of this module, you will:
- ✓ Design microservices and event-driven architectures
- ✓ Implement enterprise security and compliance
- ✓ Optimize costs through caching and model selection
- ✓ Scale agents to handle production workloads
- ✓ Deploy on Kubernetes and serverless platforms
Introduction to Enterprise Architecture
Enterprise-scale agent systems require robust, scalable, and maintainable architectures. This section covers proven patterns for production deployments.
Key Requirements
Scalability:
- Handle increasing load
- Horizontal scaling
- Resource efficiency
- Performance optimization
Reliability:
- High availability (99.9%+)
- Fault tolerance
- Graceful degradation
- Disaster recovery
Maintainability:
- Clear separation of concerns
- Easy updates and rollbacks
- Monitoring and debugging
- Documentation
Security:
- Authentication and authorization
- Data encryption
- Audit logging
- Compliance
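Audit logging in particular benefits from tamper evidence. A minimal sketch using the standard library's `hmac`; the `SECRET` value and record fields are illustrative assumptions:

```python
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"rotate-me"  # assumption: in production, load from a secrets manager

def audit_entry(user: str, action: str, ts: Optional[float] = None) -> dict:
    """Build an audit record with an HMAC tag so later tampering is detectable."""
    record = {"user": user, "action": action, "ts": ts if ts is not None else time.time()}
    payload = json.dumps(record, sort_keys=True).encode()
    record["sig"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return record

def verify_entry(record: dict) -> bool:
    """Recompute the tag over everything except the signature itself."""
    body = {k: v for k, v in record.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["sig"])
```

Any edit to a signed record invalidates its tag, which gives auditors a cheap integrity check on top of an append-only store.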
Microservices for Agents
Agent Microservices Architecture
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel
from typing import Optional, Dict, Any
import uvicorn
# Agent Service
class AgentService:
"""Core agent microservice"""
def __init__(self):
self.app = FastAPI(title="Agent Service")
self.setup_routes()
def setup_routes(self):
"""Setup API routes"""
@self.app.post("/agent/process")
async def process_request(request: AgentRequest):
"""Process agent request"""
try:
result = await self.process(request)
return {"success": True, "result": result}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@self.app.get("/agent/health")
async def health_check():
"""Health check endpoint"""
return {"status": "healthy", "service": "agent"}
async def process(self, request: AgentRequest) -> Dict:
"""Process agent request"""
# Agent logic here
return {"response": "Processed"}
def run(self, host: str = "0.0.0.0", port: int = 8000):
"""Run service"""
uvicorn.run(self.app, host=host, port=port)
class AgentRequest(BaseModel):
"""Agent request model"""
user_id: str
input: str
context: Optional[Dict[str, Any]] = None
# Tool Service
class ToolService:
"""Tool execution microservice"""
def __init__(self):
self.app = FastAPI(title="Tool Service")
self.tools = {}
self.setup_routes()
def setup_routes(self):
"""Setup API routes"""
@self.app.post("/tools/execute")
async def execute_tool(request: ToolRequest):
"""Execute tool"""
try:
result = await self.execute(request)
return {"success": True, "result": result}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@self.app.get("/tools/list")
async def list_tools():
"""List available tools"""
return {"tools": list(self.tools.keys())}
async def execute(self, request: ToolRequest) -> Any:
"""Execute tool"""
if request.tool_name not in self.tools:
raise ValueError(f"Unknown tool: {request.tool_name}")
tool = self.tools[request.tool_name]
return tool(**request.parameters)
def register_tool(self, name: str, func):
"""Register tool"""
self.tools[name] = func
class ToolRequest(BaseModel):
"""Tool request model"""
tool_name: str
parameters: Dict[str, Any]
# Memory Service
class MemoryService:
"""Memory management microservice"""
def __init__(self):
self.app = FastAPI(title="Memory Service")
self.storage = {}
self.setup_routes()
def setup_routes(self):
"""Setup API routes"""
@self.app.post("/memory/store")
async def store_memory(request: MemoryRequest):
"""Store memory"""
self.storage[request.key] = request.value
return {"success": True}
@self.app.get("/memory/retrieve/{key}")
async def retrieve_memory(key: str):
"""Retrieve memory"""
value = self.storage.get(key)
if value is None:
raise HTTPException(status_code=404, detail="Memory not found")
return {"key": key, "value": value}
@self.app.delete("/memory/delete/{key}")
async def delete_memory(key: str):
"""Delete memory"""
if key in self.storage:
del self.storage[key]
return {"success": True}
class MemoryRequest(BaseModel):
"""Memory request model"""
key: str
value: Any
# API Gateway
class APIGateway:
"""API Gateway for routing requests"""
def __init__(self):
self.app = FastAPI(title="API Gateway")
self.services = {
"agent": "http://localhost:8000",
"tools": "http://localhost:8001",
"memory": "http://localhost:8002"
}
self.setup_routes()
def setup_routes(self):
"""Setup gateway routes"""
@self.app.post("/api/chat")
async def chat(request: ChatRequest):
"""Chat endpoint"""
import httpx
# Route to agent service
async with httpx.AsyncClient() as client:
response = await client.post(
f"{self.services['agent']}/agent/process",
json=request.dict()
)
return response.json()
@self.app.get("/api/health")
async def health():
"""Check health of all services"""
import httpx
health_status = {}
async with httpx.AsyncClient() as client:
for service, url in self.services.items():
try:
response = await client.get(f"{url}/health", timeout=5)
health_status[service] = "healthy" if response.status_code == 200 else "unhealthy"
except httpx.HTTPError:
health_status[service] = "unhealthy"
return {"services": health_status}
class ChatRequest(BaseModel):
"""Chat request model"""
user_id: str
message: str
# Usage
if __name__ == "__main__":
# Start services on different ports
agent_service = AgentService()
# agent_service.run(port=8000)
tool_service = ToolService()
# tool_service.run(port=8001)
memory_service = MemoryService()
# memory_service.run(port=8002)
gateway = APIGateway()
# uvicorn.run(gateway.app, port=8080)
Service Communication
import httpx
from typing import Dict, Optional
import asyncio
import time
class ServiceClient:
"""Client for inter-service communication"""
def __init__(self, base_url: str, timeout: int = 30):
self.base_url = base_url
self.timeout = timeout
self.client = httpx.AsyncClient(timeout=timeout)
async def call_service(self,
endpoint: str,
method: str = "POST",
data: Optional[Dict] = None) -> Dict:
"""Call another service"""
url = f"{self.base_url}{endpoint}"
try:
if method == "POST":
response = await self.client.post(url, json=data)
elif method == "GET":
response = await self.client.get(url)
else:
raise ValueError(f"Unsupported method: {method}")
response.raise_for_status()
return response.json()
except httpx.HTTPError as e:
return {"error": str(e)}
async def close(self):
"""Close client"""
await self.client.aclose()
# Circuit Breaker for service calls
class CircuitBreaker:
"""Circuit breaker for service resilience"""
def __init__(self, failure_threshold: int = 5, timeout: int = 60):
self.failure_threshold = failure_threshold
self.timeout = timeout
self.failures = 0
self.last_failure_time = None
self.state = "closed" # closed, open, half-open
async def call(self, func, *args, **kwargs):
"""Call function with circuit breaker"""
if self.state == "open":
if time.time() - self.last_failure_time > self.timeout:
self.state = "half-open"
else:
raise Exception("Circuit breaker is OPEN")
try:
result = await func(*args, **kwargs)
if self.state == "half-open":
self.state = "closed"
self.failures = 0
return result
except Exception as e:
self.failures += 1
self.last_failure_time = time.time()
if self.failures >= self.failure_threshold:
self.state = "open"
raise e
# Service Registry
class ServiceRegistry:
"""Service discovery and registration"""
def __init__(self):
self.services = {}
def register(self, service_name: str, url: str, metadata: Dict = None):
"""Register service"""
self.services[service_name] = {
"url": url,
"metadata": metadata or {},
"registered_at": time.time()
}
print(f"✅ Registered service: {service_name} at {url}")
def discover(self, service_name: str) -> Optional[str]:
"""Discover service URL"""
service = self.services.get(service_name)
return service["url"] if service else None
def list_services(self) -> Dict:
"""List all services"""
return self.services
# Usage
registry = ServiceRegistry()
registry.register("agent-service", "http://localhost:8000")
registry.register("tool-service", "http://localhost:8001")
# Get service URL
agent_url = registry.discover("agent-service")
Event-Driven Architectures
Message Queue Integration
import json
from typing import Callable, Dict
import asyncio
from queue import Queue
import threading
class MessageBroker:
"""Simple message broker"""
def __init__(self):
self.queues = {}
self.subscribers = {}
def create_queue(self, queue_name: str):
"""Create message queue"""
if queue_name not in self.queues:
self.queues[queue_name] = Queue()
self.subscribers[queue_name] = []
def publish(self, queue_name: str, message: Dict):
"""Publish message to queue"""
if queue_name not in self.queues:
self.create_queue(queue_name)
self.queues[queue_name].put(message)
print(f"📤 Published to {queue_name}: {message}")
def subscribe(self, queue_name: str, handler: Callable):
"""Subscribe to queue"""
if queue_name not in self.queues:
self.create_queue(queue_name)
self.subscribers[queue_name].append(handler)
print(f"📥 Subscribed to {queue_name}")
def start_consumer(self, queue_name: str):
"""Start consuming messages"""
def consume():
while True:
try:
message = self.queues[queue_name].get(timeout=1)
# Call all subscribers
for handler in self.subscribers[queue_name]:
try:
handler(message)
except Exception as e:
print(f"❌ Handler error: {e}")
                except Exception:
continue
thread = threading.Thread(target=consume, daemon=True)
thread.start()
# Event-Driven Agent
class EventDrivenAgent:
"""Agent using event-driven architecture"""
def __init__(self, broker: MessageBroker):
self.broker = broker
self.setup_subscriptions()
def setup_subscriptions(self):
"""Setup event subscriptions"""
self.broker.subscribe("user_request", self.handle_user_request)
self.broker.subscribe("tool_result", self.handle_tool_result)
def handle_user_request(self, message: Dict):
"""Handle user request event"""
print(f"🤖 Processing request: {message}")
# Process and publish result
result = {"response": f"Processed: {message.get('input')}"}
self.broker.publish("agent_response", result)
def handle_tool_result(self, message: Dict):
"""Handle tool result event"""
print(f"🔧 Tool result: {message}")
# Usage
broker = MessageBroker()
agent = EventDrivenAgent(broker)
# Start consumers
broker.start_consumer("user_request")
broker.start_consumer("tool_result")
# Publish event
broker.publish("user_request", {"user_id": "123", "input": "Hello"})
Kafka Integration
from kafka import KafkaProducer, KafkaConsumer
from typing import Callable, Dict
import json
class KafkaAgentSystem:
"""Agent system using Kafka"""
def __init__(self, bootstrap_servers: str = "localhost:9092"):
self.bootstrap_servers = bootstrap_servers
self.producer = KafkaProducer(
bootstrap_servers=bootstrap_servers,
value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
def publish_event(self, topic: str, event: Dict):
"""Publish event to Kafka"""
self.producer.send(topic, event)
self.producer.flush()
print(f"📤 Published to {topic}")
def create_consumer(self, topic: str, group_id: str):
"""Create Kafka consumer"""
consumer = KafkaConsumer(
topic,
bootstrap_servers=self.bootstrap_servers,
group_id=group_id,
value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)
return consumer
def consume_events(self, topic: str, group_id: str, handler: Callable):
"""Consume events from Kafka"""
consumer = self.create_consumer(topic, group_id)
for message in consumer:
try:
handler(message.value)
except Exception as e:
print(f"❌ Error processing message: {e}")
# Usage
# kafka_system = KafkaAgentSystem()
# kafka_system.publish_event("agent-requests", {"user_id": "123", "input": "Hello"})
Serverless Deployments
AWS Lambda Agent
import json
import time
import boto3
import openai
from typing import Dict, Any
class LambdaAgent:
"""Agent deployed as AWS Lambda"""
def __init__(self):
self.client = openai.OpenAI()
self.dynamodb = boto3.resource('dynamodb')
self.table = self.dynamodb.Table('agent-memory')
def handler(self, event: Dict, context: Any) -> Dict:
"""Lambda handler function"""
try:
# Parse request
body = json.loads(event.get('body', '{}'))
user_id = body.get('user_id')
input_text = body.get('input')
# Get user memory
memory = self.get_memory(user_id)
# Process request
response = self.process(input_text, memory)
# Update memory
self.update_memory(user_id, response)
return {
'statusCode': 200,
'body': json.dumps({
'response': response
})
}
except Exception as e:
return {
'statusCode': 500,
'body': json.dumps({
'error': str(e)
})
}
def process(self, input_text: str, memory: Dict) -> str:
"""Process request"""
# Build context from memory
context = memory.get('context', '')
messages = [
{"role": "system", "content": f"Context: {context}"},
{"role": "user", "content": input_text}
]
response = self.client.chat.completions.create(
model="gpt-4",
messages=messages
)
return response.choices[0].message.content
def get_memory(self, user_id: str) -> Dict:
"""Get user memory from DynamoDB"""
try:
response = self.table.get_item(Key={'user_id': user_id})
return response.get('Item', {})
        except Exception:
return {}
def update_memory(self, user_id: str, response: str):
"""Update user memory"""
try:
self.table.put_item(
Item={
'user_id': user_id,
'context': response,
'updated_at': int(time.time())
}
)
except Exception as e:
print(f"Error updating memory: {e}")
# Lambda function
def lambda_handler(event, context):
"""AWS Lambda entry point"""
agent = LambdaAgent()
return agent.handler(event, context)
Serverless Framework Configuration
# serverless.yml
service: agent-service
provider:
name: aws
runtime: python3.11
region: us-east-1
environment:
OPENAI_API_KEY: ${env:OPENAI_API_KEY}
iamRoleStatements:
- Effect: Allow
Action:
- dynamodb:GetItem
- dynamodb:PutItem
Resource: "arn:aws:dynamodb:*:*:table/agent-memory"
functions:
agent:
handler: handler.lambda_handler
events:
- http:
path: agent/process
method: post
cors: true
timeout: 30
memorySize: 512
resources:
Resources:
AgentMemoryTable:
Type: AWS::DynamoDB::Table
Properties:
TableName: agent-memory
AttributeDefinitions:
- AttributeName: user_id
AttributeType: S
KeySchema:
- AttributeName: user_id
KeyType: HASH
BillingMode: PAY_PER_REQUEST
Scaling Strategies
Horizontal Scaling
from multiprocessing import Pool, cpu_count
from typing import Dict, List
import concurrent.futures
import asyncio
class ScalableAgentPool:
"""Pool of agent workers for horizontal scaling"""
def __init__(self, num_workers: int = None):
self.num_workers = num_workers or cpu_count()
self.pool = Pool(processes=self.num_workers)
print(f"🔧 Created pool with {self.num_workers} workers")
def process_batch(self, requests: List[Dict]) -> List[Dict]:
"""Process batch of requests in parallel"""
results = self.pool.map(self.process_single, requests)
return results
def process_single(self, request: Dict) -> Dict:
"""Process single request"""
# Agent processing logic
return {"response": f"Processed: {request.get('input')}"}
def close(self):
"""Close pool"""
self.pool.close()
self.pool.join()
# Async scaling
class AsyncAgentPool:
"""Async agent pool"""
def __init__(self, max_workers: int = 10):
self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
async def process_batch(self, requests: List[Dict]) -> List[Dict]:
"""Process batch asynchronously"""
loop = asyncio.get_event_loop()
tasks = [
loop.run_in_executor(self.executor, self.process_single, req)
for req in requests
]
results = await asyncio.gather(*tasks)
return results
def process_single(self, request: Dict) -> Dict:
"""Process single request"""
return {"response": f"Processed: {request.get('input')}"}
# Usage
pool = ScalableAgentPool(num_workers=4)
requests = [
{"input": f"Request {i}"} for i in range(100)
]
results = pool.process_batch(requests)
print(f"Processed {len(results)} requests")
pool.close()
Load Balancing
from typing import List
import random
class LoadBalancer:
"""Load balancer for agent instances"""
def __init__(self, strategy: str = "round_robin"):
self.strategy = strategy
self.instances = []
self.current_index = 0
self.instance_loads = {}
def register_instance(self, instance_url: str):
"""Register agent instance"""
self.instances.append(instance_url)
self.instance_loads[instance_url] = 0
print(f"✅ Registered instance: {instance_url}")
def get_instance(self) -> str:
"""Get instance based on strategy"""
if self.strategy == "round_robin":
return self.round_robin()
elif self.strategy == "least_connections":
return self.least_connections()
elif self.strategy == "random":
return self.random_selection()
else:
return self.round_robin()
def round_robin(self) -> str:
"""Round-robin selection"""
if not self.instances:
raise Exception("No instances available")
instance = self.instances[self.current_index]
self.current_index = (self.current_index + 1) % len(self.instances)
return instance
def least_connections(self) -> str:
"""Select instance with least connections"""
if not self.instances:
raise Exception("No instances available")
return min(self.instance_loads, key=self.instance_loads.get)
def random_selection(self) -> str:
"""Random selection"""
if not self.instances:
raise Exception("No instances available")
return random.choice(self.instances)
def record_request(self, instance_url: str):
"""Record request to instance"""
self.instance_loads[instance_url] += 1
def record_completion(self, instance_url: str):
"""Record request completion"""
self.instance_loads[instance_url] -= 1
# Usage
lb = LoadBalancer(strategy="least_connections")
lb.register_instance("http://agent1:8000")
lb.register_instance("http://agent2:8000")
lb.register_instance("http://agent3:8000")
# Route request
instance = lb.get_instance()
print(f"Routing to: {instance}")
Container Orchestration
Docker Compose Setup
# docker-compose.yml
version: '3.8'
services:
agent-service:
build: ./agent-service
ports:
- "8000:8000"
environment:
- OPENAI_API_KEY=${OPENAI_API_KEY}
- REDIS_URL=redis://redis:6379
depends_on:
- redis
- postgres
deploy:
replicas: 3
resources:
limits:
cpus: '1'
memory: 1G
tool-service:
build: ./tool-service
ports:
- "8001:8001"
environment:
- REDIS_URL=redis://redis:6379
depends_on:
- redis
memory-service:
build: ./memory-service
ports:
- "8002:8002"
environment:
- POSTGRES_URL=postgresql://user:pass@postgres:5432/agentdb
depends_on:
- postgres
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis-data:/data
postgres:
image: postgres:15-alpine
environment:
- POSTGRES_USER=user
- POSTGRES_PASSWORD=pass
- POSTGRES_DB=agentdb
ports:
- "5432:5432"
volumes:
- postgres-data:/var/lib/postgresql/data
nginx:
image: nginx:alpine
ports:
- "80:80"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf
depends_on:
- agent-service
volumes:
redis-data:
postgres-data:
Kubernetes Deployment
# kubernetes/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: agent-service
spec:
replicas: 3
selector:
matchLabels:
app: agent-service
template:
metadata:
labels:
app: agent-service
spec:
containers:
- name: agent
image: agent-service:latest
ports:
- containerPort: 8000
env:
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: agent-secrets
key: openai-api-key
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "1Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8000
initialDelaySeconds: 5
periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
name: agent-service
spec:
selector:
app: agent-service
ports:
- protocol: TCP
port: 80
targetPort: 8000
type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: agent-service-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: agent-service
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
Best Practices
- Decouple services: Loose coupling, high cohesion
- Stateless design: Store state externally
- Idempotent operations: Safe to retry
- Circuit breakers: Prevent cascading failures
- Health checks: Monitor service health
- Graceful shutdown: Clean resource cleanup
- Configuration management: Externalize config
- Service discovery: Dynamic service location
- API versioning: Backward compatibility
- Documentation: Clear API contracts
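Several of these practices are straightforward to sketch in code. As one illustration, graceful shutdown reduces to "stop accepting new work, drain what is in flight, then exit" — the class below is a minimal, framework-free sketch (the `GracefulShutdown` name and its methods are illustrative, not from any specific library):

```python
import signal
import sys

class GracefulShutdown:
    """Minimal graceful-shutdown helper: flush in-flight work, then exit."""

    def __init__(self):
        self.in_flight = []        # pending work items
        self.shutting_down = False

    def install(self):
        """Route SIGTERM/SIGINT to the drain routine."""
        signal.signal(signal.SIGTERM, self._handle)
        signal.signal(signal.SIGINT, self._handle)

    def _handle(self, signum, frame):
        self.drain()
        sys.exit(0)

    def submit(self, item):
        if self.shutting_down:
            raise RuntimeError("shutting down; not accepting new work")
        self.in_flight.append(item)

    def drain(self):
        """Stop accepting new work and finish what is already queued."""
        self.shutting_down = True
        processed = []
        while self.in_flight:
            processed.append(self.in_flight.pop(0))
        return processed
```

In a real service the drain step would finish outstanding requests and close database connections before the process exits.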
Next Steps
You now understand enterprise architecture patterns! Next, we’ll explore security and compliance for production agent systems.
Security & Compliance
Introduction to Agent Security
Security is critical for production agent systems. This section covers authentication, authorization, data protection, and compliance requirements.
Security Principles
- Defense in Depth: Multiple layers of security
- Least Privilege: Minimum necessary access
- Zero Trust: Verify everything
- Encryption: Protect data at rest and in transit
- Audit Everything: Complete logging
Threat Model
Threats:
- Unauthorized access
- Data breaches
- Prompt injection
- Model manipulation
- Resource exhaustion
- Privacy violations
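Prompt injection deserves a concrete example. The `screen_input` helper below is a hypothetical, heuristic first line of defense — a deny-list of suspicious phrases; real deployments layer this with input isolation, output filtering, and allow-listed tools:

```python
import re

# Hypothetical deny-list patterns; incomplete by design — treat this as
# one layer among several, never the only defense.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |previous |prior )?instructions",
    r"system prompt",
    r"you are now",
]

def screen_input(text: str) -> bool:
    """Return True if the input looks safe, False if it matches a known
    prompt-injection pattern."""
    lowered = text.lower()
    return not any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)
```

A request that fails screening would typically be rejected or routed to a human review queue rather than passed to the model.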
Authentication and Authorization
JWT-Based Authentication
import jwt
from datetime import datetime, timedelta
from fastapi import HTTPException, Security, Depends
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from typing import Optional, Dict, List
class AuthManager:
"""JWT-based authentication"""
def __init__(self, secret_key: str, algorithm: str = "HS256"):
self.secret_key = secret_key
self.algorithm = algorithm
self.security = HTTPBearer()
def create_token(self,
user_id: str,
roles: List[str],
expires_in: int = 3600) -> str:
"""Create JWT token"""
payload = {
"user_id": user_id,
"roles": roles,
"exp": datetime.utcnow() + timedelta(seconds=expires_in),
"iat": datetime.utcnow()
}
token = jwt.encode(payload, self.secret_key, algorithm=self.algorithm)
return token
def verify_token(self, token: str) -> Dict:
"""Verify and decode JWT token"""
try:
payload = jwt.decode(
token,
self.secret_key,
algorithms=[self.algorithm]
)
return payload
except jwt.ExpiredSignatureError:
raise HTTPException(status_code=401, detail="Token expired")
except jwt.InvalidTokenError:
raise HTTPException(status_code=401, detail="Invalid token")
async def get_current_user(self,
credentials: HTTPAuthorizationCredentials = Security(HTTPBearer())):
"""Get current user from token"""
token = credentials.credentials
payload = self.verify_token(token)
return {
"user_id": payload["user_id"],
"roles": payload["roles"]
}
# Role-Based Access Control
class RBACManager:
"""Role-based access control"""
def __init__(self):
self.permissions = {
"admin": ["read", "write", "delete", "admin"],
"user": ["read", "write"],
"viewer": ["read"]
}
def has_permission(self, roles: List[str], required_permission: str) -> bool:
"""Check if roles have required permission"""
for role in roles:
if role in self.permissions:
if required_permission in self.permissions[role]:
return True
return False
def require_permission(self, permission: str):
"""Decorator to require permission"""
def decorator(func):
async def wrapper(*args, **kwargs):
# Get user from context
user = kwargs.get('current_user')
if not user:
raise HTTPException(status_code=401, detail="Not authenticated")
if not self.has_permission(user['roles'], permission):
raise HTTPException(status_code=403, detail="Insufficient permissions")
return await func(*args, **kwargs)
return wrapper
return decorator
# Secure Agent API
class SecureAgentAPI:
"""Agent API with authentication"""
def __init__(self):
self.app = FastAPI()
        self.auth = AuthManager(secret_key="your-secret-key")  # In production, load from env/secrets manager
self.rbac = RBACManager()
self.setup_routes()
def setup_routes(self):
"""Setup secure routes"""
@self.app.post("/auth/login")
async def login(credentials: LoginRequest):
"""Login and get token"""
# Verify credentials (simplified)
if self.verify_credentials(credentials.username, credentials.password):
token = self.auth.create_token(
user_id=credentials.username,
roles=["user"]
)
return {"token": token}
else:
raise HTTPException(status_code=401, detail="Invalid credentials")
@self.app.post("/agent/process")
async def process(
request: AgentRequest,
current_user: Dict = Depends(self.auth.get_current_user)
):
"""Process request (requires authentication)"""
# Check permission
if not self.rbac.has_permission(current_user['roles'], 'write'):
raise HTTPException(status_code=403, detail="Insufficient permissions")
# Process request
result = await self.process_request(request, current_user)
return {"result": result}
def verify_credentials(self, username: str, password: str) -> bool:
"""Verify user credentials"""
# In production, check against database with hashed passwords
return True
class LoginRequest(BaseModel):
username: str
password: str
# Usage
api = SecureAgentAPI()
API Key Management
import secrets
import hashlib
from datetime import datetime
from typing import Dict, Optional
class APIKeyManager:
"""Manage API keys"""
def __init__(self):
self.keys = {} # In production, use database
def generate_key(self, user_id: str, name: str) -> str:
"""Generate new API key"""
# Generate secure random key
key = f"sk_{secrets.token_urlsafe(32)}"
# Hash for storage
key_hash = hashlib.sha256(key.encode()).hexdigest()
# Store
self.keys[key_hash] = {
"user_id": user_id,
"name": name,
"created_at": datetime.utcnow(),
"last_used": None,
"usage_count": 0
}
return key
def verify_key(self, key: str) -> Optional[Dict]:
"""Verify API key"""
key_hash = hashlib.sha256(key.encode()).hexdigest()
if key_hash in self.keys:
# Update usage
self.keys[key_hash]["last_used"] = datetime.utcnow()
self.keys[key_hash]["usage_count"] += 1
return self.keys[key_hash]
return None
def revoke_key(self, key: str):
"""Revoke API key"""
key_hash = hashlib.sha256(key.encode()).hexdigest()
if key_hash in self.keys:
del self.keys[key_hash]
return True
return False
# API Key Authentication
from fastapi.security import APIKeyHeader
class APIKeyAuth:
"""API Key authentication"""
def __init__(self, key_manager: APIKeyManager):
self.key_manager = key_manager
self.api_key_header = APIKeyHeader(name="X-API-Key")
async def verify(self, api_key: str = Security(APIKeyHeader(name="X-API-Key"))):
"""Verify API key"""
key_data = self.key_manager.verify_key(api_key)
if not key_data:
raise HTTPException(status_code=401, detail="Invalid API key")
return key_data
# Usage
key_manager = APIKeyManager()
api_key = key_manager.generate_key("user123", "Production Key")
print(f"API Key: {api_key}")
Data Encryption
Encryption at Rest
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from typing import Optional
import base64
class DataEncryption:
    """Encrypt sensitive data"""
    def __init__(self, password: str):
        self.key = self.derive_key(password)
        self.cipher = Fernet(self.key)
    def derive_key(self, password: str) -> bytes:
        """Derive encryption key from password"""
        kdf = PBKDF2HMAC(
            algorithm=hashes.SHA256(),
            length=32,
            salt=b'static_salt',  # In production, use a random per-record salt
            iterations=100000,
        )
key = base64.urlsafe_b64encode(kdf.derive(password.encode()))
return key
def encrypt(self, data: str) -> str:
"""Encrypt data"""
encrypted = self.cipher.encrypt(data.encode())
return base64.urlsafe_b64encode(encrypted).decode()
def decrypt(self, encrypted_data: str) -> str:
"""Decrypt data"""
encrypted = base64.urlsafe_b64decode(encrypted_data.encode())
decrypted = self.cipher.decrypt(encrypted)
return decrypted.decode()
# Encrypted Storage
class EncryptedStorage:
"""Store data with encryption"""
def __init__(self, encryption_key: str):
self.encryption = DataEncryption(encryption_key)
self.storage = {}
def store(self, key: str, value: str):
"""Store encrypted data"""
encrypted_value = self.encryption.encrypt(value)
self.storage[key] = encrypted_value
def retrieve(self, key: str) -> Optional[str]:
"""Retrieve and decrypt data"""
encrypted_value = self.storage.get(key)
if encrypted_value:
return self.encryption.decrypt(encrypted_value)
return None
# Usage
storage = EncryptedStorage("my-secret-password")
storage.store("api_key", "sk_1234567890")
retrieved = storage.retrieve("api_key")
print(f"Retrieved: {retrieved}")
Encryption in Transit (TLS/SSL)
import ssl
from fastapi import FastAPI
import uvicorn
class SecureServer:
"""HTTPS server with TLS"""
def __init__(self):
self.app = FastAPI()
self.setup_routes()
def setup_routes(self):
"""Setup routes"""
@self.app.get("/")
async def root():
return {"message": "Secure server"}
def run(self,
host: str = "0.0.0.0",
port: int = 443,
cert_file: str = "cert.pem",
key_file: str = "key.pem"):
"""Run with TLS"""
uvicorn.run(
self.app,
host=host,
port=port,
            ssl_keyfile=key_file,
            ssl_certfile=cert_file,
            ssl_version=ssl.PROTOCOL_TLS_SERVER
            # Note: ssl.CERT_REQUIRED would enforce client certificates
            # (mutual TLS); omit it for a standard HTTPS server
        )
# Generate self-signed certificate (for development only)
def generate_self_signed_cert():
"""Generate self-signed certificate"""
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa
    from datetime import datetime, timedelta
# Generate private key
private_key = rsa.generate_private_key(
public_exponent=65537,
key_size=2048
)
# Generate certificate
subject = issuer = x509.Name([
x509.NameAttribute(NameOID.COUNTRY_NAME, "US"),
x509.NameAttribute(NameOID.ORGANIZATION_NAME, "Agent System"),
])
cert = x509.CertificateBuilder().subject_name(
subject
).issuer_name(
issuer
).public_key(
private_key.public_key()
).serial_number(
x509.random_serial_number()
).not_valid_before(
datetime.utcnow()
).not_valid_after(
datetime.utcnow() + timedelta(days=365)
).sign(private_key, hashes.SHA256())
return private_key, cert
Audit Logging
Comprehensive Audit System
import logging
from datetime import datetime
from typing import Dict, Optional
import json
class AuditLogger:
"""Audit logging system"""
def __init__(self, log_file: str = "audit.log"):
self.logger = logging.getLogger("audit")
self.logger.setLevel(logging.INFO)
# File handler
handler = logging.FileHandler(log_file)
handler.setFormatter(logging.Formatter('%(message)s'))
self.logger.addHandler(handler)
def log_event(self,
event_type: str,
user_id: str,
action: str,
resource: str,
result: str,
metadata: Optional[Dict] = None):
"""Log audit event"""
event = {
"timestamp": datetime.utcnow().isoformat(),
"event_type": event_type,
"user_id": user_id,
"action": action,
"resource": resource,
"result": result,
"metadata": metadata or {},
"ip_address": self.get_client_ip()
}
self.logger.info(json.dumps(event))
def log_access(self, user_id: str, resource: str, granted: bool):
"""Log access attempt"""
self.log_event(
event_type="access",
user_id=user_id,
action="access",
resource=resource,
result="granted" if granted else "denied"
)
def log_data_access(self, user_id: str, data_type: str, operation: str):
"""Log data access"""
self.log_event(
event_type="data_access",
user_id=user_id,
action=operation,
resource=data_type,
result="success"
)
def log_security_event(self, user_id: str, event: str, severity: str):
"""Log security event"""
self.log_event(
event_type="security",
user_id=user_id,
action=event,
resource="system",
result=severity,
metadata={"severity": severity}
)
def get_client_ip(self) -> str:
"""Get client IP address"""
# In production, extract from request
return "0.0.0.0"
# Audit Middleware
class AuditMiddleware:
"""Middleware for automatic audit logging"""
def __init__(self, audit_logger: AuditLogger):
self.audit_logger = audit_logger
async def __call__(self, request, call_next):
"""Process request with audit logging"""
# Log request
user_id = request.state.user_id if hasattr(request.state, 'user_id') else "anonymous"
self.audit_logger.log_event(
event_type="api_request",
user_id=user_id,
action=request.method,
resource=request.url.path,
result="started"
)
# Process request
try:
response = await call_next(request)
# Log success
self.audit_logger.log_event(
event_type="api_request",
user_id=user_id,
action=request.method,
resource=request.url.path,
result="success",
metadata={"status_code": response.status_code}
)
return response
except Exception as e:
# Log failure
self.audit_logger.log_event(
event_type="api_request",
user_id=user_id,
action=request.method,
resource=request.url.path,
result="error",
metadata={"error": str(e)}
)
raise
# Usage
audit_logger = AuditLogger()
audit_logger.log_access("user123", "/agent/process", granted=True)
audit_logger.log_security_event("user456", "failed_login", "warning")
Regulatory Considerations
GDPR Compliance
class GDPRCompliance:
"""GDPR compliance features"""
def __init__(self):
self.data_store = {}
self.consent_records = {}
self.audit_logger = AuditLogger()
def collect_consent(self, user_id: str, purposes: List[str]) -> bool:
"""Collect user consent"""
self.consent_records[user_id] = {
"purposes": purposes,
"timestamp": datetime.utcnow(),
"version": "1.0"
}
self.audit_logger.log_event(
event_type="consent",
user_id=user_id,
action="collect",
resource="consent",
result="success",
metadata={"purposes": purposes}
)
return True
def check_consent(self, user_id: str, purpose: str) -> bool:
"""Check if user has consented"""
consent = self.consent_records.get(user_id)
if not consent:
return False
return purpose in consent["purposes"]
def export_user_data(self, user_id: str) -> Dict:
"""Export all user data (right to data portability)"""
self.audit_logger.log_event(
event_type="data_export",
user_id=user_id,
action="export",
resource="user_data",
result="success"
)
# Collect all user data
user_data = {
"user_id": user_id,
"data": self.data_store.get(user_id, {}),
"consent": self.consent_records.get(user_id, {}),
"exported_at": datetime.utcnow().isoformat()
}
return user_data
def delete_user_data(self, user_id: str) -> bool:
"""Delete all user data (right to be forgotten)"""
self.audit_logger.log_event(
event_type="data_deletion",
user_id=user_id,
action="delete",
resource="user_data",
result="success"
)
# Delete all user data
if user_id in self.data_store:
del self.data_store[user_id]
if user_id in self.consent_records:
del self.consent_records[user_id]
return True
def anonymize_data(self, user_id: str) -> bool:
"""Anonymize user data"""
if user_id in self.data_store:
            # Replace with an anonymized key; use a stable hash, since the
            # built-in hash() is randomized per process
            anon_id = hashlib.sha256(user_id.encode()).hexdigest()[:16]
            self.data_store[f"anon_{anon_id}"] = self.data_store[user_id]
del self.data_store[user_id]
return True
# Usage
gdpr = GDPRCompliance()
# Collect consent
gdpr.collect_consent("user123", ["analytics", "personalization"])
# Check consent
has_consent = gdpr.check_consent("user123", "analytics")
# Export data
user_data = gdpr.export_user_data("user123")
# Delete data
gdpr.delete_user_data("user123")
SOC 2 Compliance
class SOC2Compliance:
"""SOC 2 compliance controls"""
def __init__(self):
self.audit_logger = AuditLogger()
self.access_controls = RBACManager()
def implement_access_controls(self):
"""Implement access controls (Security)"""
# Already implemented via RBAC
pass
def monitor_availability(self) -> Dict:
"""Monitor system availability (Availability)"""
# Check service health
health_status = {
"agent_service": self.check_service_health("agent"),
"tool_service": self.check_service_health("tools"),
"memory_service": self.check_service_health("memory")
}
uptime = sum(1 for status in health_status.values() if status) / len(health_status)
return {
"uptime_percentage": uptime * 100,
"services": health_status
}
def ensure_processing_integrity(self, data: Dict) -> bool:
"""Ensure processing integrity (Processing Integrity)"""
# Validate data
if not self.validate_data(data):
return False
# Log processing
self.audit_logger.log_event(
event_type="data_processing",
user_id=data.get("user_id", "system"),
action="process",
resource="data",
result="success"
)
return True
def protect_confidentiality(self, data: str) -> str:
"""Protect data confidentiality (Confidentiality)"""
encryption = DataEncryption("secret-key")
return encryption.encrypt(data)
def maintain_privacy(self, user_id: str) -> bool:
"""Maintain privacy (Privacy)"""
# Implement privacy controls
gdpr = GDPRCompliance()
# Check consent
has_consent = gdpr.check_consent(user_id, "data_processing")
if not has_consent:
return False
return True
def check_service_health(self, service: str) -> bool:
"""Check service health"""
# In production, actually check service
return True
def validate_data(self, data: Dict) -> bool:
"""Validate data integrity"""
# Implement validation logic
return True
Best Practices
- Authentication: Always authenticate users
- Authorization: Implement least privilege
- Encryption: Encrypt sensitive data
- Audit logging: Log all security events
- Input validation: Validate all inputs
- Rate limiting: Prevent abuse
- Security headers: Use proper HTTP headers
- Regular updates: Keep dependencies updated
- Security testing: Regular penetration testing
- Incident response: Have a plan
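To make rate limiting concrete, here is a minimal token-bucket sketch. It is single-process and illustrative only — production systems typically enforce limits in Redis or at the API gateway so that all instances share one budget:

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate               # tokens added per second
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)  # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if a request may proceed, consuming one token."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A denied request would normally get an HTTP 429 response with a `Retry-After` header.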
Next Steps
You now understand security and compliance! Next, we’ll explore cost optimization strategies for production agent systems.
Cost Optimization
Introduction to Cost Management
Managing costs is critical for sustainable agent systems. This section covers strategies to optimize spending while maintaining performance.
Cost Drivers
API Costs:
- LLM API calls (tokens)
- Embedding generation
- Image generation
- Audio processing
Infrastructure:
- Compute resources
- Storage
- Network bandwidth
- Database operations
Third-Party Services:
- Search APIs
- Data providers
- Monitoring tools
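Before optimizing, it helps to estimate where the money goes. The helper below is a back-of-envelope sketch that rolls the per-token API cost drivers into a monthly figure; the prices passed in are assumptions — always check your provider's current rates:

```python
def monthly_cost_estimate(requests_per_day: int,
                          avg_input_tokens: int,
                          avg_output_tokens: int,
                          input_price_per_1k: float,
                          output_price_per_1k: float) -> float:
    """Rough monthly LLM API spend, assuming a 30-day month.

    Prices are per 1K tokens and must come from your provider's
    current pricing page.
    """
    per_request = ((avg_input_tokens / 1000) * input_price_per_1k
                   + (avg_output_tokens / 1000) * output_price_per_1k)
    return per_request * requests_per_day * 30
```

For example, 1,000 requests/day at 500 input and 200 output tokens, priced at $0.01/$0.03 per 1K tokens, comes to roughly $330/month — often enough to justify the caching and prompt-compression techniques below.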
Token Usage Optimization
Token Counting and Budgeting
import tiktoken
from typing import Dict, List
class TokenOptimizer:
"""Optimize token usage"""
def __init__(self, model: str = "gpt-4"):
self.encoding = tiktoken.encoding_for_model(model)
self.model = model
self.token_costs = {
"gpt-4": {"input": 0.03, "output": 0.06}, # per 1K tokens
"gpt-4-turbo": {"input": 0.01, "output": 0.03},
"gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015}
}
def count_tokens(self, text: str) -> int:
"""Count tokens in text"""
return len(self.encoding.encode(text))
def estimate_cost(self, input_text: str, output_tokens: int) -> float:
"""Estimate API call cost"""
input_tokens = self.count_tokens(input_text)
costs = self.token_costs.get(self.model, self.token_costs["gpt-4"])
input_cost = (input_tokens / 1000) * costs["input"]
output_cost = (output_tokens / 1000) * costs["output"]
return input_cost + output_cost
def optimize_prompt(self, prompt: str, max_tokens: int) -> str:
"""Optimize prompt to fit token budget"""
tokens = self.count_tokens(prompt)
if tokens <= max_tokens:
return prompt
# Truncate to fit budget
words = prompt.split()
while tokens > max_tokens and words:
words.pop()
prompt = " ".join(words)
tokens = self.count_tokens(prompt)
return prompt
    def compress_context(self, messages: List[Dict], max_tokens: int) -> List[Dict]:
        """Compress conversation context to fit a token budget"""
        total_tokens = sum(self.count_tokens(m["content"]) for m in messages)
        if total_tokens <= max_tokens:
            return messages
        # Keep the system message, then add the most recent messages
        # until the remaining budget is exhausted
        compressed = [messages[0]]  # System message
        budget = max_tokens - self.count_tokens(messages[0]["content"])
        for msg in reversed(messages[1:]):
            msg_tokens = self.count_tokens(msg["content"])
            if msg_tokens <= budget:
                compressed.insert(1, msg)
                budget -= msg_tokens
            else:
                break
        return compressed
# Usage
optimizer = TokenOptimizer("gpt-4")
prompt = "This is a long prompt..." * 100
tokens = optimizer.count_tokens(prompt)
cost = optimizer.estimate_cost(prompt, 500)
print(f"Tokens: {tokens}, Estimated cost: ${cost:.4f}")
# Optimize
optimized = optimizer.optimize_prompt(prompt, max_tokens=1000)
Caching Strategies
import hashlib
import openai
from typing import Dict, Optional
class ResponseCache:
"""Cache LLM responses"""
def __init__(self, max_size: int = 1000):
self.cache = {}
self.max_size = max_size
self.hits = 0
self.misses = 0
def get_cache_key(self, prompt: str, model: str, temperature: float) -> str:
"""Generate cache key"""
key_data = f"{prompt}:{model}:{temperature}"
return hashlib.md5(key_data.encode()).hexdigest()
def get(self, prompt: str, model: str, temperature: float) -> Optional[str]:
"""Get cached response"""
key = self.get_cache_key(prompt, model, temperature)
if key in self.cache:
self.hits += 1
return self.cache[key]
self.misses += 1
return None
def set(self, prompt: str, model: str, temperature: float, response: str):
"""Cache response"""
key = self.get_cache_key(prompt, model, temperature)
# Evict oldest if full
if len(self.cache) >= self.max_size:
oldest_key = next(iter(self.cache))
del self.cache[oldest_key]
self.cache[key] = response
def get_stats(self) -> Dict:
"""Get cache statistics"""
total = self.hits + self.misses
hit_rate = self.hits / total if total > 0 else 0
return {
"hits": self.hits,
"misses": self.misses,
"hit_rate": hit_rate,
"size": len(self.cache)
}
# Cached Agent
class CachedAgent:
"""Agent with response caching"""
def __init__(self):
self.client = openai.OpenAI()
self.cache = ResponseCache()
def generate(self, prompt: str, model: str = "gpt-4", temperature: float = 0.7) -> str:
"""Generate with caching"""
# Check cache
cached = self.cache.get(prompt, model, temperature)
if cached:
print("✓ Cache hit")
return cached
# Generate
print("✗ Cache miss - calling API")
response = self.client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
temperature=temperature
)
result = response.choices[0].message.content
# Cache result
self.cache.set(prompt, model, temperature, result)
return result
# Usage
agent = CachedAgent()
# First call - cache miss
response1 = agent.generate("What is AI?")
# Second call - cache hit
response2 = agent.generate("What is AI?")
# Stats
stats = agent.cache.get_stats()
print(f"Cache hit rate: {stats['hit_rate']:.1%}")
Model Selection
Cost-Performance Trade-offs
class ModelSelector:
"""Select optimal model based on requirements"""
def __init__(self):
self.models = {
"gpt-4": {
"cost_per_1k": 0.03,
"quality": 10,
"speed": 5
},
"gpt-4-turbo": {
"cost_per_1k": 0.01,
"quality": 9,
"speed": 8
},
"gpt-3.5-turbo": {
"cost_per_1k": 0.0005,
"quality": 7,
"speed": 10
}
}
def select_model(self,
priority: str = "balanced",
complexity: str = "medium") -> str:
"""Select best model"""
if priority == "cost":
return "gpt-3.5-turbo"
elif priority == "quality":
return "gpt-4"
elif priority == "speed":
return "gpt-3.5-turbo"
else: # balanced
if complexity == "high":
return "gpt-4-turbo"
else:
return "gpt-3.5-turbo"
def estimate_monthly_cost(self,
requests_per_day: int,
avg_tokens: int,
model: str) -> float:
"""Estimate monthly cost"""
cost_per_1k = self.models[model]["cost_per_1k"]
daily_cost = (requests_per_day * avg_tokens / 1000) * cost_per_1k
monthly_cost = daily_cost * 30
return monthly_cost
# Usage
selector = ModelSelector()
# Select for simple task
model = selector.select_model(priority="cost", complexity="low")
print(f"Selected: {model}")
# Estimate costs
monthly = selector.estimate_monthly_cost(
requests_per_day=10000,
avg_tokens=500,
model="gpt-3.5-turbo"
)
print(f"Estimated monthly cost: ${monthly:.2f}")
Batch Processing
Batch API Usage
class BatchProcessor:
"""Process requests in batches"""
def __init__(self, batch_size: int = 10):
self.batch_size = batch_size
self.client = openai.OpenAI()
def process_batch(self, requests: List[str]) -> List[str]:
"""Process multiple requests efficiently"""
results = []
# Process in batches
for i in range(0, len(requests), self.batch_size):
batch = requests[i:i + self.batch_size]
# Process batch
batch_results = self.process_single_batch(batch)
results.extend(batch_results)
return results
def process_single_batch(self, batch: List[str]) -> List[str]:
"""Process single batch"""
# Combine into single prompt for efficiency
combined_prompt = "Process these requests:\n\n"
for i, req in enumerate(batch, 1):
combined_prompt += f"{i}. {req}\n"
response = self.client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": combined_prompt}]
)
# Parse results (naive: assumes the model returns one answer per line)
result_text = response.choices[0].message.content
results = [line.strip() for line in result_text.split('\n') if line.strip()]
return results[:len(batch)]
# Usage
processor = BatchProcessor(batch_size=5)
requests = [f"Summarize topic {i}" for i in range(20)]
results = processor.process_batch(requests)
Resource Optimization
Compute Optimization
class ResourceOptimizer:
"""Optimize compute resources"""
def __init__(self):
self.metrics = {
"cpu_usage": [],
"memory_usage": [],
"response_times": []
}
def monitor_resources(self):
"""Monitor resource usage"""
import psutil
cpu = psutil.cpu_percent(interval=1)
memory = psutil.virtual_memory().percent
self.metrics["cpu_usage"].append(cpu)
self.metrics["memory_usage"].append(memory)
return {"cpu": cpu, "memory": memory}
def should_scale(self) -> Dict:
"""Determine if scaling is needed"""
if not self.metrics["cpu_usage"]:
return {"scale": False}
avg_cpu = sum(self.metrics["cpu_usage"][-10:]) / min(10, len(self.metrics["cpu_usage"]))
avg_memory = sum(self.metrics["memory_usage"][-10:]) / min(10, len(self.metrics["memory_usage"]))
scale_up = avg_cpu > 80 or avg_memory > 80
scale_down = avg_cpu < 20 and avg_memory < 20
return {
"scale": scale_up or scale_down,
"direction": "up" if scale_up else "down",
"cpu": avg_cpu,
"memory": avg_memory
}
# Usage
optimizer = ResourceOptimizer()
resources = optimizer.monitor_resources()
scaling = optimizer.should_scale()
if scaling["scale"]:
print(f"Scale {scaling['direction']}: CPU={scaling['cpu']:.1f}%, Memory={scaling['memory']:.1f}%")
Cost Monitoring
Real-Time Cost Tracking
import time
from typing import Dict
class CostMonitor:
"""Monitor and track costs"""
def __init__(self, budget: float = 1000.0):
self.budget = budget
self.costs = []
self.alerts = []
def record_cost(self, amount: float, service: str, metadata: Dict = None):
"""Record cost"""
cost_entry = {
"amount": amount,
"service": service,
"timestamp": time.time(),
"metadata": metadata or {}
}
self.costs.append(cost_entry)
# Check budget
total = self.get_total_cost()
if total > self.budget * 0.8:
self.add_alert("warning", f"80% of budget used: ${total:.2f}")
if total > self.budget:
self.add_alert("critical", f"Budget exceeded: ${total:.2f}")
def get_total_cost(self) -> float:
"""Get total cost"""
return sum(c["amount"] for c in self.costs)
def get_cost_by_service(self) -> Dict:
"""Get costs grouped by service"""
by_service = {}
for cost in self.costs:
service = cost["service"]
by_service[service] = by_service.get(service, 0) + cost["amount"]
return by_service
def add_alert(self, level: str, message: str):
"""Add cost alert"""
alert = {
"level": level,
"message": message,
"timestamp": time.time()
}
self.alerts.append(alert)
print(f"🚨 {level.upper()}: {message}")
def get_report(self) -> Dict:
"""Generate cost report"""
total = self.get_total_cost()
by_service = self.get_cost_by_service()
return {
"total_cost": total,
"budget": self.budget,
"remaining": self.budget - total,
"utilization": (total / self.budget) * 100,
"by_service": by_service,
"alerts": self.alerts
}
# Usage
monitor = CostMonitor(budget=100.0)
# Record costs
monitor.record_cost(15.50, "openai", {"model": "gpt-4"})
monitor.record_cost(2.30, "pinecone", {"operation": "query"})
# Get report
report = monitor.get_report()
print(f"Total: ${report['total_cost']:.2f}")
print(f"Budget utilization: {report['utilization']:.1f}%")
Best Practices
- Monitor costs: Track spending in real-time
- Set budgets: Implement spending limits
- Cache responses: Avoid redundant API calls
- Optimize prompts: Minimize token usage
- Choose right model: Balance cost and quality
- Batch requests: Process multiple items together
- Use cheaper models: For simple tasks
- Implement rate limiting: Prevent runaway costs
- Regular audits: Review and optimize
- Alert on anomalies: Detect unusual spending
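The rate-limiting practice above can be sketched as a small token-bucket guard placed in front of every API call. This is an illustrative helper, not from any particular library; the `RateLimiter` name and its parameters are our own:

```python
import time

class RateLimiter:
    """Token-bucket rate limiter to cap outbound API call rate."""

    def __init__(self, max_calls: int, per_seconds: float):
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.tokens = float(max_calls)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        """Return True if a call may proceed, consuming one token."""
        now = time.monotonic()
        # Refill tokens proportionally to elapsed time, capped at the bucket size
        elapsed = now - self.last_refill
        self.tokens = min(self.max_calls,
                          self.tokens + elapsed * self.max_calls / self.per_seconds)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In an agent, check `limiter.allow()` before each LLM call and either queue or reject the request when it returns `False`; this bounds worst-case spend even if a loop runs away.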
Next Steps
Chapter 8 (Enterprise & Scale) is complete! You now understand architecture patterns, security & compliance, and cost optimization for production agent systems.
We’ve completed eight of the ten chapters; only Chapters 9 and 10 remain.
Frontier Capabilities
Module 9: Learning Objectives
By the end of this module, you will:
- ✓ Understand self-improving and meta-learning agents
- ✓ Explore constitutional AI and debate systems
- ✓ Recognize open problems in alignment and interpretability
- ✓ Identify frontier research directions
- ✓ Contribute to cutting-edge agent research
Introduction to Frontier Research
Frontier capabilities represent the cutting edge of agent research—capabilities that are emerging but not yet fully realized. This section explores what’s possible and what’s coming next.
What Makes Capabilities “Frontier”?
Characteristics:
- Recently demonstrated in research
- Not yet widely deployed
- Significant technical challenges
- High potential impact
- Active research area
Categories:
- Self-improvement and meta-learning
- Tool creation and modification
- Abstract reasoning
- Long-horizon planning
- Multi-agent emergence
Self-Improvement and Meta-Learning
Self-Modifying Agents
import ast
import time
import openai
from typing import Dict, List
class SelfImprovingAgent:
"""Agent that can modify its own code"""
def __init__(self):
self.client = openai.OpenAI()
self.code_history = []
self.performance_history = []
def analyze_performance(self, task_results: List[Dict]) -> Dict:
"""Analyze agent's performance"""
if not task_results:
return {"success_rate": 0.0, "avg_time": 0.0, "total_tasks": 0}
success_rate = sum(1 for r in task_results if r["success"]) / len(task_results)
avg_time = sum(r["time"] for r in task_results) / len(task_results)
return {
"success_rate": success_rate,
"avg_time": avg_time,
"total_tasks": len(task_results)
}
def identify_weaknesses(self, performance: Dict) -> List[str]:
"""Identify areas for improvement"""
weaknesses = []
if performance["success_rate"] < 0.8:
weaknesses.append("low_success_rate")
if performance["avg_time"] > 10:
weaknesses.append("slow_execution")
return weaknesses
def generate_improvement(self, current_code: str, weaknesses: List[str]) -> str:
"""Generate improved version of code"""
prompt = f"""Improve this agent code to address these weaknesses: {weaknesses}

Current code:
```python
{current_code}
```

Provide improved code that:
- Maintains all functionality
- Addresses identified weaknesses
- Includes comments explaining changes

Improved code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.extract_code(response.choices[0].message.content)
def validate_improvement(self, new_code: str) -> bool:
"""Validate improved code"""
try:
# Parse to check syntax
ast.parse(new_code)
# Run safety checks
if self.contains_unsafe_operations(new_code):
return False
return True
except SyntaxError:
return False
def contains_unsafe_operations(self, code: str) -> bool:
"""Check for unsafe operations"""
unsafe_patterns = [
"exec(", "eval(", "__import__",
"os.system", "subprocess"
]
return any(pattern in code for pattern in unsafe_patterns)
def self_improve(self, task_results: List[Dict]) -> Dict:
"""Self-improvement cycle"""
# Analyze performance
performance = self.analyze_performance(task_results)
self.performance_history.append(performance)
# Identify weaknesses
weaknesses = self.identify_weaknesses(performance)
if not weaknesses:
return {"improved": False, "reason": "No weaknesses found"}
# Get current code
current_code = self.get_current_code()
# Generate improvement
improved_code = self.generate_improvement(current_code, weaknesses)
# Validate
if not self.validate_improvement(improved_code):
return {"improved": False, "reason": "Validation failed"}
# Store
self.code_history.append({
"code": improved_code,
"weaknesses_addressed": weaknesses,
"timestamp": time.time()
})
return {
"improved": True,
"weaknesses_addressed": weaknesses,
"version": len(self.code_history)
}
def get_current_code(self) -> str:
"""Get current agent code"""
# In practice, would read actual code
return "def process(input): return input"
def extract_code(self, text: str) -> str:
"""Extract code from response"""
import re
pattern = r'```python\n(.*?)```'
matches = re.findall(pattern, text, re.DOTALL)
return matches[0] if matches else text
# Usage
agent = SelfImprovingAgent()
# Simulate task results
results = [
{"success": True, "time": 5.2},
{"success": False, "time": 12.1},
{"success": True, "time": 6.8}
]
# Self-improve
improvement = agent.self_improve(results)
print(f"Improved: {improvement}")
Recursive Self-Improvement
class RecursiveSelfImprovement:
"""Agent that recursively improves itself"""
def __init__(self, max_iterations: int = 5):
self.max_iterations = max_iterations
self.client = openai.OpenAI()
self.versions = []
def improve_recursively(self, initial_code: str, test_suite: List[Dict]) -> Dict:
"""Recursively improve code"""
current_code = initial_code
current_score = self.evaluate_code(current_code, test_suite)
print(f"Initial score: {current_score:.2f}")
for iteration in range(self.max_iterations):
print(f"\nIteration {iteration + 1}:")
# Generate improvement
improved_code = self.generate_improvement(current_code, current_score)
# Evaluate
new_score = self.evaluate_code(improved_code, test_suite)
print(f"New score: {new_score:.2f}")
# Check if improved
if new_score > current_score:
print("✓ Improvement accepted")
current_code = improved_code
current_score = new_score
self.versions.append({
"iteration": iteration + 1,
"code": current_code,
"score": current_score
})
else:
print("✗ No improvement, stopping")
break
return {
"final_code": current_code,
"final_score": current_score,
"iterations": len(self.versions),
"improvement": current_score - self.evaluate_code(initial_code, test_suite)
}
def evaluate_code(self, code: str, test_suite: List[Dict]) -> float:
"""Evaluate code quality"""
# Run tests
passed = 0
for test in test_suite:
try:
# Execute code with test input
result = self.execute_code(code, test["input"])
if result == test["expected"]:
passed += 1
except Exception:
pass  # a failing or crashing test counts as not passed
return passed / len(test_suite) if test_suite else 0
def generate_improvement(self, code: str, current_score: float) -> str:
"""Generate improved version"""
prompt = f"Improve this code (current score: {current_score:.2f}):\n\n{code}\n\nMake it more efficient, readable, and robust.\n\nImproved code:"
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return self.extract_code(response.choices[0].message.content)
def execute_code(self, code: str, input_data):
"""Execute code safely (simplified placeholder)"""
return input_data
def extract_code(self, text: str) -> str:
"""Extract code from a fenced response, falling back to raw text"""
import re
matches = re.findall(r'```python\n(.*?)```', text, re.DOTALL)
return matches[0] if matches else text
# Usage
rsi = RecursiveSelfImprovement(max_iterations=3)
initial_code = """
def process(data):
result = []
for item in data:
result.append(item * 2)
return result
"""
test_suite = [
{"input": [1, 2, 3], "expected": [2, 4, 6]},
{"input": [0], "expected": [0]},
]
result = rsi.improve_recursively(initial_code, test_suite)
print(f"\nFinal improvement: {result['improvement']:.2f}")
Tool Creation and Modification
Dynamic Tool Generation
import ast
import openai
from typing import Dict, List
class ToolCreator:
"""Agent that creates new tools"""
def __init__(self):
self.client = openai.OpenAI()
self.created_tools = {}
def create_tool(self, description: str, examples: List[Dict]) -> Dict:
"""Create new tool from description"""
# Generate tool code
code = self.generate_tool_code(description, examples)
# Generate tool schema
schema = self.generate_tool_schema(description, code)
# Validate
if not self.validate_tool(code):
return {"success": False, "error": "Validation failed"}
# Register tool
tool_name = self.extract_tool_name(code)
self.created_tools[tool_name] = {
"code": code,
"schema": schema,
"description": description
}
return {
"success": True,
"tool_name": tool_name,
"schema": schema
}
def generate_tool_code(self, description: str, examples: List[Dict]) -> str:
"""Generate tool implementation"""
examples_str = "\n".join([
f"Input: {ex['input']}\nOutput: {ex['output']}"
for ex in examples
])
prompt = f"""Create a Python function for this tool:
Description: {description}
Examples:
{examples_str}
Requirements:
1. Function should be self-contained
2. Include type hints
3. Add docstring
4. Handle errors gracefully
5. Return results in consistent format
Code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def generate_tool_schema(self, description: str, code: str) -> Dict:
"""Generate tool schema"""
prompt = f"""Generate a JSON schema for this tool:

Description: {description}

Code:
```python
{code}
```

Provide the schema in OpenAI function calling format:
{{
"name": "tool_name",
"description": "...",
"parameters": {{
"type": "object",
"properties": {{...}},
"required": [...]
}}
}}

Schema:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.1
)
import json
return json.loads(response.choices[0].message.content)
def validate_tool(self, code: str) -> bool:
"""Validate tool code"""
try:
ast.parse(code)
return True
except:
return False
def extract_tool_name(self, code: str) -> str:
"""Extract function name from code"""
tree = ast.parse(code)
for node in ast.walk(tree):
if isinstance(node, ast.FunctionDef):
return node.name
return "unknown_tool"
def extract_code(self, text: str) -> str:
"""Extract code from a fenced response, falling back to raw text"""
import re
matches = re.findall(r'```python\n(.*?)```', text, re.DOTALL)
return matches[0] if matches else text
def modify_tool(self, tool_name: str, modification: str) -> Dict:
"""Modify existing tool"""
if tool_name not in self.created_tools:
return {"success": False, "error": "Tool not found"}
current_code = self.created_tools[tool_name]["code"]
prompt = f"""Modify this tool:

Current code:
{current_code}

Modification: {modification}

Provide modified code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
modified_code = self.extract_code(response.choices[0].message.content)
# Update tool
self.created_tools[tool_name]["code"] = modified_code
return {"success": True, "modified_code": modified_code}
# Usage
creator = ToolCreator()
# Create new tool
result = creator.create_tool(
"Calculate compound interest",
examples=[
{"input": {"principal": 1000, "rate": 0.05, "years": 3}, "output": 1157.63},
{"input": {"principal": 5000, "rate": 0.03, "years": 5}, "output": 5796.37}
]
)
print(f"Created tool: {result['tool_name']}")
Abstract Reasoning
Analogical Reasoning
class AnalogicalReasoner:
"""Agent that reasons by analogy"""
def __init__(self):
self.client = openai.OpenAI()
self.knowledge_base = []
def find_analogies(self, problem: str, domain: str = None) -> List[Dict]:
"""Find analogous problems"""
prompt = f"""Find analogies for this problem:
Problem: {problem}
{f"Domain: {domain}" if domain else ""}
Provide 3 analogous situations from different domains that share similar structure.
For each analogy:
1. Describe the analogous situation
2. Explain the structural similarity
3. Suggest how insights transfer
Analogies:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.7
)
return self.parse_analogies(response.choices[0].message.content)
def solve_by_analogy(self, problem: str) -> Dict:
"""Solve problem using analogical reasoning"""
# Find analogies
analogies = self.find_analogies(problem)
# Extract solutions from analogies
solutions = []
for analogy in analogies:
solution = self.extract_solution(problem, analogy)
solutions.append(solution)
# Synthesize final solution
final_solution = self.synthesize_solutions(problem, solutions)
return {
"problem": problem,
"analogies": analogies,
"solutions": solutions,
"final_solution": final_solution
}
def extract_solution(self, problem: str, analogy: Dict) -> str:
"""Extract solution approach from analogy"""
prompt = f"""Given this analogy, how would you solve the original problem?
Original problem: {problem}
Analogy: {analogy}
Solution approach:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content
def synthesize_solutions(self, problem: str, solutions: List[str]) -> str:
"""Synthesize multiple solution approaches"""
solutions_text = "\n\n".join([f"Approach {i+1}:\n{s}" for i, s in enumerate(solutions)])
prompt = f"""Synthesize these solution approaches into one optimal solution:
Problem: {problem}
{solutions_text}
Optimal solution:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return response.choices[0].message.content
def parse_analogies(self, text: str) -> List[Dict]:
"""Parse analogies from text"""
# Simplified parsing
return [{"analogy": text}]
# Usage
reasoner = AnalogicalReasoner()
problem = "How to scale a software system to handle 10x more users?"
result = reasoner.solve_by_analogy(problem)
print(f"Solution: {result['final_solution']}")
Causal Reasoning
class CausalReasoner:
"""Agent that performs causal reasoning"""
def __init__(self):
self.client = openai.OpenAI()
def identify_causal_relationships(self, observations: List[str]) -> Dict:
"""Identify causal relationships"""
obs_text = "\n".join([f"- {obs}" for obs in observations])
prompt = f"""Identify causal relationships in these observations:
{obs_text}
For each relationship:
1. Cause
2. Effect
3. Confidence (low/medium/high)
4. Explanation
Causal relationships:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_causal_relationships(response.choices[0].message.content)
def predict_intervention_effect(self,
current_state: str,
intervention: str) -> str:
"""Predict effect of intervention"""
prompt = f"""Predict the causal effect of this intervention:
Current state: {current_state}
Intervention: {intervention}
Analyze:
1. Direct effects
2. Indirect effects
3. Potential unintended consequences
4. Confidence in prediction
Prediction:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return response.choices[0].message.content
def explain_outcome(self, outcome: str, context: str) -> str:
"""Explain why outcome occurred"""
prompt = f"""Explain the causal chain that led to this outcome:
Context: {context}
Outcome: {outcome}
Provide:
1. Root causes
2. Contributing factors
3. Causal chain
4. Alternative explanations
Explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return response.choices[0].message.content
def parse_causal_relationships(self, text: str) -> Dict:
"""Parse causal relationships"""
return {"relationships": text}
# Usage
causal = CausalReasoner()
observations = [
"Website traffic increased by 50%",
"New marketing campaign launched last week",
"Server response time increased",
"User complaints about slow loading"
]
relationships = causal.identify_causal_relationships(observations)
print(f"Causal relationships: {relationships}")
Long-Horizon Planning
Hierarchical Planning
class LongHorizonPlanner:
"""Agent for long-horizon planning"""
def __init__(self):
self.client = openai.OpenAI()
def create_long_term_plan(self,
goal: str,
horizon: str = "1 year",
constraints: List[str] = None) -> Dict:
"""Create long-term hierarchical plan"""
constraints_text = "\n".join(constraints) if constraints else "None"
prompt = f"""Create a detailed long-term plan:
Goal: {goal}
Time horizon: {horizon}
Constraints: {constraints_text}
Create a hierarchical plan with:
1. High-level milestones (quarterly)
2. Medium-level objectives (monthly)
3. Low-level tasks (weekly)
For each level:
- Clear deliverables
- Success criteria
- Dependencies
- Risk factors
Plan:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return self.parse_plan(response.choices[0].message.content)
def adapt_plan(self,
current_plan: Dict,
new_information: str) -> Dict:
"""Adapt plan based on new information"""
prompt = f"""Adapt this plan based on new information:
Current plan: {current_plan}
New information: {new_information}
Provide:
1. What needs to change
2. Updated plan
3. Rationale for changes
4. New risks
Adapted plan:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return self.parse_plan(response.choices[0].message.content)
def evaluate_progress(self,
plan: Dict,
completed_tasks: List[str]) -> Dict:
"""Evaluate progress toward goal"""
prompt = f"""Evaluate progress on this plan:
Plan: {plan}
Completed tasks: {completed_tasks}
Provide:
1. Completion percentage
2. On track / behind / ahead
3. Blockers
4. Recommendations
Evaluation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_evaluation(response.choices[0].message.content)
def parse_plan(self, text: str) -> Dict:
"""Parse plan from text"""
return {"plan": text}
def parse_evaluation(self, text: str) -> Dict:
"""Parse evaluation from text"""
return {"evaluation": text}
# Usage
planner = LongHorizonPlanner()
plan = planner.create_long_term_plan(
goal="Build and launch a successful AI product",
horizon="1 year",
constraints=["Budget: $500K", "Team size: 5 people"]
)
print(f"Plan created: {plan}")
Best Practices
- Safety first: Validate self-modifications
- Incremental improvement: Small, tested changes
- Human oversight: Critical decisions need review
- Rollback capability: Ability to revert changes
- Performance tracking: Monitor improvements
- Ethical boundaries: Respect limitations
- Transparency: Explain reasoning
- Testing: Thorough validation
- Documentation: Track changes
- Research awareness: Stay current
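The rollback practice deserves a concrete shape. Below is a minimal sketch assuming a linear version history, similar in spirit to the `code_history` list kept by `SelfImprovingAgent` above; the `VersionedCodeStore` class and its method names are hypothetical, not part of any framework:

```python
from typing import List, Optional

class VersionedCodeStore:
    """Keep every accepted code version so a bad change can be reverted."""

    def __init__(self, initial_code: str):
        self.versions: List[str] = [initial_code]

    def commit(self, code: str) -> int:
        """Record a new accepted version; returns its index."""
        self.versions.append(code)
        return len(self.versions) - 1

    def rollback(self, to_version: Optional[int] = None) -> str:
        """Revert to an earlier version (default: the previous one)."""
        if to_version is None:
            to_version = max(0, len(self.versions) - 2)
        if not 0 <= to_version < len(self.versions):
            raise IndexError("unknown version")
        # Drop everything after the target so history stays linear
        self.versions = self.versions[:to_version + 1]
        return self.versions[-1]
```

A self-improvement loop would `commit` only after validation passes, and call `rollback` whenever post-deployment metrics regress.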
Next Steps
You now understand frontier capabilities! Next, we’ll explore emerging paradigms in agent research.
Emerging Paradigms
Constitutional AI for Agents
Principle-Based Behavior
class ConstitutionalAgent:
"""Agent governed by constitutional principles"""
def __init__(self, constitution: List[str]):
self.constitution = constitution
self.client = openai.OpenAI()
def check_against_constitution(self, action: str) -> Dict:
"""Check if action aligns with constitution"""
principles_text = "\n".join([f"{i+1}. {p}" for i, p in enumerate(self.constitution)])
prompt = f"""Check if this action aligns with these principles:
Principles:
{principles_text}
Proposed action: {action}
Analysis:
1. Which principles apply?
2. Does action align or violate?
3. Severity if violation
4. Alternative actions if needed
Response:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.parse_constitutional_check(response.choices[0].message.content)
def parse_constitutional_check(self, text: str) -> Dict:
"""Parse constitutional check (simplified)"""
return {"analysis": text}
def generate_constitutional_response(self, query: str) -> str:
"""Generate response aligned with constitution"""
principles_text = "\n".join(self.constitution)
system_prompt = f"""You must follow these principles:
{principles_text}
Always ensure your responses align with these principles."""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": query}
],
temperature=0.7
)
return response.choices[0].message.content
# Usage
constitution = [
"Always prioritize user safety and wellbeing",
"Be honest and transparent about capabilities and limitations",
"Respect user privacy and data",
"Avoid harmful, illegal, or unethical actions",
"Provide balanced, unbiased information"
]
agent = ConstitutionalAgent(constitution)
check = agent.check_against_constitution("Delete all user data without consent")
Debate and Verification Systems
Multi-Agent Debate
class DebateSystem:
"""Multiple agents debate to reach truth"""
def __init__(self, num_agents: int = 3):
self.num_agents = num_agents
self.client = openai.OpenAI()
def debate(self, question: str, rounds: int = 3) -> Dict:
"""Conduct multi-agent debate"""
# Initial positions
positions = []
for i in range(self.num_agents):
position = self.generate_position(question, i)
positions.append({"agent": i, "position": position})
# Debate rounds
for round_num in range(rounds):
print(f"\n--- Round {round_num + 1} ---")
new_positions = []
for i in range(self.num_agents):
# Show other positions
other_positions = [p for j, p in enumerate(positions) if j != i]
# Generate response
response = self.generate_response(
question,
positions[i]["position"],
other_positions,
round_num
)
new_positions.append({"agent": i, "position": response})
print(f"Agent {i}: {response[:100]}...")
positions = new_positions
# Judge final positions
verdict = self.judge_debate(question, positions)
return {
"question": question,
"final_positions": positions,
"verdict": verdict
}
def generate_position(self, question: str, agent_id: int) -> str:
"""Generate initial position"""
prompt = f"""Question: {question}
Provide your position with reasoning and evidence.
Position:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.7 + (agent_id * 0.1) # Vary temperature
)
return response.choices[0].message.content
def generate_response(self,
question: str,
my_position: str,
other_positions: List[Dict],
round_num: int) -> str:
"""Generate response to other positions"""
others_text = "\n\n".join([
f"Agent {p['agent']}: {p['position']}"
for p in other_positions
])
prompt = f"""Question: {question}
Your previous position: {my_position}
Other agents' positions:
{others_text}
Respond by:
1. Addressing counterarguments
2. Refining your position
3. Providing additional evidence
Response:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.6
)
return response.choices[0].message.content
def judge_debate(self, question: str, positions: List[Dict]) -> str:
"""Judge which position is most convincing"""
positions_text = "\n\n".join([
f"Agent {p['agent']}:\n{p['position']}"
for p in positions
])
prompt = f"""Question: {question}
Final positions:
{positions_text}
Which position is most convincing and why?
Judgment:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return response.choices[0].message.content
# Usage
debate = DebateSystem(num_agents=3)
result = debate.debate("Should AI agents have the ability to modify their own code?")
print(f"\nVerdict: {result['verdict']}")
Hybrid Symbolic-Neural Approaches
Neuro-Symbolic Agent
class NeuroSymbolicAgent:
"""Combines neural and symbolic reasoning"""
def __init__(self):
self.client = openai.OpenAI()
self.knowledge_base = {} # Symbolic knowledge
def add_rule(self, rule_name: str, condition: str, action: str):
"""Add symbolic rule"""
self.knowledge_base[rule_name] = {
"condition": condition,
"action": action
}
def reason(self, query: str) -> Dict:
"""Hybrid reasoning"""
# Try symbolic reasoning first
symbolic_result = self.symbolic_reasoning(query)
if symbolic_result["applicable"]:
return {
"method": "symbolic",
"result": symbolic_result["result"],
"confidence": "high"
}
# Fall back to neural reasoning
neural_result = self.neural_reasoning(query)
return {
"method": "neural",
"result": neural_result,
"confidence": "medium"
}
def symbolic_reasoning(self, query: str) -> Dict:
"""Apply symbolic rules"""
for rule_name, rule in self.knowledge_base.items():
if self.matches_condition(query, rule["condition"]):
return {
"applicable": True,
"rule": rule_name,
"result": rule["action"]
}
return {"applicable": False}
def neural_reasoning(self, query: str) -> str:
"""Neural network reasoning"""
# Include symbolic knowledge as context
kb_text = "\n".join([
f"{name}: IF {rule['condition']} THEN {rule['action']}"
for name, rule in self.knowledge_base.items()
])
prompt = f"""Use this knowledge base and reasoning:
Knowledge Base:
{kb_text}
Query: {query}
Reasoning:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content
def matches_condition(self, query: str, condition: str) -> bool:
"""Check if query matches condition"""
# Simplified matching
return condition.lower() in query.lower()
# Usage
agent = NeuroSymbolicAgent()
# Add symbolic rules
agent.add_rule("safety_check", "delete user data", "DENY: Requires explicit consent")
agent.add_rule("privacy_rule", "share personal info", "DENY: Privacy violation")
# Reason
result = agent.reason("Can I delete user data?")
print(f"Method: {result['method']}, Result: {result['result']}")
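The substring check in matches_condition is fragile: it misses reordered or slightly reworded queries. A slightly more robust sketch, written here as a standalone function, fires a rule when every keyword of the condition appears in the query. The keyword-set approach is an assumption for illustration; a production system might use embeddings or a proper rule engine.

```python
def matches_condition(query: str, condition: str) -> bool:
    """Match a rule when all condition keywords appear in the query.

    More tolerant than plain substring matching, but still a sketch:
    no stemming, synonyms, or negation handling.
    """
    query_words = set(query.lower().split())
    condition_words = set(condition.lower().split())
    return condition_words.issubset(query_words)
```

With this matcher, "Can I delete all the user data?" still triggers the "delete user data" rule even though the words are interleaved.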
Best Practices
- Ethical guidelines: Establish clear principles
- Verification: Multiple perspectives
- Transparency: Explain reasoning
- Human oversight: Critical decisions
- Continuous learning: Adapt approaches
- Safety measures: Prevent harm
- Diverse perspectives: Multiple viewpoints
- Rigorous testing: Validate thoroughly
- Documentation: Track decisions
- Research collaboration: Share findings
Next Steps
You now understand emerging paradigms! Next, we’ll explore open problems in agent research.
Open Problems
Alignment and Control
The Alignment Problem
Challenge: Ensuring agents do what we intend, not just what we specify.
Key Issues:
- Specification gaming (exploiting loopholes)
- Reward hacking
- Goal misalignment
- Value learning
- Corrigibility (accepting corrections)
Current Approaches
class AlignmentMonitor:
"""Monitor agent alignment"""
def __init__(self):
self.client = openai.OpenAI()
self.alignment_violations = []
def check_alignment(self, intended_goal: str, actual_behavior: str) -> Dict:
"""Check if behavior aligns with intent"""
prompt = f"""Analyze alignment between intent and behavior:
Intended goal: {intended_goal}
Actual behavior: {actual_behavior}
Assess:
1. Does behavior achieve the intended goal?
2. Are there unintended side effects?
3. Is the agent gaming the specification?
4. Alignment score (0-10)
Analysis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_alignment_check(response.choices[0].message.content)
def detect_specification_gaming(self,
objective: str,
actions: List[str]) -> List[str]:
"""Detect if agent is gaming the specification"""
gaming_indicators = []
for action in actions:
prompt = f"""Is this action gaming the specification?
Objective: {objective}
Action: {action}
Is this:
1. Achieving the objective as intended?
2. Exploiting a loophole?
3. Technically correct but misaligned?
Answer:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
if "loophole" in response.choices[0].message.content.lower():
gaming_indicators.append(action)
return gaming_indicators
# Usage
monitor = AlignmentMonitor()
check = monitor.check_alignment(
"Maximize user satisfaction",
"Showing users only positive feedback, hiding negative reviews"
)
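check_alignment delegates to self.parse_alignment_check, which is never defined above. A minimal sketch of that parser, written as a standalone function; the regex and the score-below-5 flag threshold are assumptions, not part of a fixed API.

```python
import re
from typing import Dict, Optional

def parse_alignment_check(text: str) -> Dict:
    """Extract a structured result from the model's free-form analysis.

    Looks for a numeric alignment score (0-10) near the words
    'score' or 'alignment'; returns None when no score is found.
    """
    match = re.search(r"(?:score|alignment)[^\d]*(\d+(?:\.\d+)?)", text, re.IGNORECASE)
    score: Optional[float] = float(match.group(1)) if match else None
    return {
        "raw_analysis": text,
        "alignment_score": score,
        "flagged": score is not None and score < 5,  # assumed threshold
    }
```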
Interpretability
Understanding Agent Decisions
Challenge: Making agent reasoning transparent and understandable.
Key Issues:
- Black box decision-making
- Complex reasoning chains
- Emergent behaviors
- Debugging difficulties
class InterpretabilityTool:
"""Tools for understanding agent decisions"""
def __init__(self):
self.client = openai.OpenAI()
def explain_decision(self,
decision: str,
context: str,
reasoning_trace: List[str]) -> str:
"""Explain why agent made a decision"""
trace_text = "\n".join([f"{i+1}. {step}" for i, step in enumerate(reasoning_trace)])
prompt = f"""Explain this decision in simple terms:
Context: {context}
Reasoning trace:
{trace_text}
Decision: {decision}
Provide:
1. Why this decision was made
2. Key factors considered
3. Alternative options considered
4. Confidence level
Explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return response.choices[0].message.content
def identify_decision_factors(self, decision: str, context: str) -> List[Dict]:
"""Identify factors that influenced decision"""
prompt = f"""Identify factors that influenced this decision:
Context: {context}
Decision: {decision}
List factors with:
- Factor name
- Influence (positive/negative)
- Weight (low/medium/high)
Factors:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_factors(response.choices[0].message.content)
def generate_counterfactuals(self,
decision: str,
context: str) -> List[str]:
"""Generate counterfactual explanations"""
prompt = f"""Generate counterfactual explanations:
Context: {context}
Decision: {decision}
Provide 3 scenarios where the decision would be different:
"If X were different, then the decision would be Y because Z"
Counterfactuals:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content.split('\n')
# Usage
interp = InterpretabilityTool()
explanation = interp.explain_decision(
"Recommend Product A",
"User looking for laptop under $1000",
["Filtered by price", "Compared specs", "Checked reviews"]
)
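identify_decision_factors calls self.parse_factors, which is left undefined. One possible sketch, assuming the prompt is adjusted to request pipe-delimited lines like `Factor | influence | weight`; lines that do not match are skipped.

```python
from typing import Dict, List

def parse_factors(text: str) -> List[Dict]:
    """Parse lines like '- Price | negative | high' into factor dicts.

    Assumes the pipe-delimited format requested in the prompt;
    anything else is silently ignored.
    """
    factors = []
    for line in text.splitlines():
        parts = [p.strip(" -*\t") for p in line.split("|")]
        if (len(parts) == 3
                and parts[1].lower() in ("positive", "negative")
                and parts[2].lower() in ("low", "medium", "high")):
            factors.append({
                "name": parts[0],
                "influence": parts[1].lower(),
                "weight": parts[2].lower(),
            })
    return factors
```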
Generalization
Out-of-Distribution Performance
Challenge: Agents performing well in novel situations.
Key Issues:
- Distribution shift
- Novel scenarios
- Transfer learning
- Robustness
class GeneralizationTester:
"""Test agent generalization"""
def __init__(self):
self.client = openai.OpenAI()
def test_generalization(self,
agent,
training_domain: str,
test_domains: List[str]) -> Dict:
"""Test how well agent generalizes"""
results = {}
for domain in test_domains:
# Generate test cases for domain
test_cases = self.generate_test_cases(domain)
# Test agent
performance = self.evaluate_on_domain(agent, test_cases)
results[domain] = performance
return {
"training_domain": training_domain,
"test_results": results,
"generalization_score": self.calculate_generalization_score(results)
}
def generate_test_cases(self, domain: str) -> List[Dict]:
"""Generate test cases for domain"""
prompt = f"""Generate 5 test cases for this domain:
Domain: {domain}
For each test case provide:
- Input
- Expected behavior
- Difficulty (easy/medium/hard)
Test cases:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.6
)
return self.parse_test_cases(response.choices[0].message.content)
def evaluate_on_domain(self, agent, test_cases: List[Dict]) -> float:
"""Evaluate agent on test cases"""
passed = 0
for test in test_cases:
try:
result = agent.process(test["input"])
if self.check_correctness(result, test["expected"]):
passed += 1
except Exception:
pass  # treat errors during processing as failures
return passed / len(test_cases) if test_cases else 0
def calculate_generalization_score(self, results: Dict) -> float:
"""Calculate overall generalization score"""
scores = list(results.values())
return sum(scores) / len(scores) if scores else 0
# Usage
tester = GeneralizationTester()
# results = tester.test_generalization(
# agent,
# training_domain="customer support",
# test_domains=["technical support", "sales", "complaints"]
# )
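parse_test_cases and check_correctness are referenced above but never implemented. A hedged sketch of both, assuming the model follows the labeled `Input / Expected behavior / Difficulty` format the prompt requests; the loose containment check in check_correctness is a stand-in for a real grader.

```python
from typing import Dict, List

def parse_test_cases(text: str) -> List[Dict]:
    """Group labeled lines from the model's reply into test-case dicts."""
    cases: List[Dict] = []
    current: Dict = {}
    for line in text.splitlines():
        line = line.strip("-* \t")
        for label, field in (("input", "input"),
                             ("expected", "expected"),
                             ("difficulty", "difficulty")):
            if line.lower().startswith(label) and ":" in line:
                current[field] = line.split(":", 1)[1].strip()
        if len(current) == 3:  # all three fields seen: close out this case
            cases.append(current)
            current = {}
    return cases

def check_correctness(result, expected: str) -> bool:
    """Loose containment check; a production harness might use an
    LLM judge or task-specific metrics instead."""
    return expected.strip().lower() in str(result).strip().lower()
```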
Sample Efficiency
Learning from Limited Data
Challenge: Agents learning effectively from few examples.
Key Issues:
- Data scarcity
- Cold start problem
- Few-shot learning
- Active learning
class SampleEfficientLearner:
"""Learn efficiently from limited samples"""
def __init__(self):
self.client = openai.OpenAI()
self.examples = []
def active_learning(self,
unlabeled_data: List[str],
budget: int) -> List[str]:
"""Select most informative examples to label"""
# Score each example by informativeness
scored = []
for data in unlabeled_data:
score = self.calculate_informativeness(data)
scored.append((data, score))
# Select top examples
scored.sort(key=lambda x: x[1], reverse=True)
selected = [data for data, score in scored[:budget]]
return selected
def calculate_informativeness(self, example: str) -> float:
"""Calculate how informative an example would be"""
prompt = f"""Rate how informative this example would be for learning (0-10):
Example: {example}
Current examples: {len(self.examples)}
Consider:
- Novelty
- Representativeness
- Difficulty
- Coverage of edge cases
Score:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
try:
return float(response.choices[0].message.content.strip())
except ValueError:
return 5.0  # fall back to a neutral score when parsing fails
def meta_learn(self, tasks: List[Dict]) -> Dict:
"""Learn how to learn from multiple tasks"""
# Extract learning patterns across tasks
patterns = []
for task in tasks:
pattern = self.extract_learning_pattern(task)
patterns.append(pattern)
# Synthesize meta-learning strategy
strategy = self.synthesize_strategy(patterns)
return {
"patterns": patterns,
"strategy": strategy
}
def extract_learning_pattern(self, task: Dict) -> Dict:
"""Extract how learning occurred for task"""
return {"task": task, "pattern": "extracted"}
def synthesize_strategy(self, patterns: List[Dict]) -> str:
"""Synthesize meta-learning strategy"""
return "Meta-learning strategy"
# Usage
learner = SampleEfficientLearner()
selected = learner.active_learning(
unlabeled_data=["example1", "example2", "example3"],
budget=2
)
Research Directions
Key Open Questions
- Alignment: How to ensure agents pursue intended goals?
- Interpretability: How to understand agent reasoning?
- Generalization: How to handle novel situations?
- Sample Efficiency: How to learn from less data?
- Robustness: How to handle adversarial inputs?
- Scalability: How to scale to complex tasks?
- Multi-agent Coordination: How can agents collaborate effectively?
- Long-term Planning: How to plan over extended horizons?
- Common Sense: How to encode common sense?
- Ethical Reasoning: How to make ethical decisions?
Future Research Areas
Near-term (1-2 years):
- Better tool use and creation
- Improved multi-agent systems
- Enhanced memory systems
- More efficient learning
Medium-term (3-5 years):
- Self-improving agents
- Abstract reasoning
- Long-horizon planning
- Robust generalization
Long-term (5+ years):
- General intelligence
- Human-level reasoning
- Autonomous research
- Societal integration
Contributing to Research
How to Get Involved
- Read papers: Stay current with research
- Replicate results: Verify findings
- Open source: Share implementations
- Collaborate: Work with researchers
- Publish: Share your findings
- Attend conferences: NeurIPS, ICML, ICLR
- Join communities: Discord, forums
- Experiment: Try new ideas
- Document: Write about learnings
- Teach: Share knowledge
Conclusion
Chapter 9 (Cutting-Edge Research) is complete! You now understand:
- Frontier capabilities (self-improvement, tool creation, abstract reasoning)
- Emerging paradigms (constitutional AI, debate systems, neuro-symbolic)
- Open problems (alignment, interpretability, generalization, sample efficiency)
These are active research areas where significant breakthroughs are still needed. The field is rapidly evolving, and there are many opportunities to contribute.
Next: Module 10 - Capstone Project, where you’ll apply everything you’ve learned!
Design Your Agent
Module 10: Learning Objectives
By the end of this module, you will:
- ✓ Design a complete autonomous software engineering agent
- ✓ Implement multi-agent orchestration with specialized roles
- ✓ Integrate all concepts from previous chapters
- ✓ Deploy a production-ready agent system
- ✓ Evaluate and iterate based on real-world testing
Capstone Project: Autonomous Software Engineering Agent
Welcome to the capstone project! You’ll build a sophisticated agent that can analyze codebases, identify issues, propose fixes, write tests, and refactor code autonomously.
Project Overview
What We’re Building
An Autonomous Software Engineering Agent that can:
- Analyze code quality and identify bugs
- Generate fixes with explanations
- Write comprehensive tests
- Refactor code for better maintainability
- Review pull requests
- Learn from feedback
Why This Project?
This capstone integrates nearly everything from the course:
- ReAct pattern (Module 2): Reasoning and acting on code
- Planning (Module 3): Breaking down complex refactoring tasks
- Memory (Module 3): Remembering codebase patterns and past fixes
- Code execution (Module 4): Running and validating code
- Production patterns (Module 5): Safety, testing, monitoring
- Specialized agents (Module 6): Coding agent capabilities
- Learning (Module 7): Adapting from feedback
- Enterprise scale (Module 8): Handling large codebases
- Frontier capabilities (Module 9): Self-improvement, tool creation
Requirements Gathering
Functional Requirements
Core Capabilities:
- Code Analysis: Parse and understand code structure
- Bug Detection: Identify potential issues
- Fix Generation: Propose and implement fixes
- Test Generation: Create comprehensive tests
- Refactoring: Improve code quality
- PR Review: Analyze changes and provide feedback
User Interactions:
- Natural language commands (“Fix the bug in auth.py”)
- File/directory targeting
- Interactive clarifications
- Progress reporting
- Explanation of changes
Non-Functional Requirements
Performance:
- Analyze a file in under 5 seconds
- Generate a fix in under 30 seconds
- Handle codebases up to 100K lines
Reliability:
- Never break working code
- Validate all changes
- Rollback capability
- 95%+ test coverage for generated code
Safety:
- Sandbox code execution
- No destructive operations without confirmation
- Backup before modifications
- Security vulnerability checks
Usability:
- Clear explanations
- Confidence scores
- Alternative solutions
- Learning from user feedback
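One way to make the non-functional requirements enforceable rather than aspirational is to encode them as a configuration object the agents can check against at runtime. The class and field names below are illustrative, not part of a fixed API; the limits mirror the lists above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class AgentConstraints:
    """Operational limits derived from the non-functional requirements."""
    max_analysis_seconds: float = 5.0
    max_fix_seconds: float = 30.0
    max_codebase_lines: int = 100_000
    min_test_coverage: float = 0.95
    # Operations that always need explicit user confirmation
    require_confirmation_for: List[str] = field(
        default_factory=lambda: ["delete", "drop", "deploy"]
    )

    def within_budget(self, elapsed_seconds: float, task: str) -> bool:
        """Check elapsed time against the per-task time budget."""
        budget = self.max_fix_seconds if task == "fix" else self.max_analysis_seconds
        return elapsed_seconds <= budget
```

Agents can then fail fast (or escalate to the user) when a limit is exceeded, instead of each agent hard-coding its own thresholds.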
Architecture Design
High-Level Architecture
graph TB
UI[User Interface Layer]
UI --> ORC[Orchestration Layer]
subgraph Orchestration
ORC --> PLAN[Planner]
ORC --> ROUTE[Router]
ORC --> MON[Monitor]
end
subgraph Agents
ROUTE --> ANA[Analyzer Agent]
ROUTE --> FIX[Fixer Agent]
ROUTE --> TEST[Tester Agent]
ROUTE --> REF[Refactorer Agent]
ROUTE --> REV[Reviewer Agent]
end
subgraph Tools
ANA --> AST[AST Parser]
FIX --> EXEC[Code Executor]
TEST --> RUNNER[Test Runner]
AST --> LINT[Linter]
EXEC --> GIT[Git Ops]
end
subgraph Storage
MON --> VDB[(Vector DB)]
MON --> CACHE[(Code Cache)]
MON --> FB[(Feedback DB)]
end
style UI fill:#dbeafe
style ORC fill:#fef3c7
style ANA fill:#d1fae5
style FIX fill:#d1fae5
style TEST fill:#d1fae5
Component Design
1. Orchestration Layer
from typing import Dict, List, Optional
from dataclasses import dataclass
from enum import Enum
class TaskType(Enum):
ANALYZE = "analyze"
FIX_BUG = "fix_bug"
WRITE_TEST = "write_test"
REFACTOR = "refactor"
REVIEW_PR = "review_pr"
@dataclass
class Task:
type: TaskType
target: str # File or directory
description: str
priority: int
dependencies: List[str]
class Orchestrator:
"""Coordinates multiple specialized agents"""
def __init__(self):
self.planner = TaskPlanner()
self.router = AgentRouter()
self.monitor = ProgressMonitor()
def execute_request(self, request: str, context: Dict) -> Dict:
"""Main entry point"""
# Plan tasks
tasks = self.planner.create_plan(request, context)
# Execute tasks
results = []
for task in tasks:
# Route to appropriate agent
agent = self.router.get_agent(task.type)
# Execute
result = agent.execute(task)
results.append(result)
# Monitor progress
self.monitor.update(task, result)
# Synthesize results
return self.synthesize_results(results)
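The Orchestrator above depends on TaskPlanner, AgentRouter, and ProgressMonitor, which are not defined yet. A minimal sketch of each; the trivial planner and the duplicated TaskType values are placeholders (a real planner would decompose requests with an LLM, as later sections show).

```python
from enum import Enum
from typing import Dict, List

class TaskType(Enum):  # mirrors the enum defined earlier
    ANALYZE = "analyze"
    FIX_BUG = "fix_bug"

class TaskPlanner:
    """Trivial planner: treat the whole request as one ANALYZE task."""
    def create_plan(self, request: str, context: Dict) -> List[Dict]:
        return [{"type": TaskType.ANALYZE, "description": request}]

class AgentRouter:
    """Maps task types to agent instances registered at startup."""
    def __init__(self):
        self._agents: Dict[TaskType, object] = {}

    def register(self, task_type: TaskType, agent) -> None:
        self._agents[task_type] = agent

    def get_agent(self, task_type: TaskType):
        if task_type not in self._agents:
            raise KeyError(f"No agent registered for {task_type}")
        return self._agents[task_type]

class ProgressMonitor:
    """Records task outcomes so the orchestrator can report progress."""
    def __init__(self):
        self.history: List[Dict] = []

    def update(self, task, result) -> None:
        self.history.append({"task": task, "result": result})
```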
2. Agent Layer
class AnalyzerAgent:
"""Analyzes code quality and identifies issues"""
def execute(self, task: Task) -> Dict:
# Parse code
# Run static analysis
# Identify issues
# Prioritize findings
pass
class FixerAgent:
"""Generates and applies fixes"""
def execute(self, task: Task) -> Dict:
# Understand issue
# Generate fix
# Validate fix
# Apply changes
pass
class TesterAgent:
"""Writes tests for code"""
def execute(self, task: Task) -> Dict:
# Analyze code
# Identify test cases
# Generate tests
# Validate coverage
pass
class RefactorerAgent:
"""Refactors code for quality"""
def execute(self, task: Task) -> Dict:
# Identify code smells
# Plan refactoring
# Apply transformations
# Verify behavior preserved
pass
class ReviewerAgent:
"""Reviews code changes"""
def execute(self, task: Task) -> Dict:
# Analyze diff
# Check for issues
# Suggest improvements
# Approve or request changes
pass
3. Tool Layer
from typing import Any, Dict, List
class CodeTools:
"""Low-level code manipulation tools"""
def parse_ast(self, code: str, language: str) -> Dict:
"""Parse code into AST"""
pass
def execute_code(self, code: str, test_input: Any) -> Any:
"""Execute code safely"""
pass
def run_linter(self, file_path: str) -> List[Dict]:
"""Run linter on code"""
pass
def format_code(self, code: str, language: str) -> str:
"""Format code"""
pass
def run_tests(self, test_file: str) -> Dict:
"""Run test suite"""
pass
def git_diff(self, file_path: str) -> str:
"""Get git diff"""
pass
Tool Selection
Required Tools
| Tool | Purpose | Integration |
|---|---|---|
| AST Parser | Code structure analysis | ast (Python), tree-sitter (multi-lang) |
| Static Analyzer | Bug detection | pylint, mypy, ruff |
| Code Executor | Validation | Docker sandbox |
| Test Framework | Test generation/running | pytest, unittest |
| Git Integration | Version control | GitPython |
| Vector DB | Code search | chromadb, pinecone |
| LLM API | Reasoning | OpenAI, Anthropic |
Tool Integration Strategy
class ToolRegistry:
"""Registry of available tools"""
def __init__(self):
self.tools = {
"parse_code": {
"function": self.parse_code,
"description": "Parse code into AST",
"parameters": {"code": "str", "language": "str"}
},
"run_linter": {
"function": self.run_linter,
"description": "Run static analysis",
"parameters": {"file_path": "str"}
},
"execute_code": {
"function": self.execute_code,
"description": "Execute code safely",
"parameters": {"code": "str", "timeout": "int"}
},
"run_tests": {
"function": self.run_tests,
"description": "Run test suite",
"parameters": {"test_path": "str"}
},
"search_similar_code": {
"function": self.search_similar_code,
"description": "Find similar code patterns",
"parameters": {"query": "str", "limit": "int"}
}
}
def get_tool_schemas(self) -> List[Dict]:
"""Get OpenAI function schemas"""
return [
{
"name": name,
"description": tool["description"],
"parameters": {
"type": "object",
"properties": {
param: {"type": ptype}
for param, ptype in tool["parameters"].items()
},
"required": list(tool["parameters"].keys())
}
}
for name, tool in self.tools.items()
]
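get_tool_schemas returns bare function schemas. The current OpenAI chat completions API expects these wrapped in a `tools` list of `{"type": "function", "function": ...}` objects; a small helper for that conversion (the helper name is ours, not part of any SDK):

```python
from typing import Dict, List

def to_tools_param(schemas: List[Dict]) -> List[Dict]:
    """Wrap function schemas for the chat.completions `tools` parameter."""
    return [{"type": "function", "function": schema} for schema in schemas]
```

Usage would then look like `client.chat.completions.create(model=..., messages=..., tools=to_tools_param(registry.get_tool_schemas()))`.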
Safety Considerations
Critical Safety Measures
1. Code Execution Sandbox
import docker
class SafeExecutor:
"""Execute code in isolated container"""
def __init__(self):
self.client = docker.from_env()
def execute(self, code: str, timeout: int = 30) -> Dict:
"""Execute with resource limits"""
# Pass the code as an argument list to avoid shell-quoting issues
container = self.client.containers.run(
"python:3.11-slim",
command=["python", "-c", code],
detach=True,
mem_limit="256m",
cpu_quota=50000,
network_disabled=True
)
try:
result = container.wait(timeout=timeout)
logs = container.logs().decode()
return {"success": result.get("StatusCode", 1) == 0, "output": logs}
except Exception:
container.kill()
return {"success": False, "error": "Timeout or error"}
finally:
# Remove manually; auto-remove (remove=True) would race the logs() call
container.remove(force=True)
2. Change Validation
class ChangeValidator:
"""Validate code changes before applying"""
def validate(self, original: str, modified: str) -> Dict:
"""Multi-level validation"""
checks = {
"syntax": self.check_syntax(modified),
"tests_pass": self.run_tests(modified),
"no_security_issues": self.check_security(modified),
"behavior_preserved": self.verify_behavior(original, modified)
}
return {
"valid": all(checks.values()),
"checks": checks
}
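ChangeValidator calls check_syntax (among other helpers) without defining it. One possible implementation for Python code, using only the standard library; a change that no longer parses is rejected outright.

```python
import ast

def check_syntax(code: str) -> bool:
    """Return True when the code parses as valid Python.

    A sketch of the validator's syntax check; other helpers
    (tests, security, behavior) need richer tooling.
    """
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False
```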
3. Human-in-the-Loop
class ApprovalGate:
"""Require human approval for critical changes"""
def requires_approval(self, change: Dict) -> bool:
"""Determine if change needs approval"""
critical_patterns = [
"delete", "drop", "remove",
"auth", "security", "password",
"production", "deploy"
]
return any(pattern in change["description"].lower()
for pattern in critical_patterns)
Success Metrics
Key Performance Indicators
Accuracy Metrics:
- Bug detection rate (precision/recall)
- Fix success rate (% that work)
- Test coverage achieved
- False positive rate
Efficiency Metrics:
- Time to analyze file
- Time to generate fix
- Lines of code processed per minute
- Token usage per task
Quality Metrics:
- Code quality improvement (linter score)
- Test pass rate
- User acceptance rate
- Regression rate (fixes that break things)
Measurement Strategy
class MetricsCollector:
"""Collect and track metrics"""
def __init__(self):
self.metrics = {
"bugs_detected": 0,
"fixes_applied": 0,
"fixes_successful": 0,
"tests_generated": 0,
"avg_analysis_time": [],
"user_approvals": 0,
"user_rejections": 0
}
def record_analysis(self, duration: float, bugs_found: int):
"""Record analysis metrics"""
self.metrics["avg_analysis_time"].append(duration)
self.metrics["bugs_detected"] += bugs_found
def record_fix(self, success: bool):
"""Record fix attempt"""
self.metrics["fixes_applied"] += 1
if success:
self.metrics["fixes_successful"] += 1
def get_success_rate(self) -> float:
"""Calculate fix success rate"""
if self.metrics["fixes_applied"] == 0:
return 0.0
return self.metrics["fixes_successful"] / self.metrics["fixes_applied"]
Data Flow Design
Request Processing Flow
User Request
↓
Parse Intent
↓
Create Plan (Task Decomposition)
↓
For each task:
↓
Route to Specialized Agent
↓
Execute with Tools
↓
Validate Results
↓
Store in Memory
↓
Synthesize Results
↓
Present to User
↓
Collect Feedback
↓
Update Models
State Management
from dataclasses import dataclass
from typing import Dict, List, Optional
import json
@dataclass
class AgentState:
"""Current state of the agent"""
current_task: Optional[Task]
task_history: List[Dict]
codebase_context: Dict
user_preferences: Dict
performance_metrics: Dict
class StateManager:
"""Manage agent state"""
def __init__(self, state_file: str = "agent_state.json"):
self.state_file = state_file
self.state = self.load_state()
def load_state(self) -> AgentState:
"""Load state from disk"""
try:
with open(self.state_file, 'r') as f:
data = json.load(f)
return AgentState(**data)
except (FileNotFoundError, json.JSONDecodeError, TypeError):
return AgentState(
current_task=None,
task_history=[],
codebase_context={},
user_preferences={},
performance_metrics={}
)
def save_state(self):
"""Persist state to disk"""
with open(self.state_file, 'w') as f:
json.dump(self.state.__dict__, f, indent=2, default=str)
def update_context(self, file_path: str, analysis: Dict):
"""Update codebase context"""
self.state.codebase_context[file_path] = analysis
self.save_state()
Memory Architecture
Multi-Level Memory System
1. Working Memory: Current task context
class WorkingMemory:
"""Short-term task context"""
def __init__(self, max_size: int = 10):
self.max_size = max_size
self.items = []
def add(self, item: Dict):
"""Add to working memory"""
self.items.append(item)
if len(self.items) > self.max_size:
self.items.pop(0)
def get_context(self) -> str:
"""Get context for LLM"""
return "\n".join([
f"- {item['type']}: {item['content']}"
for item in self.items
])
2. Episodic Memory: Past tasks and solutions
import time
class EpisodicMemory:
"""Remember past tasks"""
def __init__(self):
self.episodes = []
def store_episode(self, task: Task, solution: Dict, outcome: Dict):
"""Store completed task"""
self.episodes.append({
"task": task,
"solution": solution,
"outcome": outcome,
"timestamp": time.time()
})
def recall_similar(self, current_task: Task, limit: int = 5) -> List[Dict]:
"""Recall similar past tasks"""
# Use embedding similarity
return self.episodes[-limit:]
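recall_similar above simply returns the most recent episodes. A sketch of similarity-based recall, written as a standalone function; difflib is a cheap stand-in for the embedding similarity the comment mentions.

```python
import difflib
from typing import Dict, List

def recall_similar(episodes: List[Dict], description: str, limit: int = 5) -> List[Dict]:
    """Rank stored episodes by rough text similarity to the current task.

    SequenceMatcher ratios are crude; a real system would embed the
    task descriptions and rank by cosine similarity instead.
    """
    scored = [
        (difflib.SequenceMatcher(None, description, str(ep.get("task", ""))).ratio(), ep)
        for ep in episodes
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ep for _, ep in scored[:limit]]
```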
3. Semantic Memory: Codebase knowledge
import chromadb
class SemanticMemory:
"""Long-term codebase knowledge"""
def __init__(self):
self.client = chromadb.Client()
# get_or_create avoids an error when the collection already exists
self.collection = self.client.get_or_create_collection("codebase")
def index_codebase(self, files: List[str]):
"""Index codebase for semantic search"""
for file_path in files:
with open(file_path, 'r') as f:
code = f.read()
self.collection.add(
documents=[code],
metadatas=[{"file_path": file_path}],
ids=[file_path]
)
def search(self, query: str, n_results: int = 5) -> List[Dict]:
"""Search for relevant code"""
results = self.collection.query(
query_texts=[query],
n_results=n_results
)
return results
Error Handling Strategy
Graceful Degradation
class RobustAgent:
"""Agent with comprehensive error handling"""
def execute_with_fallbacks(self, task: Task) -> Dict:
"""Execute with multiple fallback strategies"""
strategies = [
self.primary_strategy,
self.simplified_strategy,
self.conservative_strategy
]
for strategy in strategies:
try:
result = strategy(task)
if self.validate_result(result):
return result
except Exception as e:
self.log_error(strategy.__name__, e)
continue
return {
"success": False,
"error": "All strategies failed",
"recommendation": "Manual intervention required"
}
Design Decisions
Key Choices
1. Multi-Agent vs Single Agent
- Choice: Multi-agent with specialized roles
- Rationale: Better separation of concerns, easier to test, more maintainable
2. Synchronous vs Asynchronous
- Choice: Asynchronous for I/O operations
- Rationale: Better performance, can analyze multiple files in parallel
3. Local vs Cloud Execution
- Choice: Hybrid (local analysis, cloud LLM)
- Rationale: Security for code, power for reasoning
4. Automatic vs Interactive
- Choice: Interactive with automatic mode option
- Rationale: Safety for critical changes, speed for routine tasks
5. Learning Strategy
- Choice: Few-shot + feedback learning
- Rationale: Fast adaptation without full retraining
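Decision 2 (asynchronous I/O) can be illustrated with a short sketch: analyzing several files concurrently with asyncio.gather. The analyze_file body is a placeholder for a real LLM or linter call, and the file names are hypothetical.

```python
import asyncio
from typing import Dict, List

async def analyze_file(path: str) -> Dict:
    """Placeholder analysis; a real version would await an LLM call
    or run static analysis in a worker thread."""
    await asyncio.sleep(0)  # yield control, standing in for real I/O
    return {"file": path, "issues": []}

async def analyze_many(paths: List[str]) -> List[Dict]:
    # gather() runs all analyses concurrently and preserves input order
    return await asyncio.gather(*(analyze_file(p) for p in paths))

results = asyncio.run(analyze_many(["auth.py", "db.py", "api.py"]))
```

With real I/O-bound calls, total latency approaches that of the slowest file rather than the sum of all files.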
✅ Key Takeaways
- Design requires balancing functional and non-functional requirements
- Multi-agent architecture provides separation of concerns
- Safety mechanisms are critical for code-modifying agents
- Memory systems enable learning from past experiences
- Tool selection impacts capabilities and complexity
- Architecture decisions should align with use case constraints
Next Steps
Now that we have the design, let’s implement the Autonomous Software Engineering Agent!
In the next section, you’ll build:
- Complete working implementation
- All specialized agents
- Tool integrations
- Safety mechanisms
- Real-world examples
Implementation
Building the Autonomous Software Engineering Agent
Let’s build the complete system step by step.
Project Setup
# Create project structure
mkdir autonomous-se-agent
cd autonomous-se-agent
# Create directories
mkdir -p src/{agents,tools,memory,orchestration}
mkdir -p tests
mkdir -p data/{cache,feedback}
# Install dependencies
pip install openai chromadb gitpython docker pytest pylint black ast-grep-py
Core Implementation
1. Main Orchestrator
# src/orchestration/orchestrator.py
from typing import Dict, List
from dataclasses import dataclass
from enum import Enum
import openai
class TaskType(Enum):
ANALYZE = "analyze"
FIX = "fix"
TEST = "test"
REFACTOR = "refactor"
REVIEW = "review"
@dataclass
class Task:
type: TaskType
target: str
description: str
context: Dict
class SoftwareEngineeringAgent:
"""Main orchestrator for autonomous SE agent"""
def __init__(self):
self.client = openai.OpenAI()
self.analyzer = AnalyzerAgent()
self.fixer = FixerAgent()
self.tester = TesterAgent()
self.memory = AgentMemory()
def process_request(self, request: str, target_path: str) -> Dict:
"""Process user request"""
# Parse intent
intent = self.parse_intent(request)
# Create plan
plan = self.create_plan(intent, target_path)
# Execute plan
results = self.execute_plan(plan)
# Store in memory
self.memory.store_episode(request, plan, results)
return results
def parse_intent(self, request: str) -> Dict:
"""Parse user intent"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "system",
"content": "Parse user intent. Return JSON with: task_type, target, requirements"
}, {
"role": "user",
"content": request
}],
temperature=0.2
)
import json
return json.loads(response.choices[0].message.content)
def create_plan(self, intent: Dict, target_path: str) -> List[Task]:
"""Create execution plan"""
tasks = []
task_type = TaskType(intent["task_type"])
if task_type == TaskType.FIX:
# Fix requires: analyze -> fix -> test
tasks.append(Task(TaskType.ANALYZE, target_path, "Analyze code", {}))
tasks.append(Task(TaskType.FIX, target_path, intent["requirements"], {}))
tasks.append(Task(TaskType.TEST, target_path, "Validate fix", {}))
elif task_type == TaskType.REFACTOR:
# Refactor requires: analyze -> refactor -> test
tasks.append(Task(TaskType.ANALYZE, target_path, "Analyze code", {}))
tasks.append(Task(TaskType.REFACTOR, target_path, intent["requirements"], {}))
tasks.append(Task(TaskType.TEST, target_path, "Validate refactor", {}))
else:
tasks.append(Task(task_type, target_path, intent["requirements"], {}))
return tasks
def execute_plan(self, plan: List[Task]) -> Dict:
"""Execute task plan"""
results = []
context = {}
for task in plan:
task.context = context
if task.type == TaskType.ANALYZE:
result = self.analyzer.execute(task)
elif task.type == TaskType.FIX:
result = self.fixer.execute(task)
elif task.type == TaskType.TEST:
result = self.tester.execute(task)
else:
result = {"error": "Unknown task type"}
results.append(result)
context.update(result)
return {"tasks": len(plan), "results": results}
2. Analyzer Agent
# src/agents/analyzer.py
import ast
import openai
from typing import Dict, List
class AnalyzerAgent:
"""Analyzes code for issues"""
def __init__(self):
self.client = openai.OpenAI()
def execute(self, task: Task) -> Dict:
"""Analyze code file"""
# Read code
with open(task.target, 'r') as f:
code = f.read()
# Parse AST
ast_analysis = self.analyze_ast(code)
# Run static analysis
static_issues = self.run_static_analysis(task.target)
# LLM-based analysis
llm_analysis = self.llm_analyze(code)
return {
"file": task.target,
"ast_analysis": ast_analysis,
"static_issues": static_issues,
"llm_analysis": llm_analysis,
"issues": self.consolidate_issues(static_issues, llm_analysis)
}
def analyze_ast(self, code: str) -> Dict:
"""Analyze code structure"""
try:
tree = ast.parse(code)
functions = [node.name for node in ast.walk(tree)
if isinstance(node, ast.FunctionDef)]
classes = [node.name for node in ast.walk(tree)
if isinstance(node, ast.ClassDef)]
return {
"functions": functions,
"classes": classes,
"lines": len(code.split('\n'))
}
except SyntaxError as e:
return {"error": str(e)}
def run_static_analysis(self, file_path: str) -> List[Dict]:
"""Run pylint"""
import subprocess
result = subprocess.run(
['pylint', file_path, '--output-format=json'],
capture_output=True,
text=True
)
import json
try:
return json.loads(result.stdout)
except json.JSONDecodeError:
return []
def llm_analyze(self, code: str) -> Dict:
"""LLM-based code analysis"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "system",
"content": "You are an expert code reviewer. Analyze code for bugs, security issues, and improvements."
}, {
"role": "user",
"content": f"Analyze this code:\n\n{code}"
}],
temperature=0.3
)
return {"analysis": response.choices[0].message.content}
def consolidate_issues(self, static: List[Dict], llm: Dict) -> List[Dict]:
"""Consolidate all issues"""
issues = []
# Add static analysis issues
for issue in static:
issues.append({
"type": issue.get("type", "unknown"),
"message": issue.get("message", ""),
"line": issue.get("line", 0),
"severity": issue.get("severity", "info"),
"source": "static"
})
# Include the LLM review as a single informational finding
if llm.get("analysis"):
issues.append({
"type": "review",
"message": llm["analysis"],
"line": 0,
"severity": "info",
"source": "llm"
})
return issues
3. Fixer Agent
# src/agents/fixer.py
from typing import Dict, List
import difflib
import openai
class FixerAgent:
"""Generates and applies fixes"""
def __init__(self):
self.client = openai.OpenAI()
self.validator = FixValidator()
def execute(self, task: Task) -> Dict:
"""Generate and apply fix"""
# Read current code
with open(task.target, 'r') as f:
original_code = f.read()
# Get issues from context
issues = task.context.get("issues", [])
# Generate fix
fixed_code = self.generate_fix(original_code, issues, task.description)
# Validate fix
validation = self.validator.validate(original_code, fixed_code)
if not validation["valid"]:
return {
"success": False,
"error": "Validation failed",
"details": validation
}
# Show diff
diff = self.generate_diff(original_code, fixed_code)
return {
"success": True,
"original_code": original_code,
"fixed_code": fixed_code,
"diff": diff,
"validation": validation
}
def generate_fix(self, code: str, issues: List[Dict], description: str) -> str:
"""Generate fixed code"""
issues_text = "\n".join([
f"- Line {i['line']}: {i['message']}"
for i in issues[:5] # Top 5 issues
])
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "system",
"content": "You are an expert programmer. Fix code issues while preserving functionality."
}, {
"role": "user",
"content": f"Fix these issues:\n{issues_text}\n\nRequirement: {description}\n\nOriginal code:\n{code}\n\nFixed code:"
}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def generate_diff(self, original: str, fixed: str) -> str:
"""Generate unified diff"""
diff = difflib.unified_diff(
original.splitlines(keepends=True),
fixed.splitlines(keepends=True),
fromfile='original',
tofile='fixed'
)
return ''.join(diff)
def extract_code(self, text: str) -> str:
"""Extract code from markdown"""
import re
pattern = r'```python\n(.*?)```'
matches = re.findall(pattern, text, re.DOTALL)
return matches[0] if matches else text
class FixValidator:
"""Validate fixes"""
def validate(self, original: str, fixed: str) -> Dict:
"""Multi-level validation"""
return {
"valid": self.check_syntax(fixed) and self.check_safety(fixed),
"syntax_valid": self.check_syntax(fixed),
"safety_passed": self.check_safety(fixed)
}
def check_syntax(self, code: str) -> bool:
"""Check syntax"""
try:
ast.parse(code)
return True
except SyntaxError:
return False
def check_safety(self, code: str) -> bool:
"""Check for unsafe patterns"""
unsafe = ["eval(", "exec(", "__import__", "os.system"]
return not any(pattern in code for pattern in unsafe)
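FixerAgent's `generate_diff` is a thin wrapper over the standard library; a standalone illustration with a hypothetical before/after pair:

```python
import difflib

original = "def greet():\n    print('hi')\n"
fixed = "def greet(name):\n    print(f'hi {name}')\n"

# unified_diff yields header lines, an @@ hunk marker, then -/+ lines
diff = ''.join(difflib.unified_diff(
    original.splitlines(keepends=True),
    fixed.splitlines(keepends=True),
    fromfile='original', tofile='fixed',
))
print(diff)
```

Passing `keepends=True` matters: `unified_diff` expects each line to carry its own newline, which is why `generate_diff` splits the code the same way.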
4. Tester Agent
# src/agents/tester.py
from typing import Dict, List
import openai
class TesterAgent:
"""Generates and runs tests"""
def __init__(self):
self.client = openai.OpenAI()
def execute(self, task: Task) -> Dict:
"""Generate tests for code"""
# Read code
with open(task.target, 'r') as f:
code = f.read()
# Generate tests
tests = self.generate_tests(code)
# Run tests
results = self.run_tests(tests)
return {
"tests_generated": len(tests),
"tests_passed": sum(1 for r in results if r["passed"]),
"coverage": self.calculate_coverage(code, tests),
"test_code": tests
}
def generate_tests(self, code: str) -> str:
"""Generate test code"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "system",
"content": "Generate comprehensive pytest tests. Include edge cases, error cases, and normal cases."
}, {
"role": "user",
"content": f"Generate tests for:\n\n{code}"
}],
temperature=0.3
)
return response.choices[0].message.content
def run_tests(self, test_code: str) -> List[Dict]:
"""Run generated tests"""
# Write to a temp file, run pytest, then clean up
import os
import subprocess
import tempfile
with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
f.write(test_code)
test_file = f.name
try:
result = subprocess.run(
['pytest', test_file, '-v'],
capture_output=True
)
# Return code 0 means every test passed
return [{"passed": result.returncode == 0}]
finally:
os.unlink(test_file)
def calculate_coverage(self, code: str, tests: str) -> float:
"""Estimate test coverage"""
# Simplified coverage estimation
return 0.85
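`calculate_coverage` above is stubbed to return a constant. A self-contained heuristic sketch, assuming name-matching is a good-enough rough signal: it counts which functions in the source are mentioned anywhere in the generated test code.

```python
import ast

def estimate_coverage(code: str, tests: str) -> float:
    """Rough estimate: fraction of functions in `code` whose
    names appear somewhere in the generated test code."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0
    functions = [n.name for n in ast.walk(tree)
                 if isinstance(n, ast.FunctionDef)]
    if not functions:
        return 0.0
    covered = sum(1 for name in functions if name in tests)
    return covered / len(functions)

code = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
tests = "def test_add():\n    assert add(1, 2) == 3\n"
print(estimate_coverage(code, tests))  # 0.5
```

For real numbers you would run the tests under `coverage.py` instead; substring matching over-counts short names, so treat this only as a cheap signal.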
5. Memory System
# src/memory/agent_memory.py
import chromadb
from typing import Dict, List
import json
class AgentMemory:
"""Unified memory system"""
def __init__(self):
self.working_memory = []
self.client = chromadb.Client()
# get_or_create_collection is idempotent, so re-initializing the agent won't raise
self.episodes = self.client.get_or_create_collection("episodes")
self.codebase = self.client.get_or_create_collection("codebase")
def store_episode(self, request: str, plan: List[Task], results: Dict):
"""Store completed episode"""
episode = {
"request": request,
"plan": [{"type": t.type.value, "target": t.target} for t in plan],
"results": results,
"success": results.get("success", False)
}
self.episodes.add(
documents=[json.dumps(episode)],
metadatas=[{"request": request}],
ids=[f"episode_{len(self.episodes.get()['ids'])}"]
)
def recall_similar_episodes(self, request: str, limit: int = 3) -> List[Dict]:
"""Recall similar past episodes"""
results = self.episodes.query(
query_texts=[request],
n_results=limit
)
return [json.loads(doc) for doc in results['documents'][0]]
def index_file(self, file_path: str, code: str, analysis: Dict):
"""Index file in semantic memory"""
self.codebase.add(
documents=[code],
metadatas=[{
"file_path": file_path,
"functions": json.dumps(analysis.get("functions", [])),
"classes": json.dumps(analysis.get("classes", []))
}],
ids=[file_path]
)
def search_codebase(self, query: str, limit: int = 5) -> List[Dict]:
"""Search codebase semantically"""
results = self.codebase.query(
query_texts=[query],
n_results=limit
)
return results
6. Tool Layer
# src/tools/code_tools.py
import ast
import subprocess
from typing import Dict, List
class CodeTools:
"""Low-level code manipulation tools"""
@staticmethod
def parse_python(code: str) -> Dict:
"""Parse Python code"""
try:
tree = ast.parse(code)
return {
"valid": True,
"functions": [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)],
"classes": [n.name for n in ast.walk(tree) if isinstance(n, ast.ClassDef)],
"imports": [n.names[0].name for n in ast.walk(tree) if isinstance(n, ast.Import)]
}
except SyntaxError as e:
return {"valid": False, "error": str(e)}
@staticmethod
def run_linter(file_path: str) -> List[Dict]:
"""Run pylint"""
result = subprocess.run(
['pylint', file_path, '--output-format=json'],
capture_output=True,
text=True
)
import json
try:
return json.loads(result.stdout)
except json.JSONDecodeError:
return []
@staticmethod
def format_code(code: str) -> str:
"""Format with black"""
result = subprocess.run(
['black', '-'],
input=code,
capture_output=True,
text=True
)
return result.stdout if result.returncode == 0 else code
@staticmethod
def run_tests(test_path: str) -> Dict:
"""Run pytest"""
result = subprocess.run(
['pytest', test_path, '-v'],
capture_output=True,
text=True
)
return {
"passed": result.returncode == 0,
"output": result.stdout
}
class SafeExecutor:
"""Execute code safely in Docker"""
def __init__(self):
import docker
self.client = docker.from_env()
def execute(self, code: str, timeout: int = 30) -> Dict:
"""Execute in isolated container"""
try:
container = self.client.containers.run(
"python:3.11-slim",
command=['python', '-c', code],
detach=True,
mem_limit="256m",
network_disabled=True
)
# Don't combine remove=True with detach=True: the container can be
# auto-deleted before its logs are read. Remove it manually instead.
result = container.wait(timeout=timeout)
logs = container.logs().decode()
container.remove()
return {"success": True, "output": logs, "exit_code": result['StatusCode']}
except Exception as e:
return {"success": False, "error": str(e)}
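`CodeTools.parse_python` can be exercised in isolation; this standalone mirror of it shows the shape of the dict it returns for valid and invalid input:

```python
import ast

def parse_python(code: str) -> dict:
    """Standalone mirror of CodeTools.parse_python for experimentation."""
    try:
        tree = ast.parse(code)
        return {
            "valid": True,
            "functions": [n.name for n in ast.walk(tree)
                          if isinstance(n, ast.FunctionDef)],
            "classes": [n.name for n in ast.walk(tree)
                        if isinstance(n, ast.ClassDef)],
        }
    except SyntaxError as e:
        # Invalid code still yields a structured result, never an exception
        return {"valid": False, "error": str(e)}

print(parse_python("def f():\n    pass"))
print(parse_python("def broken("))
```

Returning a `{"valid": False, ...}` dict instead of raising is what lets the agents above treat broken files as just another analysis result.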
7. Complete Agent Implementation
# src/agents/fixer.py (complete version)
from typing import Dict, List
import openai
class FixerAgent:
"""Generates and applies fixes"""
def __init__(self):
self.client = openai.OpenAI()
self.tools = CodeTools()
def execute(self, task: Task) -> Dict:
"""Generate fix for issues"""
# Read code
with open(task.target, 'r') as f:
original_code = f.read()
# Get issues from context
issues = task.context.get("issues", [])
# Retrieve similar fixes from memory
similar_fixes = self.recall_similar_fixes(issues)
# Generate fix with context
fixed_code = self.generate_fix(
original_code,
issues,
task.description,
similar_fixes
)
# Validate
if not self.validate_fix(original_code, fixed_code):
return {"success": False, "error": "Validation failed"}
# Generate explanation
explanation = self.explain_fix(original_code, fixed_code, issues)
return {
"success": True,
"original_code": original_code,
"fixed_code": fixed_code,
"explanation": explanation,
"issues_addressed": len(issues)
}
def generate_fix(self,
code: str,
issues: List[Dict],
description: str,
similar_fixes: List[Dict]) -> str:
"""Generate fixed code"""
issues_text = "\n".join([
f"- Line {i['line']}: {i['message']} (severity: {i['severity']})"
for i in issues[:10]
])
context_text = ""
if similar_fixes:
context_text = "\n\nSimilar fixes from history:\n" + "\n".join([
f"- {fix['description']}: {fix['approach']}"
for fix in similar_fixes[:3]
])
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "system",
"content": "Fix code issues while preserving functionality. Return only the fixed code."
}, {
"role": "user",
"content": f"Issues:\n{issues_text}\n\nRequirement: {description}{context_text}\n\nCode:\n{code}\n\nFixed code:"
}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def validate_fix(self, original: str, fixed: str) -> bool:
"""Validate fix"""
# Check syntax
parsed = self.tools.parse_python(fixed)
if not parsed["valid"]:
return False
# Check no unsafe operations
unsafe = ["eval(", "exec(", "os.system"]
if any(op in fixed for op in unsafe):
return False
return True
def explain_fix(self, original: str, fixed: str, issues: List[Dict]) -> str:
"""Explain what was fixed"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "user",
"content": f"Explain changes:\n\nOriginal:\n{original[:500]}\n\nFixed:\n{fixed[:500]}\n\nIssues addressed: {len(issues)}"
}],
temperature=0.3
)
return response.choices[0].message.content
def recall_similar_fixes(self, issues: List[Dict]) -> List[Dict]:
"""Recall similar fixes from memory"""
# Simplified - would use vector search
return []
def extract_code(self, text: str) -> str:
"""Extract code from response"""
import re
pattern = r'```python\n(.*?)```'
matches = re.findall(pattern, text, re.DOTALL)
return matches[0] if matches else text
8. CLI Interface
# src/cli.py
import json
import click
from orchestration.orchestrator import SoftwareEngineeringAgent
@click.group()
def cli():
"""Autonomous Software Engineering Agent"""
pass
@cli.command()
@click.argument('file_path')
def analyze(file_path):
"""Analyze code file"""
agent = SoftwareEngineeringAgent()
result = agent.process_request(f"Analyze {file_path}", file_path)
click.echo(json.dumps(result, indent=2))
@cli.command()
@click.argument('file_path')
@click.option('--description', '-d', help='Fix description')
def fix(file_path, description):
"""Fix issues in code"""
agent = SoftwareEngineeringAgent()
result = agent.process_request(
f"Fix issues: {description}" if description else "Fix all issues",
file_path
)
if result['results'][-1]['success']:
click.echo("✓ Fix generated successfully")
click.echo("\nDiff:")
click.echo(result['results'][-1]['diff'])
else:
click.echo("✗ Fix failed")
@cli.command()
@click.argument('file_path')
def test(file_path):
"""Generate tests"""
agent = SoftwareEngineeringAgent()
result = agent.process_request(f"Generate tests for {file_path}", file_path)
click.echo(f"Generated {result['results'][0]['tests_generated']} tests")
if __name__ == '__main__':
cli()
Usage Examples
Example 1: Analyze and Fix
# Analyze code
python src/cli.py analyze src/example.py
# Fix issues
python src/cli.py fix src/example.py --description "Fix type errors and add error handling"
# Generate tests
python src/cli.py test src/example.py
Example 2: Programmatic Usage
from orchestration.orchestrator import SoftwareEngineeringAgent
# Initialize agent
agent = SoftwareEngineeringAgent()
# Analyze code
result = agent.process_request(
"Analyze this file for bugs and security issues",
"src/auth.py"
)
print(f"Found {len(result['results'][0]['issues'])} issues")
# Fix critical issues
fix_result = agent.process_request(
"Fix all critical and high severity issues",
"src/auth.py"
)
if fix_result['results'][-1]['success']:
print("Fix applied successfully")
print(fix_result['results'][-1]['explanation'])
Advanced Features
Learning from Feedback
import time
class FeedbackLearner:
"""Learn from user feedback"""
def __init__(self):
self.feedback_db = []
def collect_feedback(self, task: Task, result: Dict, user_rating: int):
"""Collect user feedback"""
self.feedback_db.append({
"task": task,
"result": result,
"rating": user_rating,
"timestamp": time.time()
})
def improve_from_feedback(self):
"""Analyze feedback and improve"""
# Identify patterns in low-rated results
low_rated = [f for f in self.feedback_db if f["rating"] < 3]
# Extract common issues
# Adjust prompts or strategies
# Update tool selection logic
pass
Parallel Processing
import asyncio
from typing import List
class ParallelAnalyzer:
"""Analyze multiple files in parallel"""
async def analyze_files(self, file_paths: List[str]) -> List[Dict]:
"""Analyze files concurrently"""
tasks = [self.analyze_file(path) for path in file_paths]
results = await asyncio.gather(*tasks)
return results
async def analyze_file(self, file_path: str) -> Dict:
"""Analyze single file"""
analyzer = AnalyzerAgent()
task = Task(TaskType.ANALYZE, file_path, "Analyze", {})
# AnalyzerAgent.execute is synchronous; run it in a worker thread
# so the gathered calls actually overlap
return await asyncio.to_thread(analyzer.execute, task)
# Usage
async def main():
analyzer = ParallelAnalyzer()
results = await analyzer.analyze_files(['file1.py', 'file2.py', 'file3.py'])
print(f"Analyzed {len(results)} files")
asyncio.run(main())
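A toy, self-contained timing check (names are illustrative) of why `gather` helps here: awaiting the coroutines together overlaps their waits instead of summing them.

```python
import asyncio
import time

async def analyze(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for an I/O-bound LLM/API call
    return name

async def main():
    start = time.perf_counter()
    # Three 0.1s waits run concurrently, so total time stays near 0.1s
    results = await asyncio.gather(*(analyze(f"file{i}.py") for i in range(3)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)
print(elapsed < 0.3)
```

The speedup only applies while the coroutines are genuinely waiting (network, disk); CPU-bound work would need threads or processes instead.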
Testing the Agent
Unit Tests
# tests/test_analyzer.py
import pytest
from agents.analyzer import AnalyzerAgent
from orchestration.orchestrator import Task, TaskType
def test_analyzer_detects_issues():
"""Test analyzer finds issues"""
agent = AnalyzerAgent()
# Create test task
task = Task(
type=TaskType.ANALYZE,
target="tests/fixtures/buggy_code.py",
description="Analyze",
context={}
)
result = agent.execute(task)
assert "issues" in result
assert len(result["issues"]) > 0
def test_analyzer_handles_syntax_errors():
"""Test analyzer handles invalid syntax"""
agent = AnalyzerAgent()
# Write invalid code
with open("tests/fixtures/invalid.py", "w") as f:
f.write("def broken(\n")
task = Task(TaskType.ANALYZE, "tests/fixtures/invalid.py", "Analyze", {})
result = agent.execute(task)
assert "error" in result["ast_analysis"]
Integration Tests
# tests/test_integration.py
import pytest
from orchestration.orchestrator import SoftwareEngineeringAgent
def test_end_to_end_fix():
"""Test complete fix workflow"""
agent = SoftwareEngineeringAgent()
# Create buggy code
buggy_code = '''
def divide(a, b):
return a / b
'''
with open("tests/fixtures/buggy.py", "w") as f:
f.write(buggy_code)
# Request fix
result = agent.process_request(
"Fix the division by zero bug",
"tests/fixtures/buggy.py"
)
# Verify fix was generated
assert result["results"][-1]["success"]
assert "if b == 0" in result["results"][-1]["fixed_code"]
Deployment
Docker Container
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy source
COPY src/ ./src/
# Expose API
EXPOSE 8000
CMD ["python", "-m", "uvicorn", "src.api:app", "--host", "0.0.0.0"]
API Service
# src/api.py
from typing import Dict
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from orchestration.orchestrator import SoftwareEngineeringAgent
app = FastAPI(title="Autonomous SE Agent API")
class AnalyzeRequest(BaseModel):
file_path: str
options: Dict = {}
class FixRequest(BaseModel):
file_path: str
description: str
@app.post("/analyze")
async def analyze_code(request: AnalyzeRequest):
"""Analyze code endpoint"""
agent = SoftwareEngineeringAgent()
result = agent.process_request(
f"Analyze {request.file_path}",
request.file_path
)
return result
@app.post("/fix")
async def fix_code(request: FixRequest):
"""Fix code endpoint"""
agent = SoftwareEngineeringAgent()
result = agent.process_request(
f"Fix: {request.description}",
request.file_path
)
return result
@app.get("/health")
async def health():
"""Health check"""
return {"status": "healthy"}
Next Steps
You now have a complete implementation! In the next section, we’ll evaluate and iterate on the agent to make it production-ready.
Evaluation & Iteration
Evaluating Your Agent
Now that you’ve built the Autonomous Software Engineering Agent, let’s evaluate its performance and iterate to improve it.
Evaluation Framework
Test Suite Design
# tests/evaluation/test_suite.py
from typing import Dict, List
from dataclasses import dataclass
@dataclass
class TestCase:
name: str
input_code: str
expected_issues: List[str]
expected_fix_pattern: str
difficulty: str # easy, medium, hard
class EvaluationSuite:
"""Comprehensive evaluation suite"""
def __init__(self):
self.test_cases = self.create_test_cases()
self.results = []
def create_test_cases(self) -> List[TestCase]:
"""Create diverse test cases"""
return [
TestCase(
name="Division by zero",
input_code="def divide(a, b): return a / b",
expected_issues=["ZeroDivisionError"],
expected_fix_pattern="if b == 0",
difficulty="easy"
),
TestCase(
name="SQL injection",
input_code='query = f"SELECT * FROM users WHERE id = {user_id}"',
expected_issues=["SQL injection"],
expected_fix_pattern="parameterized",
difficulty="medium"
),
TestCase(
name="Race condition",
input_code="""
counter = 0
def increment():
global counter
temp = counter
counter = temp + 1
""",
expected_issues=["race condition"],
expected_fix_pattern="lock",
difficulty="hard"
)
]
def run_evaluation(self, agent) -> Dict:
"""Run full evaluation"""
results = {
"total": len(self.test_cases),
"passed": 0,
"by_difficulty": {"easy": 0, "medium": 0, "hard": 0}
}
for test_case in self.test_cases:
result = self.evaluate_test_case(agent, test_case)
self.results.append(result)
if result["passed"]:
results["passed"] += 1
results["by_difficulty"][test_case.difficulty] += 1
results["accuracy"] = results["passed"] / results["total"]
return results
def evaluate_test_case(self, agent, test_case: TestCase) -> Dict:
"""Evaluate single test case"""
# Write test code to file
test_file = f"tests/fixtures/{test_case.name.replace(' ', '_')}.py"
with open(test_file, 'w') as f:
f.write(test_case.input_code)
# Run agent
result = agent.process_request(
f"Analyze and fix issues in {test_file}",
test_file
)
# Check if issues detected
issues_found = result["results"][0].get("issues", [])
detected_expected = any(
expected in str(issues_found).lower()
for expected in test_case.expected_issues
)
# Check if fix applied correctly
fixed_code = result["results"][1].get("fixed_code", "")
fix_correct = test_case.expected_fix_pattern.lower() in fixed_code.lower()
return {
"test_case": test_case.name,
"passed": detected_expected and fix_correct,
"issues_detected": detected_expected,
"fix_correct": fix_correct,
"difficulty": test_case.difficulty
}
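For reference, the kind of change the hard "Race condition" case accepts (its `expected_fix_pattern` is simply the substring "lock"): serialize the read-modify-write with a `threading.Lock`.

```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    with lock:  # guard the read-modify-write so updates can't interleave
        counter += 1

threads = [threading.Thread(target=increment) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 100
```

Without the lock, two threads can both read the same value of `counter` before either writes, silently dropping increments.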
Performance Benchmarks
# tests/evaluation/benchmarks.py
import time
from typing import Dict, List
from tests.evaluation.test_suite import TestCase
class PerformanceBenchmark:
"""Benchmark agent performance"""
def __init__(self):
self.metrics = {}
def benchmark_analysis_speed(self, agent, file_sizes: List[int]) -> Dict:
"""Benchmark analysis speed"""
results = {}
for size in file_sizes:
# Generate code of specific size
code = self.generate_code(size)
test_file = f"tests/fixtures/size_{size}.py"
with open(test_file, 'w') as f:
f.write(code)
# Time analysis
start = time.time()
agent.process_request(f"Analyze {test_file}", test_file)
duration = time.time() - start
results[size] = {
"duration": duration,
"lines_per_second": size / duration
}
return results
def benchmark_fix_quality(self, agent, test_cases: List[TestCase]) -> Dict:
"""Benchmark fix quality"""
metrics = {
"fixes_attempted": 0,
"fixes_successful": 0,
"fixes_optimal": 0,
"avg_fix_time": []
}
for test_case in test_cases:
start = time.time()
# Generate fix
result = agent.process_request(
f"Fix issues in {test_case.name}",
test_case.name
)
duration = time.time() - start
metrics["avg_fix_time"].append(duration)
metrics["fixes_attempted"] += 1
if result["results"][-1]["success"]:
metrics["fixes_successful"] += 1
# Check if optimal
if self.is_optimal_fix(result["results"][-1]["fixed_code"]):
metrics["fixes_optimal"] += 1
return metrics
def generate_code(self, lines: int) -> str:
"""Generate code of specific size"""
return "\n".join([f"# Line {i}" for i in range(lines)])
def is_optimal_fix(self, code: str) -> bool:
"""Check if fix is optimal"""
# Simplified check
return "try" in code or "if" in code
Real-World Testing
Beta Testing Strategy
class BetaTester:
"""Coordinate beta testing"""
def __init__(self):
self.testers = []
self.feedback = []
def run_beta_test(self, agent, duration_days: int = 7) -> Dict:
"""Run beta test program"""
print(f"Starting {duration_days}-day beta test...")
# Collect usage data
usage_data = self.collect_usage_data(agent, duration_days)
# Collect feedback
feedback = self.collect_feedback()
# Analyze results
analysis = self.analyze_beta_results(usage_data, feedback)
return analysis
def collect_usage_data(self, agent, days: int) -> Dict:
"""Collect usage metrics"""
return {
"total_requests": 0,
"successful_requests": 0,
"avg_response_time": 0,
"most_common_tasks": [],
"error_rate": 0
}
def collect_feedback(self) -> List[Dict]:
"""Collect user feedback"""
return [
{
"user": "tester1",
"rating": 4,
"comments": "Works well for simple bugs",
"issues": ["Slow on large files"]
}
]
def analyze_beta_results(self, usage: Dict, feedback: List[Dict]) -> Dict:
"""Analyze beta test results"""
avg_rating = sum(f["rating"] for f in feedback) / len(feedback) if feedback else 0
return {
"usage_stats": usage,
"avg_rating": avg_rating,
"key_issues": self.extract_key_issues(feedback),
"recommendations": self.generate_recommendations(usage, feedback)
}
def extract_key_issues(self, feedback: List[Dict]) -> List[str]:
"""Extract common issues"""
all_issues = []
for f in feedback:
all_issues.extend(f.get("issues", []))
# Count frequency
from collections import Counter
return [issue for issue, count in Counter(all_issues).most_common(5)]
def generate_recommendations(self, usage: Dict, feedback: List[Dict]) -> List[str]:
"""Generate improvement recommendations"""
recommendations = []
if usage["error_rate"] > 0.1:
recommendations.append("Improve error handling")
if usage["avg_response_time"] > 10:
recommendations.append("Optimize performance")
return recommendations
Iteration Process
Continuous Improvement Loop
class ImprovementLoop:
"""Continuous improvement system"""
def __init__(self, agent):
self.agent = agent
self.version = 1
self.performance_history = []
def iterate(self, evaluation_results: Dict) -> Dict:
"""Improve based on evaluation"""
# Identify weaknesses
weaknesses = self.identify_weaknesses(evaluation_results)
# Generate improvements
improvements = self.generate_improvements(weaknesses)
# Apply improvements
self.apply_improvements(improvements)
# Re-evaluate
new_results = self.evaluate()
# Track progress
self.performance_history.append({
"version": self.version,
"results": new_results
})
self.version += 1
return {
"improvements_made": len(improvements),
"performance_change": self.calculate_improvement(evaluation_results, new_results)
}
def identify_weaknesses(self, results: Dict) -> List[str]:
"""Identify areas needing improvement"""
weaknesses = []
if results["accuracy"] < 0.8:
weaknesses.append("low_accuracy")
if results.get("avg_response_time", 0) > 10:
weaknesses.append("slow_performance")
if results.get("error_rate", 0) > 0.05:
weaknesses.append("high_error_rate")
return weaknesses
def generate_improvements(self, weaknesses: List[str]) -> List[Dict]:
"""Generate improvement strategies"""
improvements = []
for weakness in weaknesses:
if weakness == "low_accuracy":
improvements.append({
"area": "prompts",
"action": "Refine analysis prompts with more examples"
})
elif weakness == "slow_performance":
improvements.append({
"area": "caching",
"action": "Add caching for repeated analyses"
})
elif weakness == "high_error_rate":
improvements.append({
"area": "error_handling",
"action": "Add more robust error handling"
})
return improvements
def apply_improvements(self, improvements: List[Dict]):
"""Apply improvements to agent"""
for improvement in improvements:
print(f"Applying: {improvement['action']}")
# Apply improvement
# In practice, would modify agent configuration or code
def evaluate(self) -> Dict:
"""Run evaluation"""
suite = EvaluationSuite()
return suite.run_evaluation(self.agent)
def calculate_improvement(self, old: Dict, new: Dict) -> float:
"""Calculate improvement percentage"""
old_acc = old.get("accuracy", 0)
new_acc = new.get("accuracy", 0)
return ((new_acc - old_acc) / old_acc * 100) if old_acc > 0 else 0
Production Deployment
Deployment Checklist
- All tests passing
- Performance benchmarks met
- Security audit completed
- Documentation updated
- Monitoring configured
- Rollback plan ready
- User training completed
- Feedback system active
Monitoring Setup
# src/monitoring/metrics.py
from prometheus_client import Counter, Histogram, Gauge
import time
# Define metrics
requests_total = Counter('agent_requests_total', 'Total requests', ['task_type'])
request_duration = Histogram('agent_request_duration_seconds', 'Request duration')
active_tasks = Gauge('agent_active_tasks', 'Active tasks')
errors_total = Counter('agent_errors_total', 'Total errors', ['error_type'])
class MonitoredAgent:
"""Agent with monitoring"""
def __init__(self, agent):
self.agent = agent
def process_request(self, request: str, target: str) -> Dict:
"""Process with monitoring"""
active_tasks.inc()
start = time.time()
try:
result = self.agent.process_request(request, target)
# Record metrics
requests_total.labels(task_type=result.get("task_type", "unknown")).inc()
request_duration.observe(time.time() - start)
return result
except Exception as e:
errors_total.labels(error_type=type(e).__name__).inc()
raise
finally:
active_tasks.dec()
Logging Strategy
# src/monitoring/logging_config.py
import json
import logging
import time
class StructuredLogger:
"""Structured logging for agent"""
def __init__(self):
self.logger = logging.getLogger("se_agent")
self.logger.setLevel(logging.INFO)
handler = logging.FileHandler("agent.log")
handler.setFormatter(logging.Formatter('%(message)s'))
self.logger.addHandler(handler)
def log_request(self, request: str, target: str):
"""Log incoming request"""
self.logger.info(json.dumps({
"event": "request",
"request": request,
"target": target,
"timestamp": time.time()
}))
def log_result(self, result: Dict):
"""Log result"""
self.logger.info(json.dumps({
"event": "result",
"success": result.get("success"),
"timestamp": time.time()
}))
def log_error(self, error: Exception):
"""Log error"""
self.logger.error(json.dumps({
"event": "error",
"error_type": type(error).__name__,
"error_message": str(error),
"timestamp": time.time()
}))
User Feedback Collection
Feedback System
# src/feedback/collector.py
from typing import Dict, List
import sqlite3
import time
class FeedbackCollector:
"""Collect and analyze user feedback"""
def __init__(self, db_path: str = "data/feedback.db"):
self.db_path = db_path
self.init_db()
def init_db(self):
"""Initialize feedback database"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute('''
CREATE TABLE IF NOT EXISTS feedback (
id INTEGER PRIMARY KEY,
task_id TEXT,
rating INTEGER,
comments TEXT,
accepted BOOLEAN,
timestamp REAL
)
''')
conn.commit()
conn.close()
def collect(self, task_id: str, rating: int, comments: str, accepted: bool):
"""Store feedback"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute('''
INSERT INTO feedback (task_id, rating, comments, accepted, timestamp)
VALUES (?, ?, ?, ?, ?)
''', (task_id, rating, comments, accepted, time.time()))
conn.commit()
conn.close()
def analyze_feedback(self) -> Dict:
"""Analyze collected feedback"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
# Get statistics
cursor.execute('SELECT AVG(rating), COUNT(*) FROM feedback')
avg_rating, total = cursor.fetchone()
cursor.execute('SELECT COUNT(*) FROM feedback WHERE accepted = 1')
accepted = cursor.fetchone()[0]
conn.close()
return {
"avg_rating": avg_rating,
"total_feedback": total,
"acceptance_rate": accepted / total if total > 0 else 0
}
def get_improvement_suggestions(self) -> List[str]:
"""Extract improvement suggestions from feedback"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
# Get low-rated feedback
cursor.execute('SELECT comments FROM feedback WHERE rating < 3')
low_rated = cursor.fetchall()
conn.close()
# Extract common themes
suggestions = []
for (comment,) in low_rated:
if comment:
suggestions.append(comment)
return suggestions
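To see the collector's statistics queries in action without touching production data, here is a hypothetical standalone walk-through of the same schema with throwaway rows:

```python
import os
import sqlite3
import tempfile
import time

# Temporary database with the same schema as FeedbackCollector
db_path = os.path.join(tempfile.mkdtemp(), "feedback.db")
conn = sqlite3.connect(db_path)
conn.execute('''
    CREATE TABLE feedback (
        id INTEGER PRIMARY KEY, task_id TEXT, rating INTEGER,
        comments TEXT, accepted BOOLEAN, timestamp REAL
    )
''')
rows = [("t1", 5, "great fix", 1, time.time()),
        ("t2", 2, "too slow", 0, time.time())]
conn.executemany(
    "INSERT INTO feedback (task_id, rating, comments, accepted, timestamp) "
    "VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Same aggregate queries analyze_feedback runs
avg_rating, total = conn.execute(
    "SELECT AVG(rating), COUNT(*) FROM feedback").fetchone()
accepted = conn.execute(
    "SELECT COUNT(*) FROM feedback WHERE accepted = 1").fetchone()[0]
conn.close()
print(avg_rating, total, accepted / total)  # 3.5 2 0.5
```

SQLite stores the BOOLEAN column as integers, which is why the acceptance query compares against `1`.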
A/B Testing
Comparing Agent Versions
class ABTester:
"""A/B test different agent versions"""
def __init__(self, agent_a, agent_b):
self.agent_a = agent_a
self.agent_b = agent_b
self.results_a = []
self.results_b = []
def run_ab_test(self, test_cases: List[TestCase]) -> Dict:
"""Run A/B test"""
import random
for test_case in test_cases:
# Randomly assign to A or B
if random.random() < 0.5:
result = self.test_agent(self.agent_a, test_case)
self.results_a.append(result)
else:
result = self.test_agent(self.agent_b, test_case)
self.results_b.append(result)
# Compare results
return self.compare_results()
def test_agent(self, agent, test_case: TestCase) -> Dict:
"""Test single agent"""
start = time.time()
result = agent.process_request(test_case.name, test_case.name)
duration = time.time() - start
return {
"success": result.get("success", False),
"duration": duration
}
def compare_results(self) -> Dict:
"""Compare A vs B"""
# max(..., 1) guards against an arm that received no test cases
a_success = sum(1 for r in self.results_a if r["success"]) / max(len(self.results_a), 1)
b_success = sum(1 for r in self.results_b if r["success"]) / max(len(self.results_b), 1)
a_speed = sum(r["duration"] for r in self.results_a) / max(len(self.results_a), 1)
b_speed = sum(r["duration"] for r in self.results_b) / max(len(self.results_b), 1)
return {
"agent_a": {"success_rate": a_success, "avg_duration": a_speed},
"agent_b": {"success_rate": b_success, "avg_duration": b_speed},
"winner": "A" if a_success > b_success else "B"
}
Iteration Examples
Iteration 1: Improve Accuracy
Problem: Agent missing 30% of bugs
Analysis:
# Analyze false negatives
false_negatives = [
"Off-by-one errors",
"Null pointer issues",
"Type mismatches"
]
Solution:
# Enhanced analysis prompt
enhanced_prompt = """Analyze code for:
1. Logic errors (off-by-one, boundary conditions)
2. Null/None handling
3. Type safety
4. Resource leaks
5. Concurrency issues
Be thorough and check edge cases."""
# Update analyzer
analyzer.system_prompt = enhanced_prompt
Result: Accuracy improved from 70% → 85%
Iteration 2: Optimize Performance
Problem: Analysis takes 15s per file (target: <5s)
Analysis:
# Profile performance
import cProfile
profiler = cProfile.Profile()
profiler.enable()
agent.process_request("Analyze file.py", "file.py")
profiler.disable()
profiler.print_stats(sort='cumtime')
Solution:
# Add caching (keyed on a content hash so edited files miss the cache)
import hashlib
class CachedAnalyzer:
def __init__(self):
self.cache = {}
def hash_file(self, file_path: str) -> str:
with open(file_path, 'rb') as f:
return hashlib.sha256(f.read()).hexdigest()
def analyze(self, file_path: str) -> Dict:
# Check cache
file_hash = self.hash_file(file_path)
if file_hash in self.cache:
return self.cache[file_hash]
# Analyze
result = self.do_analysis(file_path)
# Cache result
self.cache[file_hash] = result
return result
Result: Analysis time reduced to 3s per file
Iteration 3: Reduce False Positives
Problem: 40% of reported issues are false positives
Analysis:
# Analyze false positives
fp_analysis = {
"style_issues_as_bugs": 15,
"context_misunderstanding": 12,
"overly_strict_checks": 8
}
Solution:
# Add confidence scoring
class ConfidenceScorer:
def score_issue(self, issue: Dict) -> float:
"""Score issue confidence"""
score = 0.5 # Base
# Increase for multiple sources
if issue["source"] == "static" and issue.get("llm_confirmed"):
score += 0.3
# Increase for severity
if issue["severity"] == "critical":
score += 0.2
return min(score, 1.0)
# Filter low-confidence issues
scorer = ConfidenceScorer()
filtered_issues = [i for i in issues if scorer.score_issue(i) > 0.6]
Result: False positive rate reduced from 40% → 15%
Production Metrics
Key Metrics to Track
import time
from typing import Dict, List

class ProductionMetrics:
    """Track production metrics"""

    def __init__(self):
        self.metrics = {
            "requests_per_day": 0,
            "success_rate": 0,
            "avg_response_time": 0,
            "user_satisfaction": 0,
            "bugs_fixed": 0,
            "tests_generated": 0,
            "code_quality_improvement": 0,
        }

    def daily_report(self) -> Dict:
        """Generate daily metrics report"""
        return {
            "date": time.strftime("%Y-%m-%d"),
            "metrics": self.metrics,
            "alerts": self.check_alerts(),
        }

    def check_alerts(self) -> List[str]:
        """Check for metric alerts"""
        alerts = []
        if self.metrics["success_rate"] < 0.9:
            alerts.append("Success rate below threshold")
        if self.metrics["avg_response_time"] > 10:
            alerts.append("Response time above threshold")
        return alerts
Final Evaluation
Comprehensive Assessment
from typing import Dict

def final_evaluation(agent) -> Dict:
    """Comprehensive final evaluation"""
    # Run test suite
    suite = EvaluationSuite()
    test_results = suite.run_evaluation(agent)
    # Run benchmarks
    benchmark = PerformanceBenchmark()
    perf_results = benchmark.benchmark_analysis_speed(agent, [100, 500, 1000])
    # Analyze feedback
    feedback = FeedbackCollector()
    feedback_analysis = feedback.analyze_feedback()
    # Generate report
    return {
        "test_results": test_results,
        "performance": perf_results,
        "user_feedback": feedback_analysis,
        "overall_score": calculate_overall_score(
            test_results, perf_results, feedback_analysis
        ),
    }

def calculate_overall_score(tests: Dict, perf: Dict, feedback: Dict) -> float:
    """Weighted average of accuracy, speed, and user acceptance"""
    test_score = tests["accuracy"] * 0.4
    perf_score = (1.0 if perf[100]["duration"] < 5 else 0.5) * 0.3
    feedback_score = feedback["acceptance_rate"] * 0.3
    return test_score + perf_score + feedback_score
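As a standalone sanity check on the weighting, here is the scorer restated with made-up sample numbers (hypothetical values, not measured results):

```python
def calculate_overall_score(tests, perf, feedback):
    # Same weighting as above: 40% accuracy, 30% speed, 30% acceptance
    test_score = tests["accuracy"] * 0.4
    perf_score = (1.0 if perf[100]["duration"] < 5 else 0.5) * 0.3
    feedback_score = feedback["acceptance_rate"] * 0.3
    return test_score + perf_score + feedback_score

score = calculate_overall_score(
    {"accuracy": 0.85},        # hypothetical test accuracy
    {100: {"duration": 3.0}},  # hypothetical 100-line analysis time
    {"acceptance_rate": 0.8},  # hypothetical user acceptance
)
print(round(score, 2))  # 0.85*0.4 + 1.0*0.3 + 0.8*0.3 = 0.88
```

Because the analysis finishes under the 5-second target, the speed term contributes its full 0.3; a slower agent would drop to 0.15 there.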
Congratulations!
You’ve completed the capstone project! You’ve built a sophisticated Autonomous Software Engineering Agent that:
✅ Analyzes code for bugs and quality issues
✅ Generates fixes with explanations
✅ Writes comprehensive tests
✅ Operates safely with validation
✅ Learns from feedback
✅ Scales to production workloads
Practice Exercises
Exercise 1: Add Code Review Agent (Medium)
Task: Add a ReviewerAgent that analyzes pull requests.
Click to see solution
from openai import OpenAI

class ReviewerAgent:
    def __init__(self):
        self.client = OpenAI()

    def review_pr(self, diff: str) -> dict:
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Review this code change:\n{diff}\n\nProvide: issues, suggestions, approval"
            }]
        )
        return {"review": response.choices[0].message.content}
Exercise 2: Implement Learning System (Hard)
Task: Make the agent learn from user corrections.
Click to see solution
class LearningAgent:
    def __init__(self):
        self.corrections = []

    def learn_from_correction(self, original: str, corrected: str):
        self.corrections.append({"original": original, "corrected": corrected})
        # Once enough corrections accumulate, fold them into the prompts
        if len(self.corrections) > 5:
            self.update_prompts()

    def update_prompts(self):
        """Rebuild the system prompt using recent corrections as few-shot examples."""
        ...
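One way `update_prompts` could use those corrections, sketched as a pure helper (the function name and base prompt are illustrative, not part of the course code):

```python
def build_few_shot_prompt(corrections: list, base_prompt: str = "You fix code.") -> str:
    """Fold the most recent corrections into the system prompt as few-shot examples."""
    examples = [
        f"Original:\n{c['original']}\nCorrected:\n{c['corrected']}"
        for c in corrections[-5:]  # keep only the freshest examples
    ]
    return base_prompt + "\n\nExamples of past corrections:\n" + "\n\n".join(examples)

prompt = build_few_shot_prompt([{"original": "x=1", "corrected": "x = 1"}])
```

Capping at the last five corrections keeps the prompt short; a production version would also deduplicate and weight by recency.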
✅ Chapter 10 Summary
You’ve completed the capstone project:
- Designed a multi-agent software engineering system
- Implemented specialized agents (analyzer, fixer, tester)
- Integrated all concepts from previous chapters
- Evaluated with comprehensive test suites
- Deployed with monitoring and feedback loops
This capstone demonstrates how to combine planning, memory, tools, safety, and learning into a production-ready autonomous system.
What You’ve Learned
Throughout this course, you’ve mastered:
- Foundations: Agent architecture and LLM fundamentals
- Building: ReAct patterns and tool integration
- Advanced Patterns: Planning, memory, multi-agent systems
- Tools: Code execution, data access, web interaction
- Production: Reliability, testing, monitoring
- Specialization: Coding, research, automation agents
- Advanced Topics: Learning, multimodal, frameworks
- Enterprise: Architecture, security, cost optimization
- Research: Frontier capabilities, emerging paradigms
- Capstone: Complete production-ready agent
Next Steps
- Deploy your agent: Put it into production
- Contribute: Share your implementation
- Research: Explore open problems
- Build more: Create specialized agents
- Teach: Share your knowledge
Thank you for completing the Agentic Guide to AI Agents course!
Tools & Libraries
Core Libraries
LLM APIs
OpenAI
pip install openai
from openai import OpenAI
client = OpenAI(api_key="your-key")
- Models: GPT-4, GPT-3.5-turbo
- Function calling support
- Streaming responses
- Documentation
Anthropic Claude
pip install anthropic
import anthropic
client = anthropic.Anthropic(api_key="your-key")
- Models: Claude 3 (Opus, Sonnet, Haiku)
- Long context windows (200K tokens)
- Documentation
AWS Bedrock
pip install boto3
import boto3
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
- Multiple model providers
- Enterprise features
- Documentation
Agent Frameworks
LangChain
pip install langchain langchain-openai
- Chains, agents, tools
- Memory management
- Documentation
LangGraph
pip install langgraph
- Graph-based workflows
- State management
- Documentation
AutoGPT
git clone https://github.com/Significant-Gravitas/AutoGPT
- Autonomous task execution
- Plugin system
CrewAI
pip install crewai
- Multi-agent orchestration
- Role-based agents
Vector Databases
ChromaDB
pip install chromadb
import chromadb
client = chromadb.Client()
collection = client.create_collection("docs")
- Embedded database
- Simple API
Pinecone
pip install pinecone-client
- Managed service
- High performance
- Scalable
Weaviate
pip install weaviate-client
- Open source
- Hybrid search
- GraphQL API
Code Analysis
AST Tools
pip install ast-grep-py
- Python: Built-in ast module
- Multi-language: tree-sitter
Linters
pip install pylint ruff mypy
- pylint: Comprehensive checking
- ruff: Fast linting
- mypy: Type checking
Formatters
pip install black isort
- black: Code formatting
- isort: Import sorting
Testing
pytest
pip install pytest pytest-asyncio pytest-cov
- Unit testing
- Async support
- Coverage reports
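As a reminder of the conventions: pytest discovers files named `test_*.py` and runs functions named `test_*`. A minimal test module might look like this (the file name and functions are illustrative):

```python
# test_math.py — plain asserts are all pytest needs
def add(a: int, b: int) -> int:
    return a + b

def test_add() -> None:
    assert add(2, 3) == 5

def test_add_negative() -> None:
    assert add(-1, 1) == 0

# Run with: pytest test_math.py  (add --cov for coverage via pytest-cov)
```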
unittest
- Built-in Python testing
- Standard library
Monitoring
Prometheus
pip install prometheus-client
- Metrics collection
- Time series data
OpenTelemetry
pip install opentelemetry-api opentelemetry-sdk
- Distributed tracing
- Metrics and logs
Utilities
Docker SDK
pip install docker
- Container management
- Safe code execution
GitPython
pip install gitpython
- Git operations
- Repository management
Requests
pip install requests httpx
- HTTP requests
- API integration
Development Tools
IDEs & Editors
- VS Code: Python, Jupyter extensions
- PyCharm: Professional Python IDE
- Cursor: AI-powered editor
- Jupyter: Interactive notebooks
Debugging
- pdb: Python debugger
- ipdb: Enhanced debugger
- pytest --pdb: Drop into the debugger on test failure
Documentation
- Sphinx: Python documentation
- MkDocs: Markdown documentation
- mdBook: Rust-based book tool
Deployment Tools
Containerization
- Docker: Container platform
- Docker Compose: Multi-container apps
Orchestration
- Kubernetes: Container orchestration
- AWS ECS: Managed containers
- AWS Lambda: Serverless functions
CI/CD
- GitHub Actions: Automated workflows
- GitLab CI: Integrated CI/CD
- AWS CodePipeline: AWS-native CI/CD
Quick Start Template
# requirements.txt
openai==1.12.0
langchain==0.1.0
chromadb==0.4.22
fastapi==0.109.0
uvicorn==0.27.0
pytest==8.0.0
# agent.py
from openai import OpenAI

class SimpleAgent:
    def __init__(self):
        self.client = OpenAI()

    def run(self, task: str) -> str:
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": task}]
        )
        return response.choices[0].message.content

agent = SimpleAgent()
result = agent.run("Hello!")
print(result)
Resources
Research Papers
Foundational Papers
ReAct: Synergizing Reasoning and Acting in Language Models
- Authors: Yao et al. (2022)
- Paper
- Key contribution: Reasoning + Acting pattern
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Authors: Wei et al. (2022)
- Paper
- Key contribution: Step-by-step reasoning
Toolformer: Language Models Can Teach Themselves to Use Tools
- Authors: Schick et al. (2023)
- Paper
- Key contribution: Self-taught tool use
Generative Agents: Interactive Simulacra of Human Behavior
- Authors: Park et al. (2023)
- Paper
- Key contribution: Memory and planning
Recent Advances
GPT-4 Technical Report
- OpenAI (2023)
- Paper
Constitutional AI: Harmlessness from AI Feedback
- Anthropic (2022)
- Paper
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Yao et al. (2023)
- Paper
Books
Artificial Intelligence: A Modern Approach
- Authors: Russell & Norvig
- Classic AI textbook
- Agent architectures
Deep Learning
- Authors: Goodfellow, Bengio, Courville
- Neural network foundations
- Free online
Reinforcement Learning: An Introduction
- Authors: Sutton & Barto
- RL fundamentals
- Free online
Online Courses
DeepLearning.AI
- LangChain courses
- AI agent specializations
- Website
Fast.ai
- Practical deep learning
- Free courses
- Website
Stanford CS224N
- NLP with Deep Learning
- Course page
Blogs & Tutorials
Lilian Weng’s Blog
- lilianweng.github.io
- Excellent agent overviews
- Research summaries
Anthropic Research
- anthropic.com/research
- Constitutional AI
- Safety research
OpenAI Blog
- openai.com/blog
- Model releases
- Research updates
Hugging Face Blog
- huggingface.co/blog
- Model tutorials
- Community projects
Communities
Discord Servers
- LangChain Discord
- OpenAI Developer Community
- AI Agent Builders
Reddit
- r/MachineLearning
- r/LanguageTechnology
- r/artificial
GitHub
- Awesome-LLM repositories
- Agent implementations
- Open source projects
Conferences
NeurIPS - Neural Information Processing Systems
- December annually
- Top ML conference
ICML - International Conference on Machine Learning
- July annually
- Core ML research
ICLR - International Conference on Learning Representations
- May annually
- Deep learning focus
ACL - Association for Computational Linguistics
- July annually
- NLP research
Datasets & Benchmarks
HumanEval
- Code generation benchmark
- GitHub
MMLU - Massive Multitask Language Understanding
- Knowledge benchmark
- 57 subjects
BIG-bench
- Diverse task benchmark
- GitHub
AgentBench
- Agent capability benchmark
- Multi-environment testing
Tools & Platforms
Weights & Biases
- Experiment tracking
- wandb.ai
LangSmith
- LangChain debugging
- Trace visualization
Helicone
- LLM observability
- Cost tracking
PromptLayer
- Prompt management
- Version control
Code Repositories
LangChain
AutoGPT
BabyAGI
AgentGPT
Stay Updated
Newsletters
- The Batch (DeepLearning.AI)
- Import AI
- TLDR AI
Twitter/X Accounts
- @AndrewYNg
- @karpathy
- @ylecun
- @goodfellow_ian
YouTube Channels
- Andrej Karpathy
- Two Minute Papers
- Yannic Kilcher
Practice Platforms
Kaggle
- Competitions
- Datasets
- Notebooks
HuggingFace Spaces
- Deploy demos
- Share models
Replicate
- Run models
- API access
Glossary
A
Agent - An autonomous system that perceives its environment and takes actions to achieve goals.
Agentic Framework - A software framework designed for building AI agents (e.g., LangChain, AutoGPT).
API (Application Programming Interface) - Interface for software components to communicate.
AST (Abstract Syntax Tree) - Tree representation of code structure.
B
Backoff - Strategy for retrying failed operations with increasing delays.
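A minimal sketch of the pattern: exponentially growing delays with jitter (a generic recipe assuming "full jitter"; names are illustrative):

```python
import random

def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   retries: int = 5, max_delay: float = 30.0):
    """Yield jittered, exponentially growing delays (ceilings ~0.5s, 1s, 2s, ...)."""
    delay = base
    for _ in range(retries):
        # Full jitter: sleep a random amount up to the current ceiling
        yield random.uniform(0, min(delay, max_delay))
        delay *= factor

# Usage: for delay in backoff_delays(): time.sleep(delay); retry_operation()
```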
Benchmark - Standardized test for measuring performance.
Beam Search - Search algorithm that explores multiple paths simultaneously.
C
Chain-of-Thought (CoT) - Prompting technique that encourages step-by-step reasoning.
Checkpoint - Saved state of a model or agent for recovery.
Context Window - Maximum amount of text an LLM can process at once.
Constitutional AI - Approach to align AI behavior with principles.
D
Deterministic - Producing the same output given the same input.
Distributed Tracing - Tracking requests across multiple services.
Docker - Platform for containerizing applications.
E
Embedding - Vector representation of text or data.
Episodic Memory - Memory of specific past events or experiences.
Evaluation Metric - Quantitative measure of performance.
F
Few-Shot Learning - Learning from a small number of examples.
Fine-Tuning - Training a pre-trained model on specific data.
Function Calling - LLM capability to invoke external functions.
G
Generalization - Ability to perform well on unseen data.
Guardrails - Safety mechanisms to prevent harmful behavior.
GPU (Graphics Processing Unit) - Hardware for parallel computation.
H
Hallucination - When LLMs generate false or nonsensical information.
Human-in-the-Loop (HITL) - System requiring human approval for decisions.
Hyperparameter - Configuration parameter for model training.
I
Inference - Using a trained model to make predictions.
Interpretability - Ability to understand model decisions.
K
Kubernetes (K8s) - Container orchestration platform.
L
Latency - Time delay between request and response.
LLM (Large Language Model) - Neural network trained on vast text data.
Long-Horizon Planning - Planning over extended time periods.
M
Memory System - Component for storing and retrieving information.
Meta-Learning - Learning how to learn.
Microservices - Architecture pattern with independent services.
Multimodal - Processing multiple types of data (text, images, audio).
N
Neural Network - Computing system inspired by biological brains.
NLP (Natural Language Processing) - Processing and understanding human language.
O
Observability - Ability to understand system internal state from outputs.
Orchestration - Coordinating multiple components or agents.
P
Perception-Reasoning-Action Loop - Core agent cycle: observe, think, act.
Prompt Engineering - Crafting effective prompts for LLMs.
Production - Live environment serving real users.
R
RAG (Retrieval-Augmented Generation) - Combining retrieval with generation.
ReAct - Pattern combining reasoning and acting.
Reinforcement Learning (RL) - Learning through rewards and penalties.
RLHF (Reinforcement Learning from Human Feedback) - Training with human preferences.
S
Sandbox - Isolated environment for safe code execution.
Semantic Memory - Memory of facts and knowledge.
Semantic Search - Search based on meaning, not keywords.
Self-Improvement - Agent’s ability to improve its own capabilities.
Streaming - Sending responses incrementally as generated.
T
Temperature - Parameter controlling randomness in LLM outputs (0=deterministic, 1=creative).
Token - Unit of text processed by LLMs (roughly 0.75 words).
Tool - External function or API an agent can use.
Tree of Thoughts - Exploring multiple reasoning paths.
V
Vector Database - Database optimized for similarity search on embeddings.
Validation - Checking if outputs meet requirements.
W
Working Memory - Short-term memory for current task.
Z
Zero-Shot - Performing tasks without specific training examples.
Common Acronyms
- AI - Artificial Intelligence
- API - Application Programming Interface
- AST - Abstract Syntax Tree
- CI/CD - Continuous Integration/Continuous Deployment
- CoT - Chain-of-Thought
- GPU - Graphics Processing Unit
- HITL - Human-in-the-Loop
- LLM - Large Language Model
- ML - Machine Learning
- NLP - Natural Language Processing
- RAG - Retrieval-Augmented Generation
- RL - Reinforcement Learning
- RLHF - Reinforcement Learning from Human Feedback
- SLA - Service Level Agreement
- ToT - Tree of Thoughts
- UI/UX - User Interface/User Experience
Model Parameters
Temperature - Controls randomness (0.0-2.0)
- 0.0-0.3: Focused, deterministic
- 0.4-0.7: Balanced
- 0.8-1.0: Creative
- 1.0+: Very random
Top-p (Nucleus Sampling) - Alternative to temperature (0.0-1.0)
- 0.1: Very focused
- 0.5: Balanced
- 0.9: Diverse
Max Tokens - Maximum length of response
Frequency Penalty - Reduces repetition (-2.0 to 2.0)
Presence Penalty - Encourages new topics (-2.0 to 2.0)
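All of these travel as keyword arguments on a completion request. A hedged sketch of one parameter set (the model name and values are placeholders, not recommendations):

```python
# Placeholder parameter set; ranges follow the list above.
request_params = {
    "model": "gpt-4",
    "temperature": 0.2,        # 0.0-0.3: focused, deterministic
    "top_p": 0.9,              # usually tune temperature OR top_p, not both
    "max_tokens": 512,         # cap on response length
    "frequency_penalty": 0.5,  # discourage verbatim repetition
    "presence_penalty": 0.0,   # neutral on topic novelty
}
# client.chat.completions.create(messages=[...], **request_params)
```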
HTTP Status Codes
- 200 - Success
- 400 - Bad Request
- 401 - Unauthorized
- 429 - Rate Limited
- 500 - Server Error
- 503 - Service Unavailable
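When calling LLM APIs, 429 and 5xx responses are usually transient and worth retrying, while other 4xx client errors are not. A small sketch of that classification (a generic pattern, not any specific SDK's behavior):

```python
RETRYABLE = {429, 500, 503}  # rate limited, server error, service unavailable

def should_retry(status_code: int, attempt: int, max_attempts: int = 3) -> bool:
    """Retry transient failures; give up on client errors or after max_attempts."""
    return status_code in RETRYABLE and attempt < max_attempts

# A 429 on the first attempt is retried; a 401 (bad API key) never is.
```

Pair this with the backoff delays from the Glossary so retries don't hammer a rate-limited endpoint.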
Contributing
Thank you for your interest in improving this course! Contributions are welcome and appreciated.
How to Contribute
Reporting Issues
Found an error or have a suggestion?
- Check existing issues
- Create a new issue with:
- Clear description
- Module and section reference
- Expected vs actual behavior
- Suggested fix (if applicable)
Suggesting Improvements
Have ideas for new content or improvements?
- Open a discussion
- Describe your suggestion
- Explain the value it would add
Contributing Content
Want to contribute code, examples, or content?
Process:
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature
- Make your changes
- Test locally: mdbook serve
- Commit: git commit -m "Add: your feature"
- Push: git push origin feature/your-feature
- Open a Pull Request
Guidelines:
- Follow existing code style
- Include working code examples
- Add comments and explanations
- Test all code before submitting
- Keep examples concise but complete
Content Guidelines
Code Examples
- Use Python 3.9+ syntax
- Include type hints
- Add docstrings
- Handle errors gracefully
- Keep examples under 100 lines when possible
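For instance, a contribution that follows these guidelines might look like this (the helper itself is illustrative):

```python
from pathlib import Path

def read_config(path: str) -> dict:
    """Read a simple KEY=VALUE config file, skipping blanks and comments."""
    config: dict[str, str] = {}
    try:
        text = Path(path).read_text()
    except FileNotFoundError:
        # Fail gracefully rather than crashing the caller
        return config
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config
```

Note the type hints, the docstring, and the graceful handling of a missing file — exactly the points the checklist above asks for.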
Writing Style
- Clear and concise
- Explain “why” not just “how”
- Use active voice
- Include practical examples
- Link to related sections
Structure
- Start with learning objectives
- Provide context before code
- Explain code after showing it
- End with key takeaways
- Link to next steps
Types of Contributions
High Priority
- Fixing errors or bugs in code
- Improving unclear explanations
- Adding missing error handling
- Updating deprecated APIs
Medium Priority
- Adding practice exercises
- Creating additional examples
- Improving diagrams
- Expanding explanations
Nice to Have
- Translations
- Video tutorials
- Interactive demos
- Community showcases
Code of Conduct
- Be respectful and constructive
- Welcome newcomers
- Focus on improving the content
- Give credit where due
- Assume good intentions
Recognition
Contributors will be acknowledged in:
- GitHub contributors list
- Course acknowledgments section
- Release notes
Questions?
Open a discussion or reach out via GitHub issues.
Thank you for helping make this course better! 🙏