Agentic Guide to AI Agents
Welcome to the complete course on building AI agents from the ground up.
Course Overview
This comprehensive course takes you from foundational concepts to cutting-edge implementations in AI agent development. Whether you’re a beginner or an experienced developer, you’ll gain practical skills to build, deploy, and scale intelligent agents.
What You’ll Learn
- Core concepts of AI agents and their architecture
- Building agents with reasoning and tool-use capabilities
- Advanced patterns including planning, memory, and multi-agent systems
- Production deployment, testing, and monitoring
- Specialized agent types for coding, research, and automation
- Enterprise-scale architecture and security considerations
- Latest research and emerging paradigms
Prerequisites
- Basic Python programming
- Understanding of APIs and HTTP
- Familiarity with command line
- Basic ML/AI concepts (helpful but not required)
Learning Path
- Beginner: Chapters 1-2 (2-3 weeks)
- Intermediate: Chapters 3-5 (4-6 weeks)
- Advanced: Chapters 6-9 (6-8 weeks)
- Expert: Chapter 10 + Research (ongoing)
Estimated Time
Total: 12-16 weeks for complete mastery with hands-on projects throughout.
Let’s begin your journey to mastering AI agents!
Prerequisites
Required Knowledge
Programming Fundamentals
- Python proficiency: Functions, classes, decorators, async/await
- Data structures: Lists, dicts, sets, queues
- Error handling: Try/except, custom exceptions
- File I/O: Reading/writing files
Basic Concepts
- APIs: REST APIs, HTTP methods, JSON
- Command line: Basic bash/terminal commands
- Git: Version control basics
- Environment variables: Configuration management
Recommended (Not Required)
- Machine learning basics
- Natural language processing concepts
- Docker/containerization
- Cloud platforms (AWS, Azure, GCP)
Technical Requirements
Software
- Python 3.9+: download from python.org
- pip: Package manager (comes with Python)
- Git: download from git-scm.com
- Code editor: VS Code, PyCharm, or similar
- Terminal: Command line access
Accounts
- OpenAI API key: create one at platform.openai.com
- Or Anthropic, AWS Bedrock, etc.
- GitHub account: For version control
- Optional: Cloud provider account (AWS, GCP, Azure)
Hardware
- Minimum: 8GB RAM, modern CPU
- Recommended: 16GB RAM, GPU for local models
- Internet: Stable connection for API calls
Setup Instructions
1. Install Python
# Check Python version
python --version # Should be 3.9+
# Create virtual environment
python -m venv venv
# Activate (macOS/Linux)
source venv/bin/activate
# Activate (Windows)
venv\Scripts\activate
2. Install Core Libraries
pip install openai langchain chromadb fastapi uvicorn pytest
3. Configure API Keys
# Create .env file
echo "OPENAI_API_KEY=your-key-here" > .env
# Or export directly
export OPENAI_API_KEY="your-key-here"
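Before making any API calls, it helps to confirm the key is actually visible to your Python process. A small helper like this fails fast with a clear message (the function name is ours, not part of any library):

```python
import os

def require_api_key(name="OPENAI_API_KEY"):
    """Return the named API key from the environment, failing loudly if unset."""
    key = os.environ.get(name)
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to your .env file")
    return key
```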
4. Verify Setup
# test_setup.py
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello!"}]
)
print("✓ Setup successful!")
print(response.choices[0].message.content)
Time Commitment
- Total course: 40-60 hours
- Per chapter: 4-6 hours
- Capstone project: 10-15 hours
Recommended pace: 2-3 chapters per week
Learning Path
Beginner Track (Start Here)
- Module 1: Foundations
- Module 2: Building Your First Agent
- Module 4: Agent Tools & Capabilities
- Module 5: Production-Ready Agents
Intermediate Track
- Module 3: Advanced Agent Patterns
- Module 6: Specialized Agent Types
- Module 7: Advanced Topics
Advanced Track
- Module 8: Enterprise & Scale
- Module 9: Cutting-Edge Research
- Module 10: Capstone Project
Getting Help
- GitHub Issues: Report errors or ask questions
- Discussions: Share projects and get feedback
- Community: Join Discord/Slack communities (see Resources)
Ready to Start?
If you meet the prerequisites, you’re ready to begin! Start with the Introduction and then dive into Module 1.
About This Course
Author
Kyaw Mong is a software engineer and AI practitioner with extensive experience building production AI systems. This course distills years of hands-on experience into a comprehensive learning path for aspiring agent developers.
Course Philosophy
This course is built on three principles:
1. Learn by Building: Every concept is accompanied by working code examples. You’ll build real agents, not just read about them.
2. Production-First: We don’t just teach toy examples. You’ll learn reliability, testing, monitoring, and deployment—everything needed for production systems.
3. Comprehensive Coverage: From foundations to frontier research, this course covers the full spectrum of agent development in 21,000+ lines of detailed content.
What Makes This Course Different
- Complete working code: Every example runs and can be deployed
- Real-world focus: Patterns used in production systems
- Cutting-edge content: Latest research and techniques
- Hands-on capstone: Build a complete autonomous agent
- Free and open source: Available to everyone
Course Structure
The course follows a carefully designed progression:
Foundations (Chapters 1-2): Core concepts and first agent
Intermediate (Chapters 3-5): Advanced patterns and production readiness
Advanced (Chapters 6-8): Specialized agents and enterprise scale
Expert (Chapters 9-10): Research frontiers and capstone project
Acknowledgments
This course builds on the incredible work of the AI research community. Special thanks to:
- OpenAI, Anthropic, and other AI labs for advancing the field
- LangChain, AutoGPT, and framework creators
- The open source community
- Researchers publishing papers and sharing knowledge
Version History
v1.0 (February 2026)
- Initial release
- 10 complete chapters
- Autonomous Software Engineering Agent capstone
- 21,000+ lines of content
Contact & Feedback
- GitHub: ekyawthan/ai-agents-course
- Issues: Report errors or suggest improvements
- Discussions: Share your projects and ask questions
License
This course is released under the MIT License. You’re free to use, modify, and share the content with attribution.
Ready to start learning? Head to Prerequisites to get set up!
Frequently Asked Questions
Getting Started
Which LLM should I use?
For learning: Start with OpenAI’s GPT-3.5-turbo
- Affordable ($0.50-2 per million tokens)
- Fast responses
- Good function calling support
For production: Consider:
- GPT-4: Best reasoning, higher cost
- Claude 3: Long context (200K tokens), excellent for complex tasks
- AWS Bedrock: Enterprise features, multiple models
- Open source (Llama, Mistral): Self-hosted, no API costs
How much does it cost to run agents?
Development (100 requests/day):
- GPT-3.5: ~$5-10/month
- GPT-4: ~$30-50/month
Production (10K requests/day):
- GPT-3.5: ~$500-1000/month
- GPT-4: ~$3000-5000/month
Cost optimization:
- Use caching (50-70% reduction)
- Smaller models for simple tasks
- Batch requests when possible
Do I need a GPU?
No for most agent development:
- API-based LLMs run in the cloud
- Your code just makes HTTP requests
Yes if you want to:
- Run local models (Llama, Mistral)
- Fine-tune models
- Process large batches offline
Can I use this commercially?
Yes, but check:
- LLM provider terms (OpenAI, Anthropic allow commercial use)
- Open source licenses for frameworks
- Data privacy regulations (GDPR, etc.)
- Your specific use case compliance needs
Technical Questions
How do I handle rate limits?
from tenacity import retry, wait_exponential, stop_after_attempt

@retry(
    wait=wait_exponential(multiplier=1, min=4, max=60),
    stop=stop_after_attempt(5)
)
def call_llm(prompt):
    return client.chat.completions.create(...)
How do I reduce latency?
- Streaming: Stream responses as they generate
- Caching: Cache repeated queries
- Smaller models: Use GPT-3.5 for simple tasks
- Parallel calls: Run independent calls concurrently
- Prompt optimization: Shorter prompts = faster responses
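Caching is the simplest of these wins to implement. Here is a minimal in-process sketch; a production system would use Redis or similar, and `call_fn` stands in for your real LLM call:

```python
import hashlib

_cache = {}

def cached_llm_call(prompt, call_fn):
    """Return a cached response for repeated prompts; only call the API on a miss."""
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key not in _cache:
        _cache[key] = call_fn(prompt)  # cache miss: pay for the API call once
    return _cache[key]
```

Identical prompts after the first skip the API entirely, which cuts both latency and cost.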
How do I prevent hallucinations?
- Require tool use: Force agents to use tools, not memory
- Validation: Verify outputs before using them
- Lower temperature: Use 0.2-0.3 for factual tasks
- Structured outputs: Use JSON mode or function calling
- Retrieval: Use RAG to ground responses in facts
How do I debug agent failures?
- Log everything: All thoughts, actions, observations
- Trace execution: Use tools like LangSmith
- Test incrementally: Start simple, add complexity
- Validate tools: Test tools independently
- Check prompts: Ensure clear instructions
Architecture Questions
Single agent vs multi-agent?
Single agent when:
- Task is focused and well-defined
- Simplicity is important
- Low latency is critical
Multi-agent when:
- Task requires diverse expertise
- Parallel processing helps
- Checks and balances needed
- Scaling beyond single agent
How do I handle long-running tasks?
- Async processing: Use background jobs
- Checkpointing: Save state periodically
- Progress updates: Stream status to user
- Timeouts: Set reasonable limits
- Resumability: Allow restart from checkpoint
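Checkpointing and resumability can be combined in a small sketch: each step's result is persisted as it completes, so a restarted run skips work already done. Function and file names here are illustrative, not from any framework:

```python
import json
from pathlib import Path

def load_checkpoint(path):
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if path.exists():
        return json.loads(path.read_text())
    return {"completed_steps": [], "results": {}}

def save_checkpoint(path, state):
    """Persist state so a long-running task can resume after a crash."""
    path.write_text(json.dumps(state))

def run_with_checkpoints(steps, path):
    """Run (name, fn) steps in order, skipping any recorded as already done."""
    state = load_checkpoint(path)
    for name, fn in steps:
        if name in state["completed_steps"]:
            continue  # finished in a previous run
        state["results"][name] = fn()
        state["completed_steps"].append(name)
        save_checkpoint(path, state)  # checkpoint after every step
    return state["results"]
```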
How do I scale to production?
- Horizontal scaling: Multiple agent instances
- Load balancing: Distribute requests
- Caching: Redis for responses
- Queue systems: RabbitMQ, SQS for async tasks
- Monitoring: Track performance and errors
Safety & Security
How do I make agents safe?
- Sandboxing: Isolate code execution (Docker)
- Validation: Check all inputs and outputs
- Rate limiting: Prevent abuse
- Human approval: For critical actions
- Audit logging: Track all actions
- Guardrails: Block harmful requests
What about prompt injection?
Defense strategies:
- Input sanitization: Remove suspicious patterns
- Separate contexts: User input vs system instructions
- Output validation: Check for unexpected behavior
- Monitoring: Detect anomalies
- Least privilege: Limit tool access
How do I handle sensitive data?
- Encryption: Encrypt data at rest and in transit
- Access control: Role-based permissions
- Data minimization: Only collect what’s needed
- Anonymization: Remove PII when possible
- Compliance: Follow GDPR, HIPAA, etc.
Development Questions
Which framework should I use?
LangChain: Best for rapid prototyping
- Lots of integrations
- Active community
- Good documentation
LangGraph: Best for complex workflows
- Graph-based state management
- Better control flow
- Production-ready
Custom: Best for specific needs
- Full control
- No framework overhead
- Optimized for your use case
How do I test agents?
- Unit tests: Test individual components
- Integration tests: Test agent workflows
- Evaluation sets: Benchmark on standard tasks
- A/B testing: Compare agent versions
- User testing: Real-world feedback
How long does it take to build an agent?
- Simple agent (ReAct with 3-5 tools): 1-2 days
- Production agent (with testing, monitoring): 1-2 weeks
- Complex multi-agent system: 1-3 months
- Enterprise deployment: 3-6 months
Common Issues
“My agent gets stuck in loops”
Solutions:
- Set max_steps limit
- Add loop detection
- Improve prompts to avoid repetition
- Use planning instead of pure ReAct
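A simple detector can back up the max_steps limit by flagging when the agent keeps issuing the identical action. This is a sketch; a real detector might also catch near-duplicate actions:

```python
def detect_loop(action_history, window=3):
    """True if the last `window` actions are all identical."""
    if len(action_history) < window:
        return False
    recent = action_history[-window:]
    return all(a == recent[0] for a in recent)
```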
“Tool calls fail frequently”
Solutions:
- Validate tool schemas
- Add retry logic with exponential backoff
- Improve tool descriptions
- Test tools independently
- Add error handling
“Agent is too slow”
Solutions:
- Use faster models (GPT-3.5 vs GPT-4)
- Enable streaming
- Cache repeated queries
- Optimize prompts (shorter = faster)
- Run tools in parallel
“Costs are too high”
Solutions:
- Cache aggressively
- Use smaller models when possible
- Optimize prompt length
- Batch requests
- Set usage limits
Learning Path
I’m a beginner programmer. Can I take this course?
You need:
- Python basics (functions, classes)
- API concepts
- Command line comfort
If you’re missing these, spend 2-4 weeks on Python fundamentals first, then return to this course.
Should I take this course or learn LangChain first?
Take this course if you want to:
- Understand agent fundamentals
- Build from scratch
- Know what’s happening under the hood
Learn LangChain first if you want to:
- Build quickly with existing tools
- Focus on applications, not internals
Ideally: Take this course, then use frameworks with deeper understanding.
How do I stay current with agent research?
- Follow researchers: Twitter/X, blogs
- Read papers: ArXiv, conferences
- Join communities: Discord, Reddit
- Experiment: Try new techniques
- Contribute: Open source projects
Still Have Questions?
- GitHub Discussions: Ask the community
- Issues: Report problems
- Contributing: Improve the course
What Are AI Agents?
Module 1: Learning Objectives
By the end of this module, you will:
- ✓ Define what AI agents are and how they differ from traditional software
- ✓ Identify different types of agents and their use cases
- ✓ Understand the perception-reasoning-action loop
- ✓ Explain how LLMs enable agentic behavior
- ✓ Recognize key components of agent architecture
Definition and Core Concepts
An AI agent is an autonomous system that perceives its environment, reasons about it, and takes actions to achieve specific goals. Unlike simple chatbots that respond to queries, agents can:
- Break down complex tasks into steps
- Use tools and external resources
- Remember context across interactions
- Adapt their approach based on feedback
- Work independently toward objectives
Think of an agent as a digital assistant that doesn’t just answer questions—it gets things done.
Agent vs. Chatbot vs. Assistant
Chatbot
- Responds to direct queries
- Stateless or minimal memory
- No tool use
- Example: Simple FAQ bot
Assistant
- Helps with tasks through conversation
- Maintains conversation context
- May access some information
- Example: Basic voice assistants
Agent
- Autonomous task execution
- Multi-step reasoning and planning
- Uses multiple tools and APIs
- Adapts strategy based on results
- Example: Research agent that searches, analyzes, and synthesizes information
Autonomy, Reasoning, and Tool Use
Autonomy
Agents operate with varying degrees of independence:
- Supervised: Requires approval for each action
- Semi-autonomous: Asks for guidance on critical decisions
- Fully autonomous: Executes complete workflows independently
Reasoning
Agents think through problems using:
- Chain-of-thought: Step-by-step logical reasoning
- Planning: Breaking goals into sub-tasks
- Reflection: Evaluating their own outputs
- Error recovery: Adapting when things go wrong
Tool Use
Modern agents extend their capabilities through tools:
- Web search and browsing
- Code execution
- Database queries
- API calls
- File operations
- Calculator and data analysis
Real-World Applications and Use Cases
Software Development
- Code generation and refactoring
- Bug detection and fixing
- Documentation writing
- Test generation
Research and Analysis
- Literature reviews
- Market research
- Competitive analysis
- Data synthesis
Business Automation
- Customer support
- Data entry and processing
- Report generation
- Workflow orchestration
Personal Productivity
- Email management
- Calendar scheduling
- Travel planning
- Information gathering
Creative Work
- Content creation
- Design assistance
- Brainstorming
- Editing and refinement
Key Characteristics of Effective Agents
- Goal-oriented: Clear objectives drive behavior
- Adaptive: Adjust approach based on feedback
- Transparent: Explain reasoning and actions
- Reliable: Handle errors gracefully
- Efficient: Minimize unnecessary steps
- Safe: Respect boundaries and constraints
The Agent Loop
At their core, agents follow a continuous cycle:
graph LR
A[Perceive] --> B[Reason]
B --> C[Act]
C --> D[Observe]
D --> A
style A fill:#dbeafe
style B fill:#fef3c7
style C fill:#d1fae5
style D fill:#e0e7ff
The Perception-Reasoning-Action Loop:
- Perceive → Observe the current state
- Reason → Decide what to do next
- Act → Execute the chosen action
- Observe → See the results
- Repeat → Continue until goal is achieved
This loop enables agents to navigate complex, multi-step tasks that would be difficult to hardcode.
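The loop above fits in a few lines of Python. In this sketch, perceive, reason, and act are caller-supplied functions, since their real implementations (LLM calls, tools) come in later chapters:

```python
def run_agent_loop(goal, perceive, reason, act, max_iterations=10):
    """Minimal perceive-reason-act loop; stops when reason() signals completion."""
    observation = None
    for _ in range(max_iterations):
        state = perceive(observation)          # Perceive: observe current state
        decision = reason(goal, state)         # Reason: decide what to do next
        if decision.get("done"):
            return decision.get("answer")      # goal achieved
        observation = act(decision["action"])  # Act, then Observe the result
    return None  # gave up after max_iterations
```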
What Makes Agents Possible Now?
Recent advances have made practical agents feasible:
- Large Language Models: Provide reasoning and language understanding
- Function Calling: LLMs can reliably invoke tools with structured parameters
- Context Windows: Models can maintain longer conversations and more context
- Improved Reliability: Better instruction following and fewer hallucinations
- Ecosystem: Frameworks and tools for building agents quickly
💡 Key Insight
The combination of LLMs with tool-calling capabilities is what makes modern AI agents fundamentally different from previous approaches. LLMs provide the “reasoning engine” while tools provide the “hands” to interact with the world.
Looking Ahead
As you progress through this course, you’ll learn to build agents that combine these concepts into practical, production-ready systems. We’ll start simple and gradually add sophistication.
✅ Key Takeaways
- AI agents are autonomous systems that perceive, reason, and act to achieve goals
- Agents differ from chatbots by using tools, planning, and maintaining memory
- The perception-reasoning-action loop is the core pattern
- Modern LLMs enable practical agent development through reasoning and tool use
- Agents can be simple (single-task) or complex (multi-agent systems)
In the next section, we’ll explore agent architecture and how these components fit together.
Agent Architecture Basics
The Perception-Reasoning-Action Loop
Every agent operates on a fundamental cycle that mirrors how humans approach tasks:
┌─────────────┐
│  PERCEIVE   │ ← Gather information about current state
└──────┬──────┘
       │
       ▼
┌─────────────┐
│   REASON    │ ← Decide what to do next
└──────┬──────┘
       │
       ▼
┌─────────────┐
│     ACT     │ ← Execute the chosen action
└──────┬──────┘
       │
       └──────→ (back to PERCEIVE)
Perceive
The agent observes its environment:
- User input and instructions
- Tool outputs and results
- Current state and context
- Available resources
Reason
The agent decides on the next action:
- Analyze the current situation
- Consider available options
- Plan the next step
- Evaluate potential outcomes
Act
The agent executes its decision:
- Call a tool or function
- Generate a response
- Update internal state
- Request more information
Memory Systems
Agents need memory to maintain context and learn from experience. There are two primary types:
Short-Term Memory (Working Memory)
Holds information for the current task:
- Conversation history: Recent messages and responses
- Intermediate results: Outputs from previous steps
- Current plan: What the agent is trying to accomplish
- Execution state: Where the agent is in the workflow
Implementation: Typically stored in the LLM’s context window
Limitations:
- Fixed size (token limits)
- Cleared when task completes
- Can become cluttered
Long-Term Memory (Persistent Memory)
Retains information across sessions:
- Facts and knowledge: Learned information about the user or domain
- Past interactions: Historical conversations
- Successful strategies: What worked before
- User preferences: Personalization data
Implementation:
- Vector databases (semantic search)
- Traditional databases (structured data)
- File systems (documents, logs)
Key Operations:
- Store: Save important information
- Retrieve: Find relevant past information
- Update: Modify existing memories
- Forget: Remove outdated information
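These four operations can be sketched with a toy in-memory store. Real long-term memory would use a vector database with embeddings for semantic retrieval; here keyword overlap stands in for similarity search:

```python
class SimpleMemory:
    """Toy long-term memory: stores text snippets, retrieves by keyword overlap."""

    def __init__(self):
        self.entries = []

    def store(self, text):
        """Store: save important information."""
        self.entries.append(text)

    def retrieve(self, query, top_k=3):
        """Retrieve: rank entries by how many query words they share."""
        words = set(query.lower().split())
        scored = [(len(words & set(e.lower().split())), e) for e in self.entries]
        scored.sort(key=lambda s: -s[0])
        return [e for score, e in scored[:top_k] if score > 0]

    def forget(self, predicate):
        """Forget: drop entries matching a condition (e.g. outdated facts)."""
        self.entries = [e for e in self.entries if not predicate(e)]
```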
Planning and Goal-Oriented Behavior
Agents don’t just react—they plan ahead to achieve goals efficiently.
Goal Decomposition
Breaking complex goals into manageable sub-goals:
Goal: "Research and summarize recent AI papers"
├─ Sub-goal 1: Search for relevant papers
├─ Sub-goal 2: Read and extract key points
├─ Sub-goal 3: Synthesize findings
└─ Sub-goal 4: Format summary
Planning Strategies
Reactive Planning: Decide next step based on current state
- Simple and fast
- Good for straightforward tasks
- Limited lookahead
Proactive Planning: Create full plan upfront, then execute
- Better for complex tasks
- Can optimize entire workflow
- May need replanning if things change
Hybrid Planning: Plan a few steps ahead, adapt as needed
- Balances flexibility and efficiency
- Most common in practice
Plan Representation
Plans can be represented as:
- Linear sequences: Step 1 → Step 2 → Step 3
- Trees: Branching based on conditions
- Graphs: Complex dependencies between steps
- Natural language: Human-readable descriptions
Multi-Step Task Execution
Agents excel at tasks requiring multiple actions:
Execution Patterns
Sequential Execution
Step 1 → Step 2 → Step 3 → Done
Each step depends on the previous one.
Parallel Execution
Step 1a ─┐
Step 1b ─┼→ Combine → Done
Step 1c ─┘
Independent steps run simultaneously.
Conditional Execution
Step 1 → Decision
           ├─ If A → Step 2a → Done
           └─ If B → Step 2b → Done
Path depends on intermediate results.
Iterative Execution
Step 1 → Step 2 → Check
  ↑                 │
  └─────────────────┘  (repeat if needed)
Loop until condition is met.
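Of these patterns, parallel execution benefits most from concurrency primitives. A minimal sketch using asyncio, where the step functions are hypothetical stand-ins for real tool calls:

```python
import asyncio

async def run_parallel(steps):
    """Run independent async steps concurrently and collect their results in order."""
    return list(await asyncio.gather(*(s() for s in steps)))

# Hypothetical independent sub-steps standing in for real tool calls
async def step_a():
    await asyncio.sleep(0.01)
    return "a-result"

async def step_b():
    await asyncio.sleep(0.01)
    return "b-result"
```

Calling `asyncio.run(run_parallel([step_a, step_b]))` returns both results once the slowest step finishes, rather than the sum of their latencies.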
Error Handling
Robust agents handle failures gracefully:
- Detect: Recognize when something went wrong
- Diagnose: Understand the cause
- Recover: Try alternative approaches
- Escalate: Ask for help if stuck
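The detect-diagnose-recover-escalate sequence can be sketched as a fallback chain. The function name is ours; each approach is any callable, such as an alternate tool:

```python
def execute_with_recovery(approaches):
    """Try each approach in order; escalate with all error details if every one fails."""
    errors = []
    for fn in approaches:
        try:
            return fn()                # success on the first working approach
        except Exception as e:         # Detect: exception means failure
            errors.append(str(e))      # Diagnose: keep the cause for the report
    # Escalate: surface every failure so a human can step in
    raise RuntimeError(f"All approaches failed: {errors}")
```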
Progress Tracking
Agents monitor their progress:
- Checkpoints: Mark completed sub-goals
- State management: Track what’s been done
- Backtracking: Undo steps if needed
- Resumption: Continue after interruption
Core Components of Agent Architecture
1. Controller (Brain)
The central decision-making component:
- Interprets user goals
- Manages the reasoning loop
- Coordinates other components
- Handles control flow
2. Memory Manager
Manages information storage and retrieval:
- Maintains conversation context
- Stores and retrieves long-term memories
- Decides what to remember/forget
- Optimizes memory usage
3. Tool Interface
Connects agent to external capabilities:
- Defines available tools
- Handles tool invocation
- Parses tool outputs
- Manages tool errors
4. Planner
Develops strategies for achieving goals:
- Decomposes complex tasks
- Generates action sequences
- Optimizes execution order
- Adapts plans based on results
5. Executor
Carries out planned actions:
- Invokes tools with correct parameters
- Monitors execution
- Collects results
- Reports status
Putting It Together
A complete agent architecture integrates these components:
              User Input
                  ↓
┌─────────────────────────────────┐
│           CONTROLLER            │
│    (Orchestrates everything)    │
└────┬────────────────────────┬───┘
     │                        │
     ▼                        ▼
┌─────────┐              ┌─────────┐
│ MEMORY  │←────────────→│ PLANNER │
└─────────┘              └────┬────┘
     ↑                        │
     │                        ▼
     │                   ┌─────────┐
     └───────────────────│EXECUTOR │
                         └────┬────┘
                              │
                              ▼
                         ┌─────────┐
                         │  TOOLS  │
                         └─────────┘
                              ↓
                           Results
Design Principles
When architecting agents, follow these principles:
- Modularity: Separate concerns into distinct components
- Observability: Make agent reasoning transparent
- Flexibility: Allow easy addition of new tools and capabilities
- Robustness: Handle errors and edge cases gracefully
- Efficiency: Minimize unnecessary steps and API calls
- Safety: Validate inputs and outputs, respect boundaries
Next Steps
Now that you understand the basic architecture, we’ll explore how LLMs power these components in the next section on LLM Fundamentals for Agents.
LLM Fundamentals for Agents
How Language Models Work
Large Language Models (LLMs) are the “brain” of modern AI agents. Understanding how they work helps you build better agents.
The Basics
LLMs are trained to predict the next token (word or word piece) given previous tokens:
Input: "The capital of France is"
Output: "Paris" (most likely next token)
This simple mechanism enables:
- Text generation
- Question answering
- Reasoning
- Code generation
- Tool use
From Prediction to Reasoning
Modern LLMs don’t just predict—they reason:
Chain-of-Thought: Breaking down problems step by step
Question: "If I have 3 apples and buy 2 more, then give away 1, how many do I have?"
LLM reasoning:
1. Start with 3 apples
2. Buy 2 more: 3 + 2 = 5
3. Give away 1: 5 - 1 = 4
Answer: 4 apples
Tool Use: Recognizing when to call external functions
User: "What's the weather in Tokyo?"
LLM: I should use the weather_api tool with location="Tokyo"
Key Capabilities for Agents
- Instruction following: Understanding and executing commands
- Context understanding: Maintaining awareness of conversation history
- Function calling: Invoking tools with correct parameters
- Error recovery: Adapting when things go wrong
- Self-reflection: Evaluating own outputs
Prompting Strategies for Agents
How you prompt an LLM dramatically affects agent performance.
System Prompts
Define the agent’s role, capabilities, and constraints:
You are a research assistant agent. Your goal is to help users
find and synthesize information from multiple sources.
Available tools:
- web_search(query): Search the internet
- read_url(url): Extract content from a webpage
- summarize(text): Create concise summaries
Always:
1. Break complex requests into steps
2. Verify information from multiple sources
3. Cite your sources
4. Ask for clarification if needed
Few-Shot Examples
Show the agent how to behave through examples:
Example 1:
User: "Find recent news about AI"
Agent: I'll search for recent AI news.
Action: web_search("AI news 2026")
Result: [search results]
Agent: Here are the top 3 recent AI developments...
Example 2:
User: "What's on that page?"
Agent: I need a URL to read a page. Could you provide the link?
ReAct Pattern
The most common prompting pattern for agents:
Thought: What do I need to do?
Action: [tool_name](parameters)
Observation: [result from tool]
Thought: What does this mean?
Action: [next tool or final answer]
Structured Outputs
Guide the LLM to produce consistent formats:
Respond in this format:
{
  "reasoning": "Your thought process",
  "action": "tool_name",
  "parameters": {"param": "value"},
  "confidence": 0.95
}
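Whatever format you request, validate the output before acting on it. A sketch that checks the JSON shape above and raises an error message you could feed back to the model (field names match the example; adapt them to your own schema):

```python
import json

def parse_agent_response(raw):
    """Parse and validate the JSON an agent was asked to emit."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        raise ValueError(f"Response was not valid JSON: {e}")
    for field in ("reasoning", "action", "parameters"):
        if field not in data:
            raise ValueError(f"Missing required field: {field}")
    return data
```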
Context Windows and Token Limits
Every LLM has a maximum context window—the amount of text it can process at once.
Common Context Sizes
- GPT-4: 8K, 32K, 128K tokens
- Claude: 200K tokens
- Gemini: 1M+ tokens
What Fits in Context?
Approximate token counts:
- 1 token ≈ 4 characters
- 1 token ≈ 0.75 words
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words
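These ratios make a quick budget check easy. For exact counts you would use the model's own tokenizer (e.g. tiktoken for OpenAI models); this heuristic is only for rough planning:

```python
def estimate_tokens(text):
    """Rough token count using the ~4 characters per token heuristic."""
    return len(text) // 4
```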
Context Management Strategies
1. Summarization Compress old conversation history:
[Full conversation history]
↓
[Summary of key points] + [Recent messages]
2. Sliding Window Keep only the most recent N messages:
Message 1, 2, 3, 4, 5, 6, 7, 8
                    └────────┘ (keep last 4)
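A sliding window over chat-style message dicts can be sketched in one function, preserving the system prompt while trimming older turns:

```python
def sliding_window(messages, keep_last=4):
    """Keep the system prompt plus only the most recent messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last:]
```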
3. Selective Retention Keep important messages, discard routine ones:
System prompt + Key decisions + Recent context
4. External Memory Store information outside context, retrieve as needed:
Context: [Current task]
Memory DB: [All past information]
↓ (retrieve relevant)
Context: [Current task] + [Relevant memories]
Token Budget Management
For agents, allocate tokens wisely:
System prompt:      500 tokens
Tools definition: 1,000 tokens
Conversation:     4,000 tokens
Working memory:   1,500 tokens
Reserve:          1,000 tokens (for response)
─────────────────────────────
Total:            8,000 tokens (fits an 8K window; the reserve buffers the response)
Temperature, Top-p, and Sampling Parameters
These parameters control how the LLM generates text.
Temperature
Controls randomness (0.0 to 2.0):
Low temperature (0.0 - 0.3): Deterministic, focused
Temperature: 0.1
"The capital of France is Paris" (always)
Use for: Tool calling, structured tasks, factual responses
Medium temperature (0.5 - 0.8): Balanced
Temperature: 0.7
"The capital of France is Paris, a beautiful city known for..."
Use for: General agent behavior, conversational responses
High temperature (1.0 - 2.0): Creative, random
Temperature: 1.5
"The capital of France? Ah, the magnificent Paris, where..."
Use for: Creative tasks, brainstorming, diverse outputs
Top-p (Nucleus Sampling)
Controls diversity by probability mass (0.0 to 1.0):
Low top-p (0.1 - 0.5): Conservative choices
- Considers only the most likely tokens
- More focused and consistent
High top-p (0.9 - 1.0): Diverse choices
- Considers a wider range of tokens
- More varied and creative
Typical for agents: 0.9-0.95
Top-k
Limits to top K most likely tokens:
- top-k=1: Always pick most likely (deterministic)
- top-k=10: Choose from 10 most likely
- top-k=50: More diversity
Practical Guidelines for Agents
For tool calling and structured tasks:
temperature = 0.1
top_p = 0.9
For conversational responses:
temperature = 0.7
top_p = 0.95
For creative tasks:
temperature = 1.0
top_p = 0.95
Other Important Parameters
Max Tokens
Maximum length of generated response:
- Set based on expected output length
- Leave room for tool calls and reasoning
- Typical: 500-2000 for agent responses
Stop Sequences
Tokens that halt generation:
stop_sequences = ["</tool>", "DONE", "\n\nUser:"]
Useful for controlling agent output format.
Frequency/Presence Penalty
Reduce repetition:
- Frequency penalty: Penalize tokens based on how often they appear
- Presence penalty: Penalize tokens that have appeared at all
- Typical: 0.0-0.5 for agents
Prompt Engineering Best Practices
1. Be Specific
❌ “Help me with this”
✅ “Search for recent papers on transformer architectures and summarize the key innovations”
2. Provide Context
You are helping a software engineer debug a Python application.
The user has intermediate Python knowledge.
Focus on practical solutions.
3. Use Delimiters
User input: """
{user_message}
"""
Available tools: ###
{tool_definitions}
###
4. Specify Output Format
Respond with:
1. Your reasoning
2. The action to take
3. Expected outcome
5. Handle Edge Cases
If the user's request is unclear, ask for clarification.
If a tool fails, try an alternative approach.
If you cannot complete the task, explain why.
Testing and Iteration
Evaluate Prompts Systematically
- Create test cases: Common scenarios your agent should handle
- Run experiments: Try different prompts and parameters
- Measure performance: Success rate, quality, efficiency
- Iterate: Refine based on results
Common Issues and Fixes
Issue: Agent doesn’t use tools
Fix: Add explicit examples of tool usage

Issue: Agent is too verbose
Fix: Lower temperature, add “be concise” instruction

Issue: Agent hallucinates
Fix: Emphasize “only use provided tools”, add verification steps

Issue: Agent gets stuck in loops
Fix: Add step counter, max iterations limit
Choosing the Right Model
Different models for different needs:
For Agents
GPT-4 / GPT-4 Turbo
- Excellent reasoning
- Reliable tool calling
- Good for complex tasks
Claude 3 (Opus/Sonnet)
- Long context (200K)
- Strong reasoning
- Good safety features
GPT-3.5 Turbo
- Fast and cheap
- Good for simple agents
- Lower reasoning capability
Trade-offs
- Cost vs. Capability: Stronger models cost more
- Speed vs. Quality: Faster models may be less accurate
- Context vs. Price: Longer context costs more
Next Steps
With these LLM fundamentals, you’re ready to build your first agent! In Chapter 2, we’ll implement a simple ReAct agent that puts these concepts into practice.
Simple ReAct Agent
Module 2: Learning Objectives
By the end of this module, you will:
- ✓ Implement a ReAct agent from scratch
- ✓ Integrate external tools with function calling
- ✓ Handle errors and retries gracefully
- ✓ Build a complete shopping research assistant
- ✓ Understand tool schemas and validation
Introduction to ReAct
ReAct (Reasoning + Acting) is the most popular pattern for building AI agents. It combines:
- Reasoning: Thinking through what to do
- Acting: Taking actions via tools
The agent alternates between thinking and acting until it solves the task.
The ReAct Pattern
graph TD
A[User Query] --> B[Thought: Reason]
B --> C{Need Tool?}
C -->|Yes| D[Action: Use Tool]
C -->|No| E[Answer: Respond]
D --> F[Observation: Result]
F --> B
E --> G[Done]
style B fill:#fef3c7
style D fill:#d1fae5
style F fill:#dbeafe
style E fill:#f0fdf4
The ReAct Loop:
Thought: I need to figure out what to do
Action: tool_name(parameters)
Observation: [result from the tool]
Thought: Based on this result, I should...
Action: another_tool(parameters)
Observation: [another result]
Thought: Now I have enough information
Answer: [final response to user]
Why ReAct Works
- Transparency: You can see the agent’s reasoning
- Debuggability: Easy to identify where things go wrong
- Flexibility: Works for many types of tasks
- Simplicity: Easy to implement and understand
⚠️ Important
ReAct agents can get stuck in loops or make poor decisions. Always implement max step limits and validation to prevent runaway execution.
Building Your First ReAct Agent
Let’s build a simple agent step by step.
Step 1: Define the Agent Loop
def react_agent(user_input, max_steps=10):
"""Simple ReAct agent loop"""
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_input}
]
for step in range(max_steps):
# Get LLM response
response = llm.generate(messages)
# Parse response
if is_final_answer(response):
return response
# Execute action
action, params = parse_action(response)
result = execute_tool(action, params)
# Add to conversation
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": f"Observation: {result}"})
return "Max steps reached"
Step 2: Create the System Prompt
SYSTEM_PROMPT = """You are a helpful AI agent that can use tools to answer questions.
Available tools:
- search(query): Search the internet for information
- calculate(expression): Evaluate mathematical expressions
- get_time(): Get the current time
Use this format:
Thought: [your reasoning about what to do]
Action: tool_name(parameters)
When you have the final answer:
Answer: [your response to the user]
Example:
User: What is 25 * 17?
Thought: I need to calculate this multiplication
Action: calculate("25 * 17")
Observation: 425
Thought: I have the result
Answer: 25 * 17 equals 425
"""
Step 3: Implement Tool Execution
def execute_tool(action, params):
"""Execute a tool and return the result"""
tools = {
"search": search_tool,
"calculate": calculate_tool,
"get_time": get_time_tool
}
if action not in tools:
return f"Error: Unknown tool '{action}'"
try:
result = tools[action](params)
return result
except Exception as e:
return f"Error: {str(e)}"
Step 4: Parse Agent Output
import re
def parse_action(response):
"""Extract action and parameters from agent response"""
# Look for Action: tool_name(params)
match = re.search(r'Action:\s*(\w+)\((.*?)\)', response)
if match:
action = match.group(1)
params = match.group(2).strip('"\'')
return action, params
return None, None
def is_final_answer(response):
"""Check if response contains final answer"""
return "Answer:" in response
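A quick self-contained check (restating the two helpers from above) shows the round trip on a sample agent response:

```python
import re

def parse_action(response):
    """Extract action and parameters from agent response"""
    match = re.search(r'Action:\s*(\w+)\((.*?)\)', response)
    if match:
        return match.group(1), match.group(2).strip('"\'')
    return None, None

def is_final_answer(response):
    """Check if response contains final answer"""
    return "Answer:" in response

step = 'Thought: I need to calculate this\nAction: calculate("25 * 17")'
print(parse_action(step))              # ('calculate', '25 * 17')
print(is_final_answer(step))           # False
print(is_final_answer("Answer: 425"))  # True
```

Note the non-greedy `(.*?)` stops at the first closing parenthesis, so this simple parser will truncate parameters that themselves contain `)` — one reason native function calling (covered later) is more reliable.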
Complete Working Example
import openai
import re
from datetime import datetime
# Initialize OpenAI
client = openai.OpenAI()
# System prompt
SYSTEM_PROMPT = """You are a helpful AI agent with access to tools.
Tools:
- calculate(expression): Evaluate math expressions
- get_time(): Get current time
Format:
Thought: [reasoning]
Action: tool_name(parameters)
When done:
Answer: [final response]
"""
# Tool implementations
def calculate_tool(expression):
"""Safely evaluate math expressions"""
try:
# Only allow safe operations
allowed = set('0123456789+-*/()., ')
if not all(c in allowed for c in expression):
return "Error: Invalid characters in expression"
return str(eval(expression))
except Exception as e:
return f"Error: {str(e)}"
def get_time_tool(_):
"""Get current time"""
return datetime.now().strftime("%Y-%m-%d %H:%M:%S")
# Tool registry
TOOLS = {
"calculate": calculate_tool,
"get_time": get_time_tool
}
# Parsing functions
def parse_action(text):
"""Extract action from agent response"""
match = re.search(r'Action:\s*(\w+)\((.*?)\)', text)
if match:
return match.group(1), match.group(2).strip('"\'')
return None, None
def is_final_answer(text):
"""Check if agent provided final answer"""
return "Answer:" in text
# Main agent loop
def react_agent(user_input, max_steps=10):
"""ReAct agent implementation"""
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_input}
]
print(f"User: {user_input}\n")
for step in range(max_steps):
# Get LLM response
response = client.chat.completions.create(
model="gpt-4",
messages=messages,
temperature=0.1
)
agent_response = response.choices[0].message.content
print(f"Agent: {agent_response}\n")
# Check if done
if is_final_answer(agent_response):
# Extract final answer
answer = agent_response.split("Answer:")[1].strip()
return answer
# Parse and execute action
action, params = parse_action(agent_response)
if action and action in TOOLS:
result = TOOLS[action](params)
observation = f"Observation: {result}"
print(f"{observation}\n")
# Add to conversation
messages.append({"role": "assistant", "content": agent_response})
messages.append({"role": "user", "content": observation})
else:
return "Error: Could not parse action or unknown tool"
return "Max steps reached without answer"
# Test the agent
if __name__ == "__main__":
result = react_agent("What is 123 * 456?")
print(f"Final Answer: {result}")
Thought-Action-Observation Cycles
Let’s trace through an example:
User: “What’s 15% of 240?”
Cycle 1:
Thought: I need to calculate 15% of 240, which is 0.15 * 240
Action: calculate("0.15 * 240")
Observation: 36.0
Cycle 2:
Thought: I have the result
Answer: 15% of 240 is 36
Multiple Steps Example
User: “What time is it and what’s 100 + 50?”
Cycle 1:
Thought: I need to get the current time first
Action: get_time()
Observation: 2026-02-24 11:19:00
Cycle 2:
Thought: Now I need to calculate 100 + 50
Action: calculate("100 + 50")
Observation: 150
Cycle 3:
Thought: I have both pieces of information
Answer: The current time is 2026-02-24 11:19:00, and 100 + 50 equals 150
Basic Tool Calling
Tool Definition
Define tools with clear descriptions:
TOOLS = {
"search": {
"function": search_tool,
"description": "Search the internet for information",
"parameters": {
"query": "The search query string"
}
},
"calculate": {
"function": calculate_tool,
"description": "Evaluate mathematical expressions",
"parameters": {
"expression": "Math expression to evaluate (e.g., '2 + 2')"
}
}
}
Tool Implementation Best Practices
- Validate inputs: Check parameters before execution
- Handle errors: Return error messages, don’t crash
- Return strings: Consistent output format
- Be deterministic: Same input → same output
- Add timeouts: Prevent hanging operations
def search_tool(query):
"""Search tool with validation and error handling"""
# Validate
if not query or len(query) < 2:
return "Error: Query too short"
# Execute with timeout
try:
results = search_api(query, timeout=5)
return format_results(results)
except TimeoutError:
return "Error: Search timed out"
except Exception as e:
return f"Error: {str(e)}"
Error Handling and Retries
Agents need to handle failures gracefully.
Detecting Errors
def is_error(observation):
"""Check if tool execution resulted in error"""
return observation.startswith("Error:")
Retry Logic
def react_agent_with_retry(user_input, max_steps=10, max_retries=3):
"""ReAct agent with retry logic"""
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_input}
]
retry_count = 0
for step in range(max_steps):
response = get_llm_response(messages)
if is_final_answer(response):
return extract_answer(response)
action, params = parse_action(response)
result = execute_tool(action, params)
# Handle errors
if is_error(result) and retry_count < max_retries:
retry_count += 1
messages.append({
"role": "user",
"content": f"{result}\nPlease try a different approach."
})
continue
retry_count = 0 # Reset on success
messages.append({"role": "assistant", "content": response})
messages.append({"role": "user", "content": f"Observation: {result}"})
return "Max steps reached"
Graceful Degradation
# Add to system prompt
"""
If a tool fails:
1. Try an alternative approach
2. If no alternative exists, explain the limitation
3. Provide the best answer you can with available information
"""
Common Pitfalls and Solutions
Pitfall 1: Infinite Loops
Problem: Agent repeats the same action.
Solution: Track action history and limit repetitions.
action_history = []
# Inside the agent loop, after parsing each action:
if action_history[-2:] == [action, action]:  # same action about to run a third time
    return "Agent stuck in loop, stopping"
action_history.append(action)
Pitfall 2: Hallucinated Tools
Problem: Agent invents non-existent tools.
Solution: Strict validation and clear error messages.
if action not in TOOLS:
observation = f"Error: Tool '{action}' does not exist. Available tools: {list(TOOLS.keys())}"
Pitfall 3: Malformed Actions
Problem: Agent doesn’t follow the format.
Solution: Better prompting, examples, and parsing fallbacks.
# Add to system prompt
"""
IMPORTANT: Always use exact format:
Action: tool_name(parameters)
Incorrect: "I'll use search tool with query X"
Correct: Action: search("query X")
"""
Pitfall 4: Premature Answers
Problem: Agent answers before using tools.
Solution: Emphasize tool usage in the prompt.
"""
You MUST use tools to answer questions. Do not guess or use prior knowledge.
Always verify information using available tools.
"""
Testing Your Agent
# Test cases
test_cases = [
("What is 50 * 20?", "1000"),
("What time is it?", None), # Time varies
("Calculate 100 / 4", "25"),
]
for question, expected in test_cases:
result = react_agent(question)
if expected:
assert expected in result, f"Failed: {question}"
print(f"✓ {question}")
💡 Pro Tip
Start with simple test cases and gradually increase complexity. Log all reasoning traces to understand how your agent makes decisions.
✅ Key Takeaways
- ReAct combines reasoning (thinking) with acting (tool use)
- The agent alternates between Thought, Action, and Observation
- Always implement max steps to prevent infinite loops
- Use structured prompts to guide agent behavior
- Validate tool calls before execution
- Common pitfalls: loops, hallucinations, premature answers
Next Steps
You now have a working ReAct agent! In the next section, we’ll explore tool integration in depth, including:
- Function calling APIs
- Complex tool schemas
- Parameter validation
- Response parsing strategies
Tool Integration
Function Calling APIs
Modern LLMs support native function calling, making tool integration more reliable than text parsing.
OpenAI Function Calling
import openai
client = openai.OpenAI()
# Define tools
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name, e.g., 'San Francisco'"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit"
}
},
"required": ["location"]
}
}
}
]
# Call LLM with tools
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
tools=tools,
tool_choice="auto" # Let model decide when to use tools
)
# Check if model wants to call a function
message = response.choices[0].message
if message.tool_calls:
tool_call = message.tool_calls[0]
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
print(f"Calling: {function_name}({arguments})")
Anthropic Tool Use
import anthropic
client = anthropic.Anthropic()
tools = [
{
"name": "get_weather",
"description": "Get current weather for a location",
"input_schema": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "City name"
}
},
"required": ["location"]
}
}
]
response = client.messages.create(
model="claude-3-opus-20240229",
max_tokens=1024,
tools=tools,
messages=[{"role": "user", "content": "Weather in Paris?"}]
)
# Check for tool use
for block in response.content:
if block.type == "tool_use":
print(f"Tool: {block.name}")
print(f"Input: {block.input}")
Benefits of Native Function Calling
- Structured output: JSON instead of text parsing
- Type safety: Parameters validated by LLM
- Reliability: Less prone to format errors
- Parallel calls: Multiple tools at once
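When the model returns several tool calls in one turn, they can be executed concurrently. A sketch using a thread pool — the two dummy tools here are placeholders standing in for real implementations:

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Dummy tools standing in for real implementations
TOOLS = {
    "get_weather": lambda args: f"Sunny in {args['location']}",
    "get_time": lambda args: "12:00",
}

def run_tool_calls(tool_calls):
    """Execute a batch of (name, arguments-json) tool calls in parallel."""
    def run_one(call):
        name, raw_args = call
        return name, TOOLS[name](json.loads(raw_args))
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(run_one, tool_calls))

results = run_tool_calls([
    ("get_weather", '{"location": "Tokyo"}'),
    ("get_time", '{}'),
])
print(results)  # {'get_weather': 'Sunny in Tokyo', 'get_time': '12:00'}
```

Threads suit I/O-bound tools (API calls, scraping); each result still needs to be appended to the conversation with its matching `tool_call_id`.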
Tool Schemas and Descriptions
Good tool definitions are critical for agent performance.
Anatomy of a Tool Schema
{
"name": "search_database", # Clear, descriptive name
"description": "Search the product database for items matching criteria. Returns up to 10 results.", # When and why to use
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query (e.g., 'red shoes size 10')"
},
"category": {
"type": "string",
"enum": ["electronics", "clothing", "books"],
"description": "Product category to search within"
},
"max_price": {
"type": "number",
"description": "Maximum price in USD"
}
},
"required": ["query"] # Only query is mandatory
}
}
Writing Effective Descriptions
Bad: “Search function”
Good: “Search the product database for items. Use when user asks about products, availability, or prices.”
Bad: “Gets data”
Good: “Retrieve user profile data including name, email, and preferences. Use for personalization or account queries.”
Description Best Practices
- Be specific: Explain exactly what the tool does
- Include examples: Show typical parameter values
- State limitations: Mention constraints or edge cases
- Clarify use cases: When should this tool be used?
- Avoid ambiguity: Use precise language
# Good example
{
"name": "calculate_shipping",
"description": """Calculate shipping cost for an order.
Use when: User asks about shipping costs or delivery fees
Returns: Cost in USD and estimated delivery days
Limitations: Only works for US addresses
Example: calculate_shipping(weight=2.5, zip_code="94102")
""",
"parameters": {
"type": "object",
"properties": {
"weight": {
"type": "number",
"description": "Package weight in pounds (e.g., 2.5)"
},
"zip_code": {
"type": "string",
"description": "5-digit US ZIP code (e.g., '94102')"
}
},
"required": ["weight", "zip_code"]
}
}
Parameter Validation
Always validate parameters before execution.
Basic Validation
def validate_parameters(tool_name, params):
"""Validate tool parameters"""
validators = {
"search": validate_search,
"calculate": validate_calculate,
"send_email": validate_email
}
if tool_name not in validators:
return False, f"Unknown tool: {tool_name}"
return validators[tool_name](params)
def validate_search(params):
"""Validate search parameters"""
if "query" not in params:
return False, "Missing required parameter: query"
if not isinstance(params["query"], str):
return False, "Query must be a string"
if len(params["query"]) < 2:
return False, "Query too short (minimum 2 characters)"
if len(params["query"]) > 200:
return False, "Query too long (maximum 200 characters)"
return True, "Valid"
Type Validation
def validate_type(value, expected_type):
    """Validate parameter type"""
    type_map = {
        "string": str,
        "number": (int, float),
        "boolean": bool,
        "array": list,
        "object": dict
    }
    expected = type_map.get(expected_type)
    if expected is None:
        return False, f"Unknown type: {expected_type}"
    if not isinstance(value, expected):
        return False, f"Expected {expected_type}, got {type(value).__name__}"
    return True, "Valid"
Schema-Based Validation
import jsonschema
def validate_with_schema(params, schema):
"""Validate parameters against JSON schema"""
try:
jsonschema.validate(instance=params, schema=schema)
return True, "Valid"
except jsonschema.ValidationError as e:
return False, str(e)
# Example usage
schema = {
"type": "object",
"properties": {
"email": {
"type": "string",
"format": "email"
},
"age": {
"type": "integer",
"minimum": 0,
"maximum": 150
}
},
"required": ["email"]
}
valid, message = validate_with_schema(
{"email": "user@example.com", "age": 25},
schema
)
Sanitization
Clean inputs before use:
def sanitize_string(s, max_length=1000):
"""Sanitize string input"""
# Remove null bytes
s = s.replace('\x00', '')
# Trim whitespace
s = s.strip()
# Limit length
s = s[:max_length]
return s
def sanitize_sql_input(s):
"""Prevent SQL injection"""
# Use parameterized queries instead
# This is just for demonstration
dangerous = ["'", '"', ';', '--', '/*', '*/']
for char in dangerous:
s = s.replace(char, '')
return s
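Stripping characters is a weak defense; the robust fix mentioned in the comment is parameterized queries, where the database driver keeps data separate from SQL. A minimal `sqlite3` sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

# Untrusted input is passed as a bound parameter, never interpolated into SQL
user_input = "alice' OR '1'='1"  # a classic injection attempt
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (user_input,)
).fetchall()
print(rows)  # [] — the injection is treated as a literal string, matching nothing
```

The same `?` placeholder pattern (or `%s` for drivers like psycopg2) applies to any tool that builds SQL from agent-supplied parameters.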
Response Parsing
Handle tool outputs consistently.
Structured Responses
from dataclasses import dataclass
from typing import Optional
@dataclass
class ToolResponse:
"""Standardized tool response"""
success: bool
data: Optional[dict] = None
error: Optional[str] = None
metadata: Optional[dict] = None
def execute_tool(tool_name, params):
"""Execute tool and return structured response"""
try:
result = TOOLS[tool_name](params)
return ToolResponse(
success=True,
data=result,
metadata={"tool": tool_name, "timestamp": time.time()}
)
except Exception as e:
return ToolResponse(
success=False,
error=str(e),
metadata={"tool": tool_name}
)
Formatting for LLM
def format_tool_response(response: ToolResponse) -> str:
"""Format tool response for LLM consumption"""
if response.success:
return f"Success: {json.dumps(response.data, indent=2)}"
else:
return f"Error: {response.error}"
# Usage in agent loop
result = execute_tool("search", {"query": "AI agents"})
observation = format_tool_response(result)
messages.append({"role": "user", "content": f"Observation: {observation}"})
Handling Different Response Types
def parse_tool_output(output, expected_type="string"):
"""Parse and validate tool output"""
if expected_type == "json":
try:
return json.loads(output)
except json.JSONDecodeError:
return {"error": "Invalid JSON response"}
elif expected_type == "number":
try:
return float(output)
except ValueError:
return None
elif expected_type == "boolean":
return output.lower() in ["true", "yes", "1"]
else: # string
return str(output)
Building a Tool Registry
Organize tools for easy management.
Simple Registry
class ToolRegistry:
"""Manage available tools"""
def __init__(self):
self.tools = {}
def register(self, name, function, schema):
"""Register a new tool"""
self.tools[name] = {
"function": function,
"schema": schema
}
def get_tool(self, name):
"""Get tool by name"""
return self.tools.get(name)
def list_tools(self):
"""List all available tools"""
return list(self.tools.keys())
def get_schemas(self):
"""Get all tool schemas for LLM"""
return [tool["schema"] for tool in self.tools.values()]
def execute(self, name, params):
"""Execute a tool"""
tool = self.get_tool(name)
if not tool:
raise ValueError(f"Tool not found: {name}")
return tool["function"](params)
# Usage
registry = ToolRegistry()
# Register tools
registry.register(
name="search",
function=search_function,
schema={
"name": "search",
"description": "Search the web",
"parameters": {...}
}
)
# Use in agent
schemas = registry.get_schemas()
result = registry.execute("search", {"query": "AI"})
Advanced Registry with Decorators
class ToolRegistry:
def __init__(self):
self.tools = {}
def tool(self, name, description, parameters):
"""Decorator to register tools"""
def decorator(func):
self.tools[name] = {
"function": func,
"schema": {
"name": name,
"description": description,
"parameters": parameters
}
}
return func
return decorator
# Create registry
registry = ToolRegistry()
# Register tools with decorator
@registry.tool(
name="calculate",
description="Evaluate mathematical expressions",
parameters={
"type": "object",
"properties": {
"expression": {"type": "string"}
},
"required": ["expression"]
}
)
def calculate(expression):
    """Calculate mathematical expression"""
    # Demo only: eval is unsafe on untrusted input — whitelist characters
    # as in the earlier calculate_tool before using this in practice
    return eval(expression)
@registry.tool(
name="get_time",
description="Get current time",
parameters={"type": "object", "properties": {}}
)
def get_time():
"""Get current time"""
from datetime import datetime
return datetime.now().isoformat()
Complete Tool Integration Example
import openai
import json
from typing import Dict, Any, List
class Agent:
"""Agent with integrated tool system"""
def __init__(self, model="gpt-4"):
self.client = openai.OpenAI()
self.model = model
self.registry = ToolRegistry()
self._register_default_tools()
def _register_default_tools(self):
"""Register built-in tools"""
@self.registry.tool(
name="search",
description="Search for information",
parameters={
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
}
)
def search(query):
# Implement search
return f"Search results for: {query}"
@self.registry.tool(
name="calculate",
description="Evaluate math expressions",
parameters={
"type": "object",
"properties": {
"expression": {"type": "string"}
},
"required": ["expression"]
}
)
def calculate(expression):
try:
return str(eval(expression))
except Exception as e:
return f"Error: {str(e)}"
def run(self, user_input: str, max_steps: int = 10) -> str:
"""Run agent with tool integration"""
messages = [
{"role": "system", "content": "You are a helpful assistant with access to tools."},
{"role": "user", "content": user_input}
]
for step in range(max_steps):
# Call LLM with tools
response = self.client.chat.completions.create(
model=self.model,
messages=messages,
tools=self.registry.get_schemas(),
tool_choice="auto"
)
message = response.choices[0].message
messages.append(message)
# Check if done
if not message.tool_calls:
return message.content
# Execute tool calls
for tool_call in message.tool_calls:
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
# Execute tool
result = self.registry.execute(function_name, arguments)
# Add result to messages
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": str(result)
})
return "Max steps reached"
# Usage
agent = Agent()
response = agent.run("What is 25 * 17?")
print(response)
Best Practices
- Clear naming: Use descriptive, unambiguous tool names
- Comprehensive descriptions: Help the LLM understand when to use each tool
- Validate everything: Check parameters before execution
- Handle errors gracefully: Return useful error messages
- Keep tools focused: One tool, one purpose
- Document examples: Show typical usage in descriptions
- Version your tools: Track changes to tool interfaces
- Test thoroughly: Verify tools work with various inputs
Common Patterns
Conditional Tool Access
def get_available_tools(user_role):
"""Return tools based on user permissions"""
base_tools = ["search", "calculate"]
if user_role == "admin":
base_tools.extend(["delete_data", "modify_settings"])
return [registry.get_tool(name) for name in base_tools]
Tool Chaining
# Tools can call other tools
@registry.tool(name="research", ...)
def research(topic):
# Search for information
results = registry.execute("search", {"query": topic})
# Summarize results
summary = registry.execute("summarize", {"text": results})
return summary
Async Tool Execution
import asyncio

async def execute_tool_async(tool_name, params):
    """Execute a tool without blocking the event loop"""
    tool = registry.get_tool(tool_name)
    # The registered tools are ordinary sync functions, so run them in a
    # thread; if a tool is itself a coroutine function, await it directly
    return await asyncio.to_thread(tool["function"], params)
# Execute multiple tools in parallel
results = await asyncio.gather(
execute_tool_async("search", {"query": "AI"}),
execute_tool_async("search", {"query": "ML"}),
execute_tool_async("search", {"query": "agents"})
)
Next Steps
Now that you understand tool integration, let’s build a complete hands-on project in the next section where you’ll create a research assistant agent with multiple tools!
Hands-On Project: Shopping Research Assistant
Project Overview
Build a Shopping Research Assistant that helps users make informed purchasing decisions by:
- Searching for products across multiple sources
- Comparing prices and features
- Reading product reviews
- Summarizing pros and cons
- Providing recommendations with reasoning
This project combines everything you’ve learned: ReAct pattern, tool integration, multi-step reasoning, and error handling.
What You’ll Build
An agent that can handle queries like:
- “Find the best laptop under $1000 for programming”
- “Compare noise-canceling headphones”
- “What are the top-rated coffee makers?”
- “Should I buy the iPhone 15 or Samsung S24?”
Project Setup
Dependencies
pip install openai requests beautifulsoup4 python-dotenv
Project Structure
shopping_agent/
├── agent.py # Main agent implementation
├── tools.py # Tool definitions
├── config.py # Configuration
├── .env # API keys
└── test_agent.py # Test cases
Configuration
# config.py
import os
from dotenv import load_dotenv
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MODEL = "gpt-4"
MAX_STEPS = 15
TEMPERATURE = 0.7
Implement the Tools
Tool 1: Product Search
# tools.py
import os
import requests
from typing import Dict, List
def search_products(query: str, max_results: int = 5) -> str:
"""
Search for products matching the query.
Returns product names, prices, and URLs.
"""
try:
# Using a mock API for demonstration
# In production, use real APIs like Amazon Product API, eBay, etc.
# Simulate search results
results = [
{
"name": f"Product {i+1} for {query}",
"price": f"${100 + i*50}",
"rating": f"{4.0 + i*0.2:.1f}/5.0",
"url": f"https://example.com/product-{i+1}"
}
for i in range(max_results)
]
# Format results
output = f"Found {len(results)} products:\n\n"
for i, product in enumerate(results, 1):
output += f"{i}. {product['name']}\n"
output += f" Price: {product['price']}\n"
output += f" Rating: {product['rating']}\n"
output += f" URL: {product['url']}\n\n"
return output
except Exception as e:
return f"Error searching products: {str(e)}"
def search_products_real(query: str, max_results: int = 5) -> str:
"""
Real implementation using web search.
Searches Google Shopping or similar.
"""
try:
# Example with Google Custom Search API
api_key = os.getenv("GOOGLE_API_KEY")
search_engine_id = os.getenv("GOOGLE_SEARCH_ENGINE_ID")
url = "https://www.googleapis.com/customsearch/v1"
params = {
"key": api_key,
"cx": search_engine_id,
"q": query + " buy price",
"num": max_results
}
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()
data = response.json()
items = data.get("items", [])
output = f"Found {len(items)} products:\n\n"
for i, item in enumerate(items, 1):
output += f"{i}. {item['title']}\n"
output += f" {item['snippet']}\n"
output += f" URL: {item['link']}\n\n"
return output
except Exception as e:
return f"Error: {str(e)}"
Tool 2: Get Product Details
from bs4 import BeautifulSoup
def get_product_details(url: str) -> str:
"""
Extract detailed information from a product page.
Returns specs, description, and reviews summary.
"""
try:
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
# Extract text (simplified)
# In production, use specific selectors for each site
text = soup.get_text(separator='\n', strip=True)
# Limit length
max_length = 2000
if len(text) > max_length:
text = text[:max_length] + "..."
return f"Product details from {url}:\n\n{text}"
except Exception as e:
return f"Error fetching product details: {str(e)}"
Tool 3: Compare Products
def compare_products(product_list: str) -> str:
"""
Compare multiple products based on provided information.
Input: Comma-separated product names or descriptions.
Returns: Comparison table.
"""
try:
products = [p.strip() for p in product_list.split(',')]
if len(products) < 2:
return "Error: Need at least 2 products to compare"
output = "Product Comparison:\n\n"
output += "To compare these products effectively, I need their details.\n"
output += "Please use get_product_details for each product first.\n\n"
output += f"Products to compare: {', '.join(products)}"
return output
except Exception as e:
return f"Error comparing products: {str(e)}"
Tool 4: Get Reviews Summary
def get_reviews_summary(product_name: str) -> str:
"""
Get a summary of customer reviews for a product.
Returns common pros, cons, and overall sentiment.
"""
try:
# Mock implementation
# In production, scrape from Amazon, Reddit, review sites
reviews = {
"overall_rating": "4.3/5.0",
"total_reviews": 1247,
"pros": [
"Excellent build quality",
"Great performance",
"Good value for money"
],
"cons": [
"Battery life could be better",
"Slightly heavy",
"Limited color options"
],
"common_themes": [
"Users love the performance",
"Some complaints about weight",
"Generally recommended"
]
}
output = f"Reviews Summary for {product_name}:\n\n"
output += f"Overall Rating: {reviews['overall_rating']} ({reviews['total_reviews']} reviews)\n\n"
output += "Pros:\n"
for pro in reviews['pros']:
output += f" ✓ {pro}\n"
output += "\nCons:\n"
for con in reviews['cons']:
output += f" ✗ {con}\n"
output += "\nCommon Themes:\n"
for theme in reviews['common_themes']:
output += f" • {theme}\n"
return output
except Exception as e:
return f"Error getting reviews: {str(e)}"
Tool 5: Price History
def get_price_history(product_name: str) -> str:
"""
Get price history and trends for a product.
Helps determine if current price is good.
"""
try:
# Mock implementation
# In production, use CamelCamelCamel API, Keepa, etc.
history = {
"current_price": "$899",
"lowest_price": "$799 (3 months ago)",
"highest_price": "$999 (6 months ago)",
"average_price": "$879",
"trend": "stable",
"recommendation": "Current price is close to average. Good time to buy."
}
output = f"Price History for {product_name}:\n\n"
output += f"Current Price: {history['current_price']}\n"
output += f"Lowest Price: {history['lowest_price']}\n"
output += f"Highest Price: {history['highest_price']}\n"
output += f"Average Price: {history['average_price']}\n"
output += f"Trend: {history['trend']}\n\n"
output += f"💡 {history['recommendation']}"
return output
except Exception as e:
return f"Error getting price history: {str(e)}"
Build the Agent
Tool Registry
# agent.py
import json
import openai
from tools import (
search_products,
get_product_details,
compare_products,
get_reviews_summary,
get_price_history
)
class ShoppingAgent:
"""Shopping Research Assistant Agent"""
def __init__(self):
self.tools = self._create_tool_schemas()
self.client = openai.OpenAI()
def _create_tool_schemas(self):
"""Define tool schemas for OpenAI function calling"""
return [
{
"type": "function",
"function": {
"name": "search_products",
"description": "Search for products matching a query. Use when user asks to find or search for products.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Product search query (e.g., 'laptop under $1000')"
},
"max_results": {
"type": "integer",
"description": "Maximum number of results (default: 5)",
"default": 5
}
},
"required": ["query"]
}
}
},
{
"type": "function",
"function": {
"name": "get_product_details",
"description": "Get detailed information about a specific product from its URL.",
"parameters": {
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "Product page URL"
}
},
"required": ["url"]
}
}
},
{
"type": "function",
"function": {
"name": "get_reviews_summary",
"description": "Get summary of customer reviews including pros, cons, and ratings.",
"parameters": {
"type": "object",
"properties": {
"product_name": {
"type": "string",
"description": "Product name"
}
},
"required": ["product_name"]
}
}
},
{
"type": "function",
"function": {
"name": "get_price_history",
"description": "Get price history and determine if current price is good.",
"parameters": {
"type": "object",
"properties": {
"product_name": {
"type": "string",
"description": "Product name"
}
},
"required": ["product_name"]
}
}
},
{
"type": "function",
"function": {
"name": "compare_products",
"description": "Compare multiple products. Use after gathering details about each product.",
"parameters": {
"type": "object",
"properties": {
"product_list": {
"type": "string",
"description": "Comma-separated list of product names"
}
},
"required": ["product_list"]
}
}
}
]
def _execute_tool(self, tool_name: str, arguments: dict) -> str:
"""Execute a tool and return result"""
tool_map = {
"search_products": search_products,
"get_product_details": get_product_details,
"compare_products": compare_products,
"get_reviews_summary": get_reviews_summary,
"get_price_history": get_price_history
}
if tool_name not in tool_map:
return f"Error: Unknown tool {tool_name}"
try:
result = tool_map[tool_name](**arguments)
return result
except Exception as e:
return f"Error executing {tool_name}: {str(e)}"
def run(self, user_query: str, max_steps: int = 15) -> str:
"""Run the shopping assistant agent"""
messages = [
{
"role": "system",
"content": """You are a helpful shopping research assistant.
Your goal is to help users make informed purchasing decisions by:
1. Searching for relevant products
2. Gathering detailed information and reviews
3. Comparing options
4. Providing clear recommendations with reasoning
Always:
- Search for products before making recommendations
- Check reviews and ratings
- Consider price history when available
- Compare multiple options when relevant
- Cite specific information from your research
- Be honest about limitations
Format your final recommendation clearly with pros, cons, and reasoning."""
},
{"role": "user", "content": user_query}
]
print(f"🛍️ User: {user_query}\n")
for step in range(max_steps):
# Get LLM response
response = self.client.chat.completions.create(
model="gpt-4",
messages=messages,
tools=self.tools,
tool_choice="auto",
temperature=0.7
)
message = response.choices[0].message
# If no tool calls, we're done
if not message.tool_calls:
print(f"🤖 Assistant: {message.content}\n")
return message.content
# Add assistant message
messages.append(message)
# Execute tool calls
for tool_call in message.tool_calls:
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
print(f"🔧 Using tool: {function_name}({arguments})")
# Execute tool
result = self._execute_tool(function_name, arguments)
print(f"📊 Result: {result[:200]}...\n")
# Add tool result to messages
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": result
})
return "⚠️ Max steps reached without completing the task"
Complete Implementation
# agent.py (complete file)
import openai
import json
from config import OPENAI_API_KEY, MODEL
from tools import (
search_products,
get_product_details,
compare_products,
get_reviews_summary,
get_price_history
)
# With openai>=1.0, pass the key to the client rather than setting a module global:
client = openai.OpenAI(api_key=OPENAI_API_KEY)
# [ShoppingAgent class from above]
def main():
"""Test the shopping agent"""
agent = ShoppingAgent()
# Example queries
queries = [
"Find the best noise-canceling headphones under $300",
"Compare iPhone 15 Pro and Samsung Galaxy S24",
"What's a good coffee maker for home use?"
]
for query in queries:
print("=" * 60)
result = agent.run(query)
print("=" * 60)
print()
if __name__ == "__main__":
main()
Test Cases
# test_agent.py
from agent import ShoppingAgent
def test_product_search():
"""Test basic product search"""
agent = ShoppingAgent()
result = agent.run("Find wireless keyboards under $50")
assert "Product" in result or "keyboard" in result.lower()
print("✓ Product search test passed")
def test_comparison():
"""Test product comparison"""
agent = ShoppingAgent()
result = agent.run("Compare MacBook Air vs Dell XPS 13")
assert len(result) > 100 # Should have substantial response
print("✓ Comparison test passed")
def test_reviews():
"""Test review gathering"""
agent = ShoppingAgent()
result = agent.run("What do people say about AirPods Pro?")
assert "review" in result.lower() or "rating" in result.lower()
print("✓ Reviews test passed")
if __name__ == "__main__":
test_product_search()
test_comparison()
test_reviews()
print("\n✅ All tests passed!")
Debug Common Issues
Issue 1: Agent Doesn’t Use Tools
Problem: Agent responds without searching
Solution: Strengthen system prompt
"You MUST use the search_products tool before making any recommendations.
Never rely on prior knowledge about products or prices."
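If strengthening the prompt is not enough, the Chat Completions API's `tool_choice` parameter can force a specific tool on the first turn and then fall back to `"auto"`. A minimal sketch (the tool name `search_products` matches this chapter's agent; adapt it to your own tools):

```python
def select_tool_choice(step: int, force_tool: str = "search_products"):
    """Force a named tool on the first step, then let the model decide."""
    if step == 0:
        # OpenAI format for forcing one specific function
        return {"type": "function", "function": {"name": force_tool}}
    return "auto"

# Used as:
# client.chat.completions.create(..., tool_choice=select_tool_choice(step))
```

This guarantees at least one search happens before the model is free to answer directly.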
Issue 2: Infinite Search Loop
Problem: Agent keeps searching without concluding
Solution: Add step tracking and guidance
# Track tool usage inside the agent's tool-execution loop
tool_usage = {}
# ... after each tool call:
tool_usage[tool_name] = tool_usage.get(tool_name, 0) + 1
if tool_usage[tool_name] > 3:
    # Feed this back as the tool result to nudge the model to conclude
    result = "You've used this tool multiple times. Please synthesize your findings."
Issue 3: Hallucinated Product Info
Problem: Agent invents product details
Solution: Emphasize tool-only information
"CRITICAL: Only use information from tool results.
If a tool doesn't return information, say so explicitly.
Never make up product names, prices, or specifications."
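Prompting can be backed by a lightweight check of the final answer against the collected tool outputs before showing it to the user. A rough sketch, assuming the hypothetical convention that the agent bolds product names with `**...**`:

```python
import re
from typing import List

def is_grounded(answer: str, tool_results: List[str]) -> bool:
    """Rough grounding check: every bolded product name in the answer
    must appear somewhere in the collected tool outputs."""
    combined = " ".join(tool_results).lower()
    # Hypothetical convention: product names are wrapped in **bold**
    names = re.findall(r"\*\*(.+?)\*\*", answer)
    return all(name.lower() in combined for name in names)
```

If the check fails, the answer can be sent back to the model with a request to cite only tool results.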
Issue 4: Poor Recommendations
Problem: Recommendations lack depth
Solution: Add structured output requirement
"Format your final recommendation as:
**Recommendation**: [Product name]
**Why**: [2-3 key reasons]
**Pros**:
- [Pro 1]
- [Pro 2]
**Cons**:
- [Con 1]
- [Con 2]
**Price**: [Current price and value assessment]"
Enhancements
1. Add Budget Tracking
def check_budget(price: str, budget: float) -> bool:
"""Check if price is within budget"""
# Extract numeric price
price_num = float(price.replace('$', '').replace(',', ''))
return price_num <= budget
2. Save Research Sessions
import time

def save_research(query: str, results: str):
    """Save research for later reference"""
    timestamp = int(time.time())
    with open(f"research_{timestamp}.txt", "w") as f:
        f.write(f"Query: {query}\n\n{results}")
3. Multi-Store Price Comparison
def compare_prices_across_stores(product: str) -> dict:
"""Check prices at Amazon, Walmart, Best Buy, etc."""
stores = ["Amazon", "Walmart", "Best Buy"]
prices = {}
for store in stores:
prices[store] = search_store_price(store, product)
return prices
4. Deal Alerts
def check_for_deals(product: str) -> str:
"""Check if product is on sale or has coupons"""
# Check deal sites, coupon codes, etc.
pass
5. Personalization
def get_user_preferences() -> dict:
"""Load user preferences (brands, price range, features)"""
return {
"preferred_brands": ["Sony", "Apple"],
"max_price": 500,
"must_have_features": ["wireless", "noise-canceling"]
}
Practice Exercises
Exercise 1: Add a New Tool (Easy)
Task: Add a compare_prices tool that compares prices across products.
Requirements:
- Takes a list of products with prices
- Returns the cheapest option
- Handles missing price data
Click to see solution
from typing import Dict, List

def compare_prices(products: List[Dict]) -> Dict:
    """Compare prices and find cheapest"""
    valid_products = [p for p in products if "price" in p]
    if not valid_products:
        return {"error": "No products with prices"}
    cheapest = min(valid_products, key=lambda x: x["price"])
    most_expensive = max(valid_products, key=lambda x: x["price"])
    return {
        "cheapest": cheapest,
        "savings": most_expensive["price"] - cheapest["price"]
    }
Exercise 2: Improve Error Handling (Medium)
Task: Enhance the agent to handle API timeouts and retries.
Requirements:
- Retry failed tool calls up to 3 times
- Use exponential backoff
- Log all retry attempts
Click to see solution
import time
def execute_tool_with_retry(tool_name: str, args: dict, max_retries: int = 3):
"""Execute tool with retry logic"""
for attempt in range(max_retries):
try:
result = execute_tool(tool_name, args)
return result
except Exception as e:
if attempt == max_retries - 1:
raise
wait_time = 2 ** attempt # Exponential backoff
print(f"Retry {attempt + 1}/{max_retries} after {wait_time}s")
time.sleep(wait_time)
Exercise 3: Build a Travel Agent (Hard)
Task: Create a travel planning agent with these tools:
- search_flights(origin, destination, date)
- search_hotels(location, checkin, checkout)
- get_weather(location, date)
- calculate_budget(flights, hotels, days)
Challenge: Agent should create a complete travel plan with budget.
Click to see solution
class TravelAgent:
def __init__(self):
self.client = openai.OpenAI()
self.tools = [
{
    "type": "function",
    "function": {
        "name": "search_flights",
        "description": "Search for flights",
        "parameters": {
            "type": "object",
            "properties": {
                "origin": {"type": "string"},
                "destination": {"type": "string"},
                "date": {"type": "string"}
            },
            "required": ["origin", "destination", "date"]
        }
    }
},
# Add other tools...
]
def plan_trip(self, request: str) -> Dict:
"""Plan complete trip"""
messages = [{"role": "user", "content": request}]
for _ in range(10):
response = self.client.chat.completions.create(
model="gpt-4",
messages=messages,
tools=self.tools
)
message = response.choices[0].message
if message.tool_calls:
    # The assistant message carrying the tool calls must be appended
    # before the tool results, or the API rejects the history
    messages.append(message)
    # Execute tools and continue
    for tool_call in message.tool_calls:
        result = self.execute_tool(tool_call)
        messages.append({
            "role": "tool",
            "content": json.dumps(result),
            "tool_call_id": tool_call.id
        })
else:
return {"plan": message.content}
return {"error": "Max steps reached"}
✅ Key Takeaways
- ReAct agents combine reasoning with tool use
- Tool integration requires clear schemas and validation
- Error handling and retries improve reliability
- Real-world agents need multiple specialized tools
- Practice builds intuition for agent design
Next Steps
Congratulations! You’ve built a complete shopping research assistant. You now understand:
- ✅ ReAct pattern implementation
- ✅ Tool integration and validation
- ✅ Multi-step reasoning
- ✅ Error handling and debugging
- ✅ Real-world agent applications
In Chapter 3, we’ll explore advanced agent patterns including planning, memory systems, and multi-agent collaboration!
Planning Agents
Module 3: Learning Objectives
By the end of this module, you will:
- ✓ Implement planning algorithms (Chain-of-Thought, task decomposition)
- ✓ Build memory systems (short-term, long-term, semantic)
- ✓ Create multi-agent systems with collaboration patterns
- ✓ Understand when to use planning vs reactive approaches
- ✓ Design agent communication protocols
Introduction to Planning
Simple ReAct agents decide one step at a time. Planning agents think ahead—they create a multi-step plan before executing, leading to more efficient and coherent task completion.
Why Planning Matters
graph TB
subgraph "Without Planning"
A1[Search flights] --> A2[Search hotels]
A2 --> A3[Dates don't match!]
A3 --> A4[Search flights again]
A4 --> A5[Search hotels again]
end
subgraph "With Planning"
B1[Plan all steps] --> B2[Determine dates]
B2 --> B3[Search flights]
B3 --> B4[Search hotels]
B4 --> B5[Done efficiently]
end
style A3 fill:#fee2e2
style B5 fill:#d1fae5
Without Planning (Reactive):
- Search flights → Search hotels → Dates mismatch → Redo everything
- Inefficient, multiple retries
With Planning (Proactive):
- Plan: dates → flights → hotels → booking
- Execute efficiently in one pass
⚠️ When to Use Planning
Use planning for:
- Multi-step tasks with dependencies
- Tasks requiring coordination
- Resource-constrained scenarios
Skip planning for:
- Simple single-step tasks
- Highly dynamic environments
- When speed is critical
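These guidelines can be turned into a crude routing heuristic that decides whether a request goes to a planner or a reactive agent. The keyword lists below are illustrative, not exhaustive:

```python
def needs_planning(task: str) -> bool:
    """Crude heuristic for routing between a planner and a reactive agent.

    The marker lists are illustrative placeholders; a production router
    would use an LLM classifier or task metadata instead.
    """
    multi_step_markers = ["and then", "compare", "plan", "organize", "book", "research"]
    simple_markers = ["what is", "define", "translate"]
    t = task.lower()
    if any(m in t for m in simple_markers):
        return False  # single-step lookup: skip planning
    return any(m in t for m in multi_step_markers)
```

A simple gate like this keeps planning overhead off the fast path for trivial queries.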
Chain-of-Thought Reasoning
Chain-of-Thought (CoT) prompting encourages step-by-step reasoning.
Basic CoT
SYSTEM_PROMPT = """When solving problems, think step by step:
1. Understand the problem
2. Break it into sub-problems
3. Solve each sub-problem
4. Combine solutions
Example:
User: "I need to prepare for a camping trip next weekend"
Thought: Let me break this down:
1. Determine what items are needed for camping
2. Check what the user already has
3. Create a shopping list for missing items
4. Suggest where to buy them
Now I'll execute this plan..."""
Zero-Shot CoT
Simply add “Let’s think step by step”:
def zero_shot_cot(query):
"""Use zero-shot chain of thought"""
prompt = f"{query}\n\nLet's think step by step:"
return llm.generate(prompt)
Few-Shot CoT
Provide examples of step-by-step reasoning:
FEW_SHOT_EXAMPLES = """
Example 1:
User: "Plan a birthday party for 20 people"
Reasoning:
1. Determine budget and venue
2. Create guest list (20 people)
3. Choose date and send invitations
4. Plan menu and order food
5. Arrange entertainment and decorations
6. Prepare day-of schedule
Example 2:
User: "Debug why my website is slow"
Reasoning:
1. Measure current performance metrics
2. Identify bottlenecks (database, network, code)
3. Prioritize issues by impact
4. Fix highest-impact issues first
5. Re-measure to verify improvements
"""
Task Decomposition
Breaking complex tasks into manageable subtasks.
Hierarchical Decomposition
def decompose_task(task: str) -> dict:
"""Decompose task into hierarchy"""
prompt = f"""Break down this task into subtasks:
Task: {task}
Format as:
Main Goal: [goal]
Subtasks:
1. [subtask 1]
1.1 [sub-subtask]
1.2 [sub-subtask]
2. [subtask 2]
3. [subtask 3]
"""
response = llm.generate(prompt)
return parse_task_hierarchy(response)
# Example output
{
"goal": "Launch a new product",
"subtasks": [
{
"id": 1,
"task": "Market research",
"subtasks": [
{"id": 1.1, "task": "Identify target audience"},
{"id": 1.2, "task": "Analyze competitors"}
]
},
{
"id": 2,
"task": "Product development"
},
{
"id": 3,
"task": "Marketing campaign"
}
]
}
Dependency-Aware Decomposition
class Task:
def __init__(self, name, dependencies=None):
self.name = name
self.dependencies = dependencies or []
self.status = "pending"
def create_task_graph(goal: str) -> List[Task]:
"""Create task graph with dependencies"""
    tasks = [
        Task("Research market", dependencies=[]),
        Task("Design product", dependencies=["Research market"]),
        Task("Build prototype", dependencies=["Design product"]),
        Task("Test prototype", dependencies=["Build prototype"]),
        Task("Marketing ready", dependencies=["Research market"]),
        Task("Launch", dependencies=["Test prototype", "Marketing ready"])
    ]
    return tasks
def get_executable_tasks(tasks: List[Task]) -> List[Task]:
    """Get tasks that can be executed now (dependencies are task names)"""
    by_name = {t.name: t for t in tasks}
    return [
        task for task in tasks
        if task.status == "pending" and
        all(by_name[dep].status == "completed" for dep in task.dependencies)
    ]
Plan-and-Execute Frameworks
Separate planning from execution for better control.
Basic Plan-and-Execute
class PlanExecuteAgent:
"""Agent that plans first, then executes"""
def __init__(self):
self.client = openai.OpenAI()
self.tools = self._load_tools()
def plan(self, goal: str) -> List[str]:
"""Create execution plan"""
prompt = f"""Create a detailed plan to accomplish this goal:
Goal: {goal}
Available tools: {', '.join(self.tools.keys())}
Provide a numbered list of steps. Each step should:
- Be specific and actionable
- Use available tools
- Build on previous steps
Plan:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
plan_text = response.choices[0].message.content
steps = self._parse_plan(plan_text)
return steps
def execute(self, steps: List[str]) -> str:
    """Execute plan steps (a while-loop so a replanned list takes effect;
    reassigning `steps` inside a for-each loop would not change iteration)"""
    results = []
    i = 0
    while i < len(steps):
        step = steps[i]
        print(f"Executing step {i + 1}: {step}")
        # Use ReAct agent to execute each step
        result = self._execute_step(step)
        results.append(result)
        # Check if we should continue
        if self._should_replan(result):
            print("Replanning needed...")
            remaining = steps[i + 1:]
            new_plan = self.plan(f"Complete: {', '.join(remaining)}")
            steps = steps[:i + 1] + new_plan
        i += 1
    return self._synthesize_results(results)
def run(self, goal: str) -> str:
"""Plan and execute"""
print(f"Goal: {goal}\n")
# Create plan
plan = self.plan(goal)
print("Plan:")
for i, step in enumerate(plan, 1):
print(f" {i}. {step}")
print()
# Execute plan
result = self.execute(plan)
return result
Example Usage
agent = PlanExecuteAgent()
result = agent.run(
"Research electric cars under $40k and create a comparison report"
)
# Output:
# Goal: Research electric cars under $40k and create a comparison report
#
# Plan:
# 1. Search for electric cars priced under $40,000
# 2. Get detailed specs for top 5 models
# 3. Compare range, charging time, and features
# 4. Check customer reviews for each model
# 5. Create structured comparison report
#
# Executing step 1: Search for electric cars...
# Executing step 2: Get detailed specs...
# ...
Replanning and Adaptation
Plans often need adjustment based on results.
When to Replan
def should_replan(step_result: str, original_plan: List[str]) -> bool:
"""Determine if replanning is needed"""
# Error occurred
if "error" in step_result.lower():
return True
# Unexpected result
if "not found" in step_result.lower():
return True
# New information changes approach
if "alternative" in step_result.lower():
return True
return False
Replanning Strategies
1. Full Replan: Start over with new information
def full_replan(goal: str, context: str) -> List[str]:
"""Create entirely new plan"""
prompt = f"""Original goal: {goal}
Context from execution so far:
{context}
Create a new plan considering this context:"""
return create_plan(prompt)
2. Partial Replan: Adjust remaining steps
def partial_replan(remaining_steps: List[str], issue: str) -> List[str]:
"""Adjust remaining steps"""
prompt = f"""We encountered an issue: {issue}
Remaining steps were:
{format_steps(remaining_steps)}
Adjust the plan to work around this issue:"""
return create_plan(prompt)
3. Alternative Path: Try different approach
def find_alternative(failed_step: str, goal: str) -> str:
"""Find alternative way to accomplish step"""
prompt = f"""This step failed: {failed_step}
Goal: {goal}
Suggest an alternative approach:"""
return llm.generate(prompt)
Adaptive Planning Agent
class AdaptivePlanningAgent:
"""Agent that adapts plan based on execution"""
def __init__(self, max_replans=3):
self.max_replans = max_replans
self.replan_count = 0
def execute_with_adaptation(self, goal: str) -> str:
"""Execute with adaptive replanning"""
plan = self.plan(goal)
context = []
i = 0
while i < len(plan):
step = plan[i]
# Execute step
result = self.execute_step(step)
context.append({"step": step, "result": result})
# Check if replanning needed
if self.should_replan(result):
if self.replan_count >= self.max_replans:
return "Max replans reached. Unable to complete goal."
# Replan remaining steps
remaining_goal = self.extract_remaining_goal(plan[i+1:])
new_steps = self.replan(remaining_goal, context)
# Update plan
plan = plan[:i+1] + new_steps
self.replan_count += 1
print(f"🔄 Replanned ({self.replan_count}/{self.max_replans})")
i += 1
return self.synthesize_results(context)
Plan Representation
Different ways to represent plans.
Linear Plan
plan = [
"Step 1: Search for products",
"Step 2: Compare prices",
"Step 3: Read reviews",
"Step 4: Make recommendation"
]
Tree Plan
plan = {
"root": "Research product",
"branches": [
{
"node": "Gather information",
"branches": [
{"node": "Search products"},
{"node": "Get specifications"}
]
},
{
"node": "Analyze",
"branches": [
{"node": "Compare features"},
{"node": "Check reviews"}
]
},
{"node": "Recommend"}
]
}
Graph Plan
from dataclasses import dataclass
from typing import List, Set
@dataclass
class PlanNode:
id: str
action: str
dependencies: Set[str]
status: str = "pending"
plan_graph = [
PlanNode("1", "Search products", set()),
PlanNode("2", "Get details A", {"1"}),
PlanNode("3", "Get details B", {"1"}),
PlanNode("4", "Compare", {"2", "3"}),
PlanNode("5", "Recommend", {"4"})
]
def get_ready_nodes(graph: List[PlanNode]) -> List[PlanNode]:
"""Get nodes ready to execute"""
completed = {n.id for n in graph if n.status == "completed"}
return [
node for node in graph
if node.status == "pending" and
node.dependencies.issubset(completed)
]
Advanced Planning Techniques
Backward Chaining
Start from goal and work backwards:
def backward_chain(goal: str, current_state: dict) -> List[str]:
"""Plan by working backwards from goal"""
plan = []
current_goal = goal
while not is_satisfied(current_goal, current_state):
# What's needed to achieve current_goal?
prerequisite = find_prerequisite(current_goal)
plan.insert(0, prerequisite)
current_goal = prerequisite
return plan
# Example
goal = "Have dinner ready"
# Backward chain:
# "Have dinner ready" requires "Food is cooked"
# "Food is cooked" requires "Ingredients prepared"
# "Ingredients prepared" requires "Groceries purchased"
# Plan: [Buy groceries, Prepare ingredients, Cook food]
Hierarchical Task Network (HTN)
class HTNPlanner:
"""Hierarchical Task Network planner"""
def __init__(self):
self.methods = {
"travel_to_city": [
["book_flight", "take_flight"],
["book_train", "take_train"],
["rent_car", "drive"]
],
"book_flight": [
["search_flights", "select_flight", "pay"]
]
}
def decompose(self, task: str) -> List[str]:
"""Decompose high-level task"""
if task not in self.methods:
return [task] # Primitive task
# Choose best method
method = self.select_method(task)
# Recursively decompose
plan = []
for subtask in method:
plan.extend(self.decompose(subtask))
return plan
Monte Carlo Tree Search (MCTS) for Planning
class MCTSPlanner:
"""Use MCTS to find optimal plan"""
def plan(self, goal: str, num_simulations: int = 100):
"""Find plan using MCTS"""
root = Node(state=initial_state, goal=goal)
for _ in range(num_simulations):
# Selection
node = self.select(root)
# Expansion
if not node.is_terminal():
node = self.expand(node)
# Simulation
reward = self.simulate(node)
# Backpropagation
self.backpropagate(node, reward)
# Return best path
return self.best_path(root)
Practical Planning Agent
class PracticalPlanningAgent:
"""Production-ready planning agent"""
def __init__(self):
self.client = openai.OpenAI()
self.max_steps = 20
self.max_replans = 3
def run(self, goal: str) -> str:
"""Execute goal with planning"""
# 1. Create initial plan
plan = self.create_plan(goal)
print("📋 Initial Plan:")
for i, step in enumerate(plan, 1):
print(f" {i}. {step}")
print()
# 2. Execute with monitoring (while-loop so a replanned list is honored)
results = []
replan_count = 0
i = 0
while i < len(plan):
    step = plan[i]
    print(f"▶️ Step {i+1}/{len(plan)}: {step}")
    # Execute step
    result = self.execute_step(step, results)
    results.append({"step": step, "result": result})
    # Check success
    if self.is_failure(result):
        if replan_count >= self.max_replans:
            return self.handle_failure(goal, results)
        # Replan
        print("⚠️ Step failed, replanning...")
        new_plan = self.replan(goal, plan[i+1:], results)
        plan = plan[:i+1] + new_plan
        replan_count += 1
    print("✓ Completed\n")
    i += 1
# 3. Synthesize final result
return self.synthesize(goal, results)
def create_plan(self, goal: str) -> List[str]:
"""Create execution plan"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "user",
"content": f"""Create a step-by-step plan for: {goal}
Requirements:
- Each step should be specific and actionable
- Steps should build on each other logically
- Include verification steps
- Keep it concise (max 10 steps)
Format as numbered list."""
}],
temperature=0.3
)
return self.parse_plan(response.choices[0].message.content)
Best Practices
- Plan at the right level: Not too detailed, not too vague
- Include verification: Check if steps succeeded
- Be flexible: Allow replanning when needed
- Consider dependencies: Respect task ordering
- Set limits: Max steps, max replans
- Monitor progress: Track what’s completed
- Learn from failures: Improve planning over time
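The "set limits" practice above is easiest to honor when all guardrails live in one small config object shared by every agent. A sketch with illustrative (not recommended) defaults:

```python
from dataclasses import dataclass

@dataclass
class AgentLimits:
    """Central guardrail config; field values are illustrative defaults."""
    max_steps: int = 20
    max_replans: int = 3
    max_calls_per_tool: int = 3

    def allow_step(self, step: int) -> bool:
        """True while the agent may take another step."""
        return step < self.max_steps

    def allow_replan(self, replans_so_far: int) -> bool:
        """True while the agent may replan again."""
        return replans_so_far < self.max_replans
```

Checking `limits.allow_step(i)` at the top of the loop keeps the stopping logic out of the agent body.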
✅ Key Takeaways
- Planning agents create multi-step plans before executing
- Chain-of-Thought enables step-by-step reasoning
- Task decomposition breaks complex goals into manageable steps
- Plan-and-Execute pattern separates planning from execution
- Replanning allows adaptation when plans fail
- Use planning for complex, multi-step tasks with dependencies
Next Steps
You now understand planning agents! Next, we’ll explore memory systems that allow agents to remember and learn from past interactions.
Memory Systems
Why Agents Need Memory
Without memory, agents are like people with amnesia—they can’t learn from experience, maintain context, or build on previous interactions.
Without Memory:
User: "My name is Alice"
Agent: "Nice to meet you!"
[Later]
User: "What's my name?"
Agent: "I don't know your name."
With Memory:
User: "My name is Alice"
Agent: "Nice to meet you, Alice!" [stores: user_name = "Alice"]
[Later]
User: "What's my name?"
Agent: "Your name is Alice." [retrieves: user_name]
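The transcript above boils down to a key-value store the agent writes to and reads from between turns. A minimal sketch:

```python
class SimpleMemory:
    """Minimal key-value store behind the 'With Memory' transcript above."""

    def __init__(self):
        self.facts = {}

    def store(self, key: str, value: str):
        """Record a fact, e.g. user_name = 'Alice'."""
        self.facts[key] = value

    def retrieve(self, key: str, default: str = "unknown") -> str:
        """Look a fact up later; fall back if never stored."""
        return self.facts.get(key, default)

memory = SimpleMemory()
memory.store("user_name", "Alice")   # during the first turn
memory.retrieve("user_name")         # later turn: "Alice"
```

Everything that follows in this chapter elaborates on when to write, what to keep, and how to retrieve.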
Types of Memory
Short-Term Memory (Working Memory)
Temporary storage for the current task.
Characteristics:
- Limited capacity (context window)
- Cleared after task completion
- Fast access
- Stored in conversation history
What to store:
- Current conversation
- Intermediate results
- Active plan
- Tool outputs
Long-Term Memory (Persistent Memory)
Permanent storage across sessions.
Characteristics:
- Unlimited capacity (database)
- Persists across sessions
- Slower access (requires retrieval)
- Stored in external systems
What to store:
- User preferences
- Past conversations
- Learned facts
- Successful strategies
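The two memory types can be sketched side by side: a bounded short-term message list and a persistent long-term dict. (A real system would back the long-term store with a database; the in-memory dict here is a stand-in.)

```python
class AgentMemory:
    """Sketch combining short-term and long-term memory as described above."""

    def __init__(self, short_term_limit: int = 20):
        self.short_term = []            # current conversation (bounded)
        self.long_term = {}             # facts that survive across sessions
        self.short_term_limit = short_term_limit

    def remember_message(self, role: str, content: str):
        """Short-term: keep the conversation within its capacity."""
        self.short_term.append({"role": role, "content": content})
        if len(self.short_term) > self.short_term_limit:
            self.short_term.pop(0)      # drop oldest, sliding-window style

    def remember_fact(self, key: str, value: str):
        """Long-term: store a durable fact (preference, learned detail)."""
        self.long_term[key] = value

    def end_session(self):
        """Working memory is cleared after the task; facts persist."""
        self.short_term.clear()
```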
Conversation History Management
Managing the conversation context efficiently.
Basic History Tracking
import time

class ConversationMemory:
"""Simple conversation history"""
def __init__(self, max_messages=20):
self.messages = []
self.max_messages = max_messages
def add_message(self, role: str, content: str):
"""Add message to history"""
self.messages.append({
"role": role,
"content": content,
"timestamp": time.time()
})
# Trim if too long
if len(self.messages) > self.max_messages:
self.messages = self.messages[-self.max_messages:]
def get_messages(self) -> List[dict]:
"""Get conversation history"""
return self.messages
def clear(self):
"""Clear history"""
self.messages = []
Sliding Window
Keep only recent messages:
class SlidingWindowMemory:
"""Keep last N messages"""
def __init__(self, window_size=10):
self.window_size = window_size
self.messages = []
def add(self, message: dict):
"""Add message and maintain window"""
self.messages.append(message)
# Keep only last N messages
if len(self.messages) > self.window_size:
self.messages = self.messages[-self.window_size:]
def get_context(self) -> List[dict]:
"""Get current window"""
return self.messages
Token-Based Truncation
Manage by token count instead of message count:
import tiktoken
class TokenAwareMemory:
"""Manage memory by token budget"""
def __init__(self, max_tokens=4000, model="gpt-4"):
self.max_tokens = max_tokens
self.messages = []
self.encoding = tiktoken.encoding_for_model(model)
def count_tokens(self, text: str) -> int:
"""Count tokens in text"""
return len(self.encoding.encode(text))
def get_total_tokens(self) -> int:
"""Count total tokens in history"""
total = 0
for msg in self.messages:
total += self.count_tokens(msg["content"])
return total
def add(self, message: dict):
"""Add message and trim if needed"""
self.messages.append(message)
# Trim oldest messages if over budget
while self.get_total_tokens() > self.max_tokens and len(self.messages) > 1:
self.messages.pop(0) # Remove oldest
def get_context(self) -> List[dict]:
"""Get messages within token budget"""
return self.messages
Summarization Strategy
Compress old messages:
class SummarizingMemory:
"""Summarize old conversations"""
def __init__(self, summary_threshold=20):
self.messages = []
self.summary = None
self.summary_threshold = summary_threshold
def add(self, message: dict):
"""Add message and summarize if needed"""
self.messages.append(message)
if len(self.messages) > self.summary_threshold:
self.summarize_old_messages()
def summarize_old_messages(self):
"""Summarize and compress old messages"""
# Take first half of messages
to_summarize = self.messages[:len(self.messages)//2]
# Create summary
summary_text = self.create_summary(to_summarize)
# Update summary
if self.summary:
self.summary += f"\n\n{summary_text}"
else:
self.summary = summary_text
# Keep only recent messages
self.messages = self.messages[len(self.messages)//2:]
def create_summary(self, messages: List[dict]) -> str:
"""Generate summary of messages"""
conversation = "\n".join([
f"{m['role']}: {m['content']}" for m in messages
])
prompt = f"""Summarize this conversation concisely:
{conversation}
Summary:"""
return llm.generate(prompt)
def get_context(self) -> List[dict]:
"""Get context with summary"""
context = []
if self.summary:
context.append({
"role": "system",
"content": f"Previous conversation summary:\n{self.summary}"
})
context.extend(self.messages)
return context
Vector Databases for Semantic Memory
Store and retrieve information by meaning, not just keywords.
Why Vector Databases?
Traditional search: “Find messages containing ‘Python’”
Semantic search: “Find messages about programming languages”
Basic Vector Memory
import time
from typing import List

import numpy as np
import openai
class VectorMemory:
"""Simple vector-based memory"""
def __init__(self):
self.memories = []
self.embeddings = []
def add(self, text: str, metadata: dict = None):
"""Store memory with embedding"""
# Get embedding
embedding = self.get_embedding(text)
self.memories.append({
"text": text,
"metadata": metadata or {},
"timestamp": time.time()
})
self.embeddings.append(embedding)
def get_embedding(self, text: str) -> np.ndarray:
"""Get embedding for text"""
# Using OpenAI embeddings
response = openai.embeddings.create(
model="text-embedding-3-small",
input=text
)
return np.array(response.data[0].embedding)
def search(self, query: str, top_k: int = 5) -> List[dict]:
"""Search for relevant memories"""
if not self.memories:
return []
# Get query embedding
query_embedding = self.get_embedding(query)
# Calculate similarities
similarities = []
for i, emb in enumerate(self.embeddings):
similarity = self.cosine_similarity(query_embedding, emb)
similarities.append((i, similarity))
# Sort by similarity
similarities.sort(key=lambda x: x[1], reverse=True)
# Return top k
results = []
for i, score in similarities[:top_k]:
result = self.memories[i].copy()
result["similarity"] = score
results.append(result)
return results
def cosine_similarity(self, a: np.ndarray, b: np.ndarray) -> float:
"""Calculate cosine similarity"""
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
Using Chroma
import time

import chromadb

class ChromaMemory:
    """Memory using ChromaDB"""
    def __init__(self, collection_name="agent_memory"):
        # Recent chromadb versions persist to disk via PersistentClient
        self.client = chromadb.PersistentClient(path="./chroma_db")
self.collection = self.client.get_or_create_collection(
name=collection_name,
metadata={"description": "Agent memory storage"}
)
def add(self, text: str, metadata: dict = None):
"""Add memory"""
doc_id = f"mem_{int(time.time() * 1000)}"
self.collection.add(
documents=[text],
metadatas=[metadata or {}],
ids=[doc_id]
)
def search(self, query: str, n_results: int = 5) -> List[dict]:
"""Search memories"""
results = self.collection.query(
query_texts=[query],
n_results=n_results
)
memories = []
for i in range(len(results['documents'][0])):
memories.append({
"text": results['documents'][0][i],
"metadata": results['metadatas'][0][i],
"distance": results['distances'][0][i]
})
return memories
def delete_all(self):
"""Clear all memories"""
self.client.delete_collection(self.collection.name)
Using Pinecone
import os
import time

import openai
import pinecone

class PineconeMemory:
    """Memory using Pinecone (classic pinecone-client API)"""
    def __init__(self, index_name="agent-memory"):
        pinecone.init(
            api_key=os.getenv("PINECONE_API_KEY"),
            environment=os.getenv("PINECONE_ENV")
        )
        # Create index if it doesn't exist
        if index_name not in pinecone.list_indexes():
            pinecone.create_index(
                name=index_name,
                dimension=1536,  # OpenAI text-embedding-3-small size
                metric="cosine"
            )
        self.index = pinecone.Index(index_name)

    def get_embedding(self, text: str) -> list:
        """Embed text with OpenAI, same approach as VectorMemory above"""
        response = openai.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding
def add(self, text: str, metadata: dict = None):
"""Add memory"""
# Get embedding
embedding = self.get_embedding(text)
# Generate ID
doc_id = f"mem_{int(time.time() * 1000)}"
# Upsert to Pinecone
self.index.upsert([(
doc_id,
embedding,
{
"text": text,
**(metadata or {})
}
)])
def search(self, query: str, top_k: int = 5) -> List[dict]:
"""Search memories"""
query_embedding = self.get_embedding(query)
results = self.index.query(
vector=query_embedding,
top_k=top_k,
include_metadata=True
)
memories = []
for match in results['matches']:
memories.append({
"text": match['metadata']['text'],
"score": match['score'],
"metadata": match['metadata']
})
return memories
Entity Tracking and State Management
Track entities (people, places, things) mentioned in conversations.
Entity Extraction
class EntityTracker:
"""Track entities across conversation"""
def __init__(self):
self.entities = {}
def extract_entities(self, text: str) -> dict:
"""Extract entities from text"""
prompt = f"""Extract entities from this text:
Text: {text}
Return as JSON:
{{
"people": ["name1", "name2"],
"places": ["place1"],
"organizations": ["org1"],
"dates": ["date1"],
"other": ["thing1"]
}}"""
response = llm.generate(prompt)
return json.loads(response)
def update(self, text: str):
"""Update entity tracking"""
entities = self.extract_entities(text)
for entity_type, items in entities.items():
if entity_type not in self.entities:
self.entities[entity_type] = {}
for item in items:
if item not in self.entities[entity_type]:
self.entities[entity_type][item] = {
"first_seen": time.time(),
"mentions": 0,
"context": []
}
self.entities[entity_type][item]["mentions"] += 1
self.entities[entity_type][item]["context"].append(text)
def get_entity_info(self, entity: str) -> dict:
"""Get information about an entity"""
for entity_type, items in self.entities.items():
if entity in items:
return {
"type": entity_type,
**items[entity]
}
return None
State Management
class StateManager:
"""Manage agent state"""
def __init__(self):
self.state = {
"user_info": {},
"current_task": None,
"preferences": {},
"context": {}
}
def update(self, key: str, value):
"""Update state"""
keys = key.split('.')
current = self.state
for k in keys[:-1]:
if k not in current:
current[k] = {}
current = current[k]
current[keys[-1]] = value
def get(self, key: str, default=None):
"""Get state value"""
keys = key.split('.')
current = self.state
for k in keys:
if k not in current:
return default
current = current[k]
return current
def save(self, filepath: str):
"""Save state to file"""
with open(filepath, 'w') as f:
json.dump(self.state, f, indent=2)
def load(self, filepath: str):
"""Load state from file"""
with open(filepath, 'r') as f:
self.state = json.load(f)
Memory Retrieval Strategies
How to find relevant memories efficiently.
Recency-Based Retrieval
def get_recent_memories(memories: List[dict], n: int = 5) -> List[dict]:
"""Get most recent memories"""
sorted_memories = sorted(
memories,
key=lambda x: x.get('timestamp', 0),
reverse=True
)
return sorted_memories[:n]
Relevance-Based Retrieval
def get_relevant_memories(
query: str,
memories: List[dict],
n: int = 5
) -> List[dict]:
"""Get most relevant memories using embeddings"""
query_embedding = get_embedding(query)
scored_memories = []
for memory in memories:
memory_embedding = memory.get('embedding')
if memory_embedding:
score = cosine_similarity(query_embedding, memory_embedding)
scored_memories.append((memory, score))
scored_memories.sort(key=lambda x: x[1], reverse=True)
return [m for m, s in scored_memories[:n]]
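The helpers `get_embedding` and `cosine_similarity` are assumed in the snippets above. A minimal `cosine_similarity`, sketched here with NumPy:

```python
import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = identical direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0  # Avoid division by zero for empty/zero vectors
    return float(np.dot(a, b) / denom)
```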
Hybrid Retrieval
Combine multiple factors:
import numpy as np
def hybrid_retrieval(
query: str,
memories: List[dict],
n: int = 5,
recency_weight: float = 0.3,
relevance_weight: float = 0.7
) -> List[dict]:
"""Combine recency and relevance"""
query_embedding = get_embedding(query)
current_time = time.time()
scored_memories = []
for memory in memories:
# Relevance score
relevance = cosine_similarity(
query_embedding,
memory['embedding']
)
# Recency score (decay over time)
age = current_time - memory['timestamp']
recency = np.exp(-age / (24 * 3600)) # Decay over days
# Combined score
score = (
relevance_weight * relevance +
recency_weight * recency
)
scored_memories.append((memory, score))
scored_memories.sort(key=lambda x: x[1], reverse=True)
return [m for m, s in scored_memories[:n]]
Importance-Based Retrieval
def get_important_memories(
memories: List[dict],
n: int = 5
) -> List[dict]:
"""Get memories marked as important"""
# Score by importance
scored = []
for memory in memories:
importance = memory.get('importance', 0)
scored.append((memory, importance))
scored.sort(key=lambda x: x[1], reverse=True)
return [m for m, s in scored[:n]]
def calculate_importance(memory: dict) -> float:
"""Calculate memory importance"""
prompt = f"""Rate the importance of remembering this information (0-10):
{memory['text']}
Consider:
- Is it about user preferences?
- Is it a key fact?
- Will it be useful later?
Importance (0-10):"""
response = llm.generate(prompt)
return float(response.strip())
Complete Memory System
class ComprehensiveMemory:
"""Full-featured memory system"""
def __init__(self):
# Short-term memory
self.conversation = TokenAwareMemory(max_tokens=4000)
# Long-term memory
self.long_term = ChromaMemory()
# Entity tracking
self.entities = EntityTracker()
# State management
self.state = StateManager()
def add_message(self, role: str, content: str):
"""Add message to conversation"""
message = {
"role": role,
"content": content,
"timestamp": time.time()
}
# Add to short-term
self.conversation.add(message)
# Extract and track entities
if role == "user":
self.entities.update(content)
# Store important messages in long-term
if self.is_important(content):
self.long_term.add(
content,
metadata={
"role": role,
"timestamp": time.time()
}
)
def is_important(self, text: str) -> bool:
"""Determine if message should be stored long-term"""
keywords = [
"my name is", "i prefer", "remember",
"always", "never", "i like", "i don't like"
]
return any(kw in text.lower() for kw in keywords)
def get_context(self, query: str = None) -> List[dict]:
"""Get relevant context for current query"""
context = []
# Add relevant long-term memories
if query:
relevant = self.long_term.search(query, n_results=3)
if relevant:
context.append({
"role": "system",
"content": "Relevant information from past:\n" +
"\n".join([m['text'] for m in relevant])
})
# Add recent conversation
context.extend(self.conversation.get_context())
return context
def save(self, filepath: str):
"""Save memory state"""
data = {
"entities": self.entities.entities,
"state": self.state.state,
"timestamp": time.time()
}
with open(filepath, 'w') as f:
json.dump(data, f, indent=2)
def load(self, filepath: str):
"""Load memory state"""
with open(filepath, 'r') as f:
data = json.load(f)
self.entities.entities = data.get('entities', {})
self.state.state = data.get('state', {})
Using Memory in Agents
class MemoryAgent:
"""Agent with comprehensive memory"""
def __init__(self):
self.memory = ComprehensiveMemory()
self.client = openai.OpenAI()
def chat(self, user_input: str) -> str:
"""Chat with memory"""
# Add user message to memory
self.memory.add_message("user", user_input)
# Get context with relevant memories
context = self.memory.get_context(query=user_input)
# Generate response
response = self.client.chat.completions.create(
model="gpt-4",
messages=context
)
assistant_message = response.choices[0].message.content
# Add assistant response to memory
self.memory.add_message("assistant", assistant_message)
return assistant_message
def save_session(self):
"""Save memory for later"""
self.memory.save("session_memory.json")
def load_session(self):
"""Load previous session"""
self.memory.load("session_memory.json")
Best Practices
- Separate short and long-term: Different storage for different needs
- Be selective: Don’t store everything
- Use semantic search: Find by meaning, not keywords
- Track importance: Prioritize valuable information
- Manage token budgets: Don’t overflow context
- Summarize old conversations: Compress history
- Update entities: Track what’s mentioned
- Persist critical data: Save to disk/database
- Retrieve strategically: Balance recency, relevance, importance
- Test retrieval: Ensure you find what you need
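The "summarize old conversations" practice above can be sketched as a rolling compressor: keep recent turns verbatim and fold older ones into a single summary message. The `summarize_fn` here is a hypothetical stand-in for an LLM call (the default just truncates):

```python
from typing import Callable, Dict, List

def compress_history(
    messages: List[Dict],
    keep_recent: int = 4,
    summarize_fn: Callable[[str], str] = None,
) -> List[Dict]:
    """Replace old messages with one summary message, keeping recent turns verbatim."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in old)
    # Placeholder summarizer: in practice, call an LLM here
    if summarize_fn is None:
        summarize_fn = lambda text: text[:200]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation:\n" + summarize_fn(transcript),
    }
    return [summary] + recent
```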
Next Steps
With memory systems in place, agents can maintain context and learn from experience. Next, we’ll explore multi-agent systems where multiple agents collaborate!
Multi-Agent Systems
Why Multiple Agents?
Single agents have limitations. Multiple specialized agents working together can:
- Handle complex tasks requiring diverse expertise
- Work in parallel for faster completion
- Provide checks and balances
- Scale better than monolithic agents
graph LR
subgraph "Single Agent"
S[Task] --> SA[Agent] --> SR[Result]
end
subgraph "Multi-Agent System"
M[Task] --> MA1[Designer]
M --> MA2[Developer]
M --> MA3[Tester]
MA1 --> MC[Coordinator]
MA2 --> MC
MA3 --> MC
MC --> MR[Result]
end
style SA fill:#dbeafe
style MA1 fill:#d1fae5
style MA2 fill:#d1fae5
style MA3 fill:#d1fae5
style MC fill:#fef3c7
Example: Building a website
- Designer Agent: Creates UI/UX mockups
- Developer Agent: Writes code
- Tester Agent: Finds bugs
- Reviewer Agent: Ensures quality
💡 When to Use Multi-Agent Systems
Use multiple agents when:
- Task requires diverse expertise
- Parallel processing is beneficial
- Checks and balances are needed
- Scaling beyond single agent capacity
Stick with single agent when:
- Task is simple and focused
- Coordination overhead isn’t worth it
- Real-time response is critical
Agent Collaboration Patterns
1. Sequential (Pipeline)
Agents work one after another:
Agent A → Agent B → Agent C → Result
class SequentialAgents:
"""Agents work in sequence"""
def __init__(self, agents: List):
self.agents = agents
def run(self, task: str) -> str:
"""Execute agents sequentially"""
result = task
for agent in self.agents:
print(f"→ {agent.name} processing...")
result = agent.process(result)
return result
# Example
pipeline = SequentialAgents([
ResearchAgent(),
AnalysisAgent(),
WriterAgent()
])
result = pipeline.run("Write a report on AI trends")
# Research → Analysis → Writing
2. Parallel (Concurrent)
Agents work simultaneously:
┌─ Agent A ─┐
Task ───┼─ Agent B ─┼─→ Combine → Result
└─ Agent C ─┘
import asyncio
class ParallelAgents:
"""Agents work in parallel"""
def __init__(self, agents: List):
self.agents = agents
async def run(self, task: str) -> str:
"""Execute agents in parallel"""
# Run all agents concurrently
tasks = [agent.process_async(task) for agent in self.agents]
results = await asyncio.gather(*tasks)
# Combine results
return self.combine_results(results)
def combine_results(self, results: List[str]) -> str:
"""Merge results from multiple agents"""
prompt = f"""Combine these results into a coherent response:
{chr(10).join([f"Agent {i+1}: {r}" for i, r in enumerate(results)])}
Combined result:"""
return llm.generate(prompt)
# Example
parallel = ParallelAgents([
SearchAgent(),
DatabaseAgent(),
APIAgent()
])
result = await parallel.run("Find information about user X")
# All agents search simultaneously
3. Hierarchical (Manager-Worker)
Manager delegates to workers:
Manager
/ | \
Worker1 Worker2 Worker3
class ManagerAgent:
"""Manages and delegates to worker agents"""
def __init__(self, workers: List):
self.workers = workers
def run(self, task: str) -> str:
"""Delegate and coordinate"""
# Break down task
subtasks = self.decompose_task(task)
# Assign to workers
assignments = self.assign_tasks(subtasks)
# Collect results
results = []
for worker, subtask in assignments:
result = worker.execute(subtask)
results.append(result)
# Synthesize final result
return self.synthesize(results)
def decompose_task(self, task: str) -> List[str]:
"""Break task into subtasks"""
prompt = f"""Break this task into 3-5 subtasks:
Task: {task}
Subtasks:"""
response = llm.generate(prompt)
return self.parse_subtasks(response)
def assign_tasks(self, subtasks: List[str]) -> List[tuple]:
"""Assign subtasks to workers"""
assignments = []
for i, subtask in enumerate(subtasks):
# Round-robin assignment
worker = self.workers[i % len(self.workers)]
assignments.append((worker, subtask))
return assignments
4. Debate (Adversarial)
Agents debate to reach better conclusions:
class DebateSystem:
"""Agents debate to find best answer"""
def __init__(self, agents: List, rounds: int = 3):
self.agents = agents
self.rounds = rounds
def run(self, question: str) -> str:
"""Run debate"""
positions = []
# Initial positions
for agent in self.agents:
position = agent.initial_position(question)
positions.append(position)
# Debate rounds
for round_num in range(self.rounds):
print(f"\n--- Round {round_num + 1} ---")
new_positions = []
for i, agent in enumerate(self.agents):
# Show other positions
other_positions = [p for j, p in enumerate(positions) if j != i]
# Agent responds
response = agent.respond(question, other_positions)
new_positions.append(response)
print(f"{agent.name}: {response[:100]}...")
positions = new_positions
# Judge decides winner
return self.judge(question, positions)
def judge(self, question: str, positions: List[str]) -> str:
"""Determine best answer"""
prompt = f"""Question: {question}
Positions:
{chr(10).join([f"{i+1}. {p}" for i, p in enumerate(positions)])}
Which position is most convincing and why?"""
return llm.generate(prompt)
5. Collaborative (Peer-to-Peer)
Agents work together as equals:
class CollaborativeAgents:
"""Agents collaborate as peers"""
def __init__(self, agents: List):
self.agents = agents
self.shared_context = {}
def run(self, task: str) -> str:
"""Collaborative execution"""
self.shared_context['task'] = task
self.shared_context['contributions'] = []
# Each agent contributes
for agent in self.agents:
contribution = agent.contribute(self.shared_context)
self.shared_context['contributions'].append({
'agent': agent.name,
'content': contribution
})
# Other agents can see and build on this
print(f"✓ {agent.name} contributed")
# Synthesize all contributions
return self.synthesize_contributions()
def synthesize_contributions(self) -> str:
"""Combine all contributions"""
contributions = self.shared_context['contributions']
prompt = f"""Synthesize these contributions into a final result:
Task: {self.shared_context['task']}
Contributions:
{chr(10).join([f"- {c['agent']}: {c['content']}" for c in contributions])}
Final result:"""
return llm.generate(prompt)
Delegation and Orchestration
Simple Orchestrator
class Orchestrator:
"""Coordinates multiple agents"""
def __init__(self):
self.agents = {}
def register_agent(self, name: str, agent):
"""Register an agent"""
self.agents[name] = agent
def delegate(self, task: str) -> str:
"""Delegate task to appropriate agent"""
# Determine which agent should handle this
agent_name = self.select_agent(task)
if agent_name not in self.agents:
return f"No agent available for: {task}"
# Delegate to agent
agent = self.agents[agent_name]
return agent.execute(task)
def select_agent(self, task: str) -> str:
"""Select best agent for task"""
prompt = f"""Which agent should handle this task?
Task: {task}
Available agents:
{chr(10).join([f"- {name}: {agent.description}" for name, agent in self.agents.items()])}
Best agent:"""
response = llm.generate(prompt)
return response.strip()
Advanced Orchestrator with Routing
class SmartOrchestrator:
"""Intelligent task routing"""
def __init__(self):
self.agents = {}
self.routing_history = []
def register_agent(self, name: str, agent, capabilities: List[str]):
"""Register agent with capabilities"""
self.agents[name] = {
'agent': agent,
'capabilities': capabilities,
'success_rate': 1.0
}
def route_task(self, task: str) -> str:
"""Route task to best agent"""
# Score each agent
scores = {}
for name, info in self.agents.items():
score = self.score_agent(task, info)
scores[name] = score
# Select best agent
best_agent = max(scores, key=scores.get)
# Execute
result = self.agents[best_agent]['agent'].execute(task)
# Update success rate
self.update_success_rate(best_agent, result)
return result
def score_agent(self, task: str, agent_info: dict) -> float:
"""Score agent suitability"""
# Check capability match
capability_score = self.match_capabilities(task, agent_info['capabilities'])
# Consider past success
success_score = agent_info['success_rate']
# Combined score
return 0.7 * capability_score + 0.3 * success_score
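The `match_capabilities` helper is left undefined in the orchestrator above. One minimal keyword-overlap sketch (an assumption for illustration, not the only scoring scheme):

```python
def match_capabilities(task: str, capabilities: list) -> float:
    """Fraction of an agent's capability keywords that appear in the task text."""
    if not capabilities:
        return 0.0
    task_lower = task.lower()
    hits = sum(1 for cap in capabilities if cap.lower() in task_lower)
    return hits / len(capabilities)
```

In production you would likely replace the substring check with embedding similarity between the task and each capability description.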
Consensus and Voting Mechanisms
Simple Voting
class VotingSystem:
"""Agents vote on decisions"""
def __init__(self, agents: List):
self.agents = agents
def decide(self, question: str, options: List[str]) -> str:
"""Agents vote on options"""
votes = {}
for agent in self.agents:
vote = agent.vote(question, options)
votes[vote] = votes.get(vote, 0) + 1
# Return option with most votes
winner = max(votes, key=votes.get)
return winner
# Example
voters = VotingSystem([
Agent1(), Agent2(), Agent3()
])
decision = voters.decide(
"Which framework should we use?",
["React", "Vue", "Angular"]
)
Weighted Voting
class WeightedVoting:
"""Agents vote with different weights"""
def __init__(self, agents: List[tuple]):
# agents = [(agent, weight), ...]
self.agents = agents
def decide(self, question: str, options: List[str]) -> str:
"""Weighted voting"""
scores = {option: 0.0 for option in options}
for agent, weight in self.agents:
vote = agent.vote(question, options)
scores[vote] += weight
return max(scores, key=scores.get)
# Example
weighted = WeightedVoting([
(ExpertAgent(), 2.0), # Expert has 2x weight
(JuniorAgent(), 1.0),
(JuniorAgent(), 1.0)
])
Consensus Building
class ConsensusBuilder:
"""Build consensus among agents"""
def __init__(self, agents: List, threshold: float = 0.8):
self.agents = agents
self.threshold = threshold
def reach_consensus(self, question: str, max_rounds: int = 5) -> str:
"""Iteratively build consensus"""
for round_num in range(max_rounds):
# Get opinions
opinions = [agent.opinion(question) for agent in self.agents]
# Check agreement
agreement = self.measure_agreement(opinions)
if agreement >= self.threshold:
return self.synthesize_consensus(opinions)
# Share opinions and iterate
for agent in self.agents:
agent.see_opinions(opinions)
return "No consensus reached"
def measure_agreement(self, opinions: List[str]) -> float:
"""Measure how much agents agree"""
# Use embeddings to measure similarity
embeddings = [get_embedding(op) for op in opinions]
# Calculate pairwise similarities
similarities = []
for i in range(len(embeddings)):
for j in range(i+1, len(embeddings)):
sim = cosine_similarity(embeddings[i], embeddings[j])
similarities.append(sim)
return np.mean(similarities)
Communication Protocols
Message Passing
class MessageBus:
"""Central message bus for agent communication"""
def __init__(self):
self.subscribers = {}
self.messages = []
def subscribe(self, agent_id: str, topics: List[str]):
"""Agent subscribes to topics"""
for topic in topics:
if topic not in self.subscribers:
self.subscribers[topic] = []
self.subscribers[topic].append(agent_id)
def publish(self, topic: str, message: dict):
"""Publish message to topic"""
self.messages.append({
'topic': topic,
'message': message,
'timestamp': time.time()
})
# Notify subscribers
if topic in self.subscribers:
for agent_id in self.subscribers[topic]:
self.deliver(agent_id, message)
def deliver(self, agent_id: str, message: dict):
"""Deliver message to agent"""
# Implementation depends on agent architecture
pass
Direct Communication
class Agent:
"""Agent with communication capabilities"""
def __init__(self, name: str):
self.name = name
self.inbox = []
self.peers = {}
def send_message(self, recipient: str, message: str):
"""Send message to another agent"""
if recipient in self.peers:
self.peers[recipient].receive_message(self.name, message)
def receive_message(self, sender: str, message: str):
"""Receive message from another agent"""
self.inbox.append({
'from': sender,
'message': message,
'timestamp': time.time()
})
def broadcast(self, message: str):
"""Send message to all peers"""
for peer_name, peer in self.peers.items():
peer.receive_message(self.name, message)
def add_peer(self, name: str, agent):
"""Add peer agent"""
self.peers[name] = agent
Complete Multi-Agent System
class MultiAgentSystem:
"""Complete multi-agent system"""
def __init__(self):
self.agents = {}
self.message_bus = MessageBus()
self.orchestrator = Orchestrator()
def add_agent(self, name: str, agent, role: str):
"""Add agent to system"""
self.agents[name] = {
'agent': agent,
'role': role,
'status': 'idle'
}
self.orchestrator.register_agent(name, agent)
def execute_task(self, task: str, strategy: str = 'auto') -> str:
"""Execute task using appropriate strategy"""
if strategy == 'sequential':
return self.execute_sequential(task)
elif strategy == 'parallel':
return self.execute_parallel(task)
elif strategy == 'hierarchical':
return self.execute_hierarchical(task)
else:
return self.execute_auto(task)
def execute_sequential(self, task: str) -> str:
"""Sequential execution"""
result = task
for name, info in self.agents.items():
agent = info['agent']
result = agent.process(result)
return result
async def execute_parallel(self, task: str) -> str:
"""Parallel execution"""
tasks = []
for name, info in self.agents.items():
agent = info['agent']
tasks.append(agent.process_async(task))
results = await asyncio.gather(*tasks)
return self.combine_results(results)
def execute_hierarchical(self, task: str) -> str:
"""Hierarchical execution with manager"""
# Find manager agent
manager = self.find_manager()
if not manager:
return "No manager agent available"
# Manager coordinates workers
return manager.coordinate(task, self.agents)
def execute_auto(self, task: str) -> str:
"""Automatically choose best strategy"""
# Analyze task complexity
complexity = self.analyze_task(task)
if complexity['parallel_potential'] > 0.7:
return asyncio.run(self.execute_parallel(task))
elif complexity['requires_coordination']:
return self.execute_hierarchical(task)
else:
return self.execute_sequential(task)
Example: Research Team
class ResearchTeam:
"""Multi-agent research team"""
def __init__(self):
self.researcher = ResearchAgent()
self.analyst = AnalystAgent()
self.writer = WriterAgent()
self.reviewer = ReviewerAgent()
def research_topic(self, topic: str) -> str:
"""Collaborative research"""
# 1. Researcher gathers information
print("📚 Researcher gathering information...")
raw_data = self.researcher.gather(topic)
# 2. Analyst analyzes data
print("📊 Analyst analyzing data...")
analysis = self.analyst.analyze(raw_data)
# 3. Writer creates report
print("✍️ Writer creating report...")
draft = self.writer.write(analysis)
# 4. Reviewer provides feedback
print("👀 Reviewer checking quality...")
feedback = self.reviewer.review(draft)
# 5. Writer revises based on feedback
if feedback['needs_revision']:
print("🔄 Writer revising...")
final = self.writer.revise(draft, feedback)
else:
final = draft
return final
# Usage
team = ResearchTeam()
report = team.research_topic("AI Agent Architectures")
Best Practices
- Clear roles: Each agent should have a specific purpose
- Defined interfaces: Standardize communication
- Avoid bottlenecks: Don’t make everything go through one agent
- Handle failures: One agent failing shouldn’t crash the system
- Monitor coordination: Track how agents interact
- Balance autonomy: Agents should be independent but coordinated
- Prevent conflicts: Resolve disagreements systematically
- Scale gradually: Start simple, add complexity as needed
- Test interactions: Verify agents work well together
- Document protocols: Clear communication standards
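The "handle failures" practice above can be sketched as a runner that isolates each agent call, assuming agents expose an `execute(task)` method as in the earlier examples:

```python
from typing import List

class FaultTolerantRunner:
    """Run each agent independently so one failure doesn't crash the system."""
    def __init__(self, agents: List):
        self.agents = agents

    def run(self, task: str) -> dict:
        results, failures = [], []
        for agent in self.agents:
            try:
                results.append(agent.execute(task))
            except Exception as e:
                # Record the failure and keep going with the remaining agents
                failures.append({
                    "agent": getattr(agent, "name", repr(agent)),
                    "error": str(e),
                })
        return {"results": results, "failures": failures}
```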
Common Pitfalls
Pitfall 1: Over-coordination
Problem: Too much communication overhead
Solution: Let agents work independently when possible
Pitfall 2: Conflicting Goals
Problem: Agents work against each other
Solution: Align objectives and add conflict resolution
Pitfall 3: Infinite Loops
Problem: Agents keep delegating to each other
Solution: Add delegation limits and cycle detection
Pitfall 4: No Clear Owner
Problem: Task falls through the cracks
Solution: Always assign clear responsibility
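Pitfall 3 (infinite loops) can be guarded with a depth limit plus a cycle check on the delegation chain. A minimal sketch using plain callables as stand-in agents; the `("delegate", next_agent, task)` return convention is an assumption for illustration:

```python
class SafeDelegator:
    """Guard against infinite delegation loops with a depth limit and cycle check."""
    def __init__(self, max_depth: int = 5):
        self.max_depth = max_depth

    def delegate(self, agent_name: str, task: str, handlers: dict,
                 chain: tuple = ()) -> str:
        # Cycle detection: refuse to revisit an agent already in the chain
        if agent_name in chain:
            return f"Cycle detected: {' -> '.join(chain + (agent_name,))}"
        # Depth limit: stop runaway delegation even without a strict cycle
        if len(chain) >= self.max_depth:
            return "Delegation limit reached"
        result = handlers[agent_name](task)
        # A handler may return ("delegate", next_agent, new_task) to pass work on
        if isinstance(result, tuple) and result[0] == "delegate":
            return self.delegate(result[1], result[2], handlers,
                                 chain + (agent_name,))
        return result
```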
Practice Exercises
Exercise 1: Build a Debate System (Medium)
Task: Create 3 agents that debate a topic and reach consensus.
Requirements:
- Each agent takes a position
- Agents respond to each other’s arguments
- Judge determines the winner
Click to see solution
class DebateAgent:
def __init__(self, position: str):
self.position = position
self.client = openai.OpenAI()
def argue(self, topic: str, opponent_args: List[str]) -> str:
prompt = f"Topic: {topic}\nYour position: {self.position}\nOpponent arguments: {opponent_args}\n\nYour argument:"
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
return response.choices[0].message.content
# Create debate
agents = [
DebateAgent("for"),
DebateAgent("against"),
DebateAgent("neutral")
]
# Run debate rounds
for round_num in range(3):
for agent in agents:
others = [a.argue(topic, []) for a in agents if a != agent]
agent.argue(topic, others)
Exercise 2: Parallel Task Execution (Hard)
Task: Create a system where 4 agents analyze different files simultaneously.
Requirements:
- Use asyncio for parallel execution
- Aggregate results
- Handle failures gracefully
Click to see solution
import asyncio
from typing import Dict, List
async def analyze_parallel(files: List[str]) -> List[Dict]:
tasks = [analyze_file(f) for f in files]
results = await asyncio.gather(*tasks, return_exceptions=True)
return [r for r in results if not isinstance(r, Exception)]
async def analyze_file(file_path: str) -> Dict:
# Simulate analysis
await asyncio.sleep(1)
return {"file": file_path, "issues": []}
✅ Chapter 3 Summary
You’ve mastered advanced agent patterns:
- Planning: Create multi-step plans with Chain-of-Thought and task decomposition
- Memory: Implement short-term, long-term, and semantic memory systems
- Multi-Agent: Coordinate specialized agents with various collaboration patterns
These patterns enable agents to handle complex, long-running tasks that require coordination, context, and diverse expertise.
Next Steps
You now understand multi-agent systems! In Chapter 4, we’ll explore the tools and capabilities that make agents powerful, including code execution, data access, and web interaction.
Code Execution
Module 4: Learning Objectives
By the end of this module, you will:
- ✓ Execute code safely in sandboxed environments
- ✓ Integrate data sources (databases, APIs, file systems)
- ✓ Implement web scraping and browser automation
- ✓ Build RAG systems for knowledge retrieval
- ✓ Handle various data formats and protocols
Why Agents Need Code Execution
Code execution allows agents to:
- Perform precise calculations
- Process data programmatically
- Generate and test code
- Automate complex operations
- Verify results deterministically
Without code execution: “The sum of 1 to 100 is approximately 5050”
With code execution: “The sum of 1 to 100 is exactly 5050” (calculated)
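The difference is one line of Python away: a code-executing agent computes the exact value instead of estimating it.

```python
# An agent with a code tool runs this and returns the exact answer
total = sum(range(1, 101))
print(total)  # prints 5050
```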
Sandboxed Environments
Never execute untrusted code directly. Always use sandboxing.
Why Sandboxing?
Risks of unsandboxed execution:
- File system access (delete files)
- Network access (data exfiltration)
- System commands (malicious operations)
- Resource exhaustion (infinite loops)
Docker Sandbox
import docker
import tempfile
class DockerSandbox:
"""Execute code in Docker container"""
def __init__(self, image="python:3.11-slim"):
self.client = docker.from_env()
self.image = image
def execute(self, code: str, timeout: int = 30) -> dict:
"""Execute Python code in container"""
try:
# Create temporary file with code
with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
f.write(code)
code_file = f.name
# Run container with the code file mounted read-only
# (the host temp file isn't visible inside the container otherwise)
container = self.client.containers.run(
self.image,
"python /sandbox/script.py",
volumes={code_file: {'bind': '/sandbox/script.py', 'mode': 'ro'}},
detach=True,
mem_limit="128m",
network_disabled=True
)
# Wait for completion and collect logs before removing
# (auto-remove can delete the container before logs are read)
result = container.wait(timeout=timeout)
logs = container.logs().decode('utf-8')
container.remove()
return {
"success": result['StatusCode'] == 0,
"output": logs,
"exit_code": result['StatusCode']
}
except docker.errors.ContainerError as e:
return {
"success": False,
"output": str(e),
"exit_code": -1
}
except Exception as e:
return {
"success": False,
"output": f"Error: {str(e)}",
"exit_code": -1
}
RestrictedPython
from RestrictedPython import compile_restricted, safe_globals
import io
import sys
class RestrictedExecutor:
"""Execute Python with restrictions"""
def __init__(self):
self.safe_builtins = {
'print': print,
'range': range,
'len': len,
'sum': sum,
'max': max,
'min': min,
'abs': abs,
'round': round,
'sorted': sorted,
'list': list,
'dict': dict,
'set': set,
'str': str,
'int': int,
'float': float,
}
def execute(self, code: str, timeout: int = 5) -> dict:
"""Execute restricted Python code"""
try:
# Compile with restrictions
byte_code = compile_restricted(
code,
filename='<inline>',
mode='exec'
)
if byte_code.errors:
return {
"success": False,
"output": "\n".join(byte_code.errors)
}
# Capture output
output_buffer = io.StringIO()
sys.stdout = output_buffer
# Execute with safe globals
exec(byte_code, {
"__builtins__": self.safe_builtins,
"_print_": print,
"_getattr_": getattr,
})
# Restore stdout
sys.stdout = sys.__stdout__
return {
"success": True,
"output": output_buffer.getvalue()
}
except Exception as e:
sys.stdout = sys.__stdout__
return {
"success": False,
"output": f"Error: {str(e)}"
}
E2B Code Interpreter
from e2b import Sandbox
class E2BSandbox:
"""Execute code using E2B"""
def __init__(self):
self.sandbox = Sandbox()
def execute_python(self, code: str) -> dict:
"""Execute Python code"""
try:
execution = self.sandbox.run_code(code)
return {
"success": not execution.error,
"output": execution.stdout,
"error": execution.stderr,
"logs": execution.logs
}
except Exception as e:
return {
"success": False,
"output": "",
"error": str(e)
}
def execute_bash(self, command: str) -> dict:
"""Execute bash command"""
try:
result = self.sandbox.process.start_and_wait(command)
return {
"success": result.exit_code == 0,
"output": result.stdout,
"error": result.stderr
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
Code Generation and Validation
Generate Code
def generate_code(task: str, language: str = "python") -> str:
"""Generate code for a task"""
prompt = f"""Write {language} code to accomplish this task:
Task: {task}
Requirements:
- Include error handling
- Add comments
- Return result clearly
- Keep it simple and readable
Code:"""
response = llm.generate(prompt, temperature=0.2)
return extract_code(response)
def extract_code(response: str) -> str:
"""Extract code from markdown"""
import re
# Look for code blocks
pattern = r"```(?:python)?\n(.*?)```"
matches = re.findall(pattern, response, re.DOTALL)
if matches:
return matches[0].strip()
return response.strip()
Validate Code
import ast
import re
def validate_python_code(code: str) -> dict:
"""Validate Python code syntax"""
try:
ast.parse(code)
return {
"valid": True,
"errors": []
}
except SyntaxError as e:
return {
"valid": False,
"errors": [f"Line {e.lineno}: {e.msg}"]
}
def check_dangerous_operations(code: str) -> dict:
"""Check for dangerous operations"""
dangerous_patterns = [
(r'import\s+os', "OS module import"),
(r'import\s+sys', "System module import"),
(r'import\s+subprocess', "Subprocess import"),
(r'open\s*\(', "File operations"),
(r'eval\s*\(', "Eval usage"),
(r'exec\s*\(', "Exec usage"),
(r'__import__', "Dynamic imports"),
]
issues = []
for pattern, description in dangerous_patterns:
if re.search(pattern, code):
issues.append(description)
return {
"safe": len(issues) == 0,
"issues": issues
}
Test Generated Code
def test_code(code: str, test_cases: List[dict]) -> dict:
"""Test code with test cases"""
sandbox = RestrictedExecutor()
results = []
for test in test_cases:
# Prepare test script (distinct name so it doesn't shadow the test_code function)
test_script = f"""
{code}
# Test case
result = {test['call']}
print(result)
"""
# Execute
output = sandbox.execute(test_script)
# Check result
expected = str(test['expected'])
actual = output['output'].strip()
results.append({
"test": test['call'],
"expected": expected,
"actual": actual,
"passed": actual == expected
})
return {
"total": len(results),
"passed": sum(1 for r in results if r['passed']),
"results": results
}
# Example usage
code = """
def add(a, b):
return a + b
"""
test_cases = [
{"call": "add(2, 3)", "expected": 5},
{"call": "add(-1, 1)", "expected": 0},
{"call": "add(0, 0)", "expected": 0}
]
results = test_code(code, test_cases)
Debugging and Error Recovery
Parse Errors
def parse_error(error_message: str) -> dict:
"""Parse error message for useful info"""
import re
# Extract line number
line_match = re.search(r'line (\d+)', error_message)
line_num = int(line_match.group(1)) if line_match else None
# Extract error type
type_match = re.search(r'(\w+Error):', error_message)
error_type = type_match.group(1) if type_match else "Unknown"
return {
"type": error_type,
"line": line_num,
"message": error_message
}
Auto-Fix Errors
def fix_code_error(code: str, error: str) -> str:
"""Attempt to fix code based on error"""
prompt = f"""This code has an error:
Code:
```python
{code}
```
Error: {error}
Provide the corrected code:"""
response = llm.generate(prompt, temperature=0.1)
return extract_code(response)
def iterative_fix(code: str, max_attempts: int = 3) -> dict:
"""Iteratively fix code until it works"""
sandbox = RestrictedExecutor()
for attempt in range(max_attempts):
# Try to execute
result = sandbox.execute(code)
if result['success']:
return {
"success": True,
"code": code,
"attempts": attempt + 1
}
# Try to fix
code = fix_code_error(code, result['output'])
return {
"success": False,
"code": code,
"attempts": max_attempts,
"error": "Max attempts reached"
}
Security Considerations
Input Validation
def validate_code_input(code: str) -> dict:
"""Validate code before execution"""
# Check length
if len(code) > 10000:
return {
"valid": False,
"reason": "Code too long (max 10000 chars)"
}
# Check for null bytes
if '\x00' in code:
return {
"valid": False,
"reason": "Invalid characters in code"
}
# Check syntax
syntax_check = validate_python_code(code)
if not syntax_check['valid']:
return {
"valid": False,
"reason": f"Syntax error: {syntax_check['errors']}"
}
# Check for dangerous operations
safety_check = check_dangerous_operations(code)
if not safety_check['safe']:
return {
"valid": False,
"reason": f"Unsafe operations: {safety_check['issues']}"
}
return {"valid": True}
```

### Resource Limits
class ResourceLimitedExecutor:
"""Execute code with resource limits"""
def __init__(self):
self.max_execution_time = 30 # seconds
self.max_memory = 128 * 1024 * 1024 # 128 MB
self.max_output_size = 10000 # characters
def execute(self, code: str) -> dict:
"""Execute with limits"""
import signal
import resource
def timeout_handler(signum, frame):
raise TimeoutError("Execution timeout")
# Set timeout
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(self.max_execution_time)
# Set memory limit
resource.setrlimit(
resource.RLIMIT_AS,
(self.max_memory, self.max_memory)
)
try:
# Execute code
result = self._execute_code(code)
# Limit output size
if len(result['output']) > self.max_output_size:
result['output'] = result['output'][:self.max_output_size] + "...(truncated)"
return result
except TimeoutError:
return {
"success": False,
"output": "Execution timeout"
}
except MemoryError:
return {
"success": False,
"output": "Memory limit exceeded"
}
finally:
signal.alarm(0) # Cancel alarm
Complete Code Execution Agent
class CodeExecutionAgent:
"""Agent that can generate and execute code"""
def __init__(self):
self.sandbox = RestrictedExecutor()
self.client = openai.OpenAI()
def solve_with_code(self, problem: str) -> str:
"""Solve problem by generating and executing code"""
# Generate code
print("💻 Generating code...")
code = self.generate_solution(problem)
print(f"Generated:\n{code}\n")
# Validate
validation = validate_code_input(code)
if not validation['valid']:
return f"Invalid code: {validation['reason']}"
# Execute
print("▶️ Executing code...")
result = self.sandbox.execute(code)
if result['success']:
print(f"✓ Output: {result['output']}\n")
return self.format_result(problem, code, result['output'])
else:
# Try to fix and retry
print("⚠️ Error occurred, attempting fix...")
fixed = iterative_fix(code)
if fixed['success']:
result = self.sandbox.execute(fixed['code'])
return self.format_result(problem, fixed['code'], result['output'])
else:
return f"Failed to execute: {result['output']}"
def generate_solution(self, problem: str) -> str:
"""Generate code to solve problem"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "user",
"content": f"""Write Python code to solve this problem:
{problem}
Requirements:
- Use only standard library
- Print the final result
- Handle edge cases
- Keep it simple
Provide only the code, no explanations."""
}],
temperature=0.2
)
return extract_code(response.choices[0].message.content)
    def format_result(self, problem: str, code: str, output: str) -> str:
        """Format final result"""
        return f"""Problem: {problem}

Solution:
```python
{code}
```

Result: {output}"""
# Usage
agent = CodeExecutionAgent()
result = agent.solve_with_code("Calculate the sum of all prime numbers less than 100")
print(result)
## Advanced Use Cases
### Data Analysis
```python
def analyze_data_with_code(data: List[dict], question: str) -> str:
    """Analyze data using generated code"""
    import json
    # Generate analysis code
code = f"""
import json
data = {json.dumps(data)}
# Analysis code will be generated here
"""
analysis_code = generate_code(
f"Analyze this data to answer: {question}\nData structure: {data[0] if data else {}}"
)
full_code = code + "\n" + analysis_code
# Execute
sandbox = RestrictedExecutor()
result = sandbox.execute(full_code)
return result['output']
```

### Mathematical Computation
def compute_math(expression: str) -> str:
"""Safely compute mathematical expression"""
code = f"""
import math
result = {expression}
print(result)
"""
sandbox = RestrictedExecutor()
result = sandbox.execute(code)
if result['success']:
return result['output'].strip()
else:
return f"Error: {result['output']}"
### Code Transformation
def transform_code(code: str, transformation: str) -> str:
    """Transform code (refactor, optimize, etc.)"""
    prompt = f"""Transform this code:

Original:
```python
{code}
```

Transformation: {transformation}

Transformed code:"""
    response = llm.generate(prompt)
    return extract_code(response)
# Example
original = "for i in range(len(items)): print(items[i])"
transformed = transform_code(original, "Make it more Pythonic")
# Result: "for item in items: print(item)"
## Best Practices
1. **Always sandbox**: Never execute untrusted code directly
2. **Set timeouts**: Prevent infinite loops
3. **Limit resources**: Memory, CPU, network
4. **Validate inputs**: Check code before execution
5. **Handle errors gracefully**: Don't crash on bad code
6. **Test generated code**: Verify it works
7. **Log executions**: Track what code runs
8. **Isolate environments**: One execution shouldn't affect others
9. **Clean up**: Remove temporary files and containers
10. **Monitor usage**: Track resource consumption
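Practice 7 ("Log executions") deserves a concrete shape. A minimal sketch, assuming any executor function that returns a `{"success": ..., "output": ...}` dict; the `fake_execute` stand-in below is hypothetical:

```python
import hashlib
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("code_exec")

def logged_execution(execute_fn):
    """Wrap an executor so every run is recorded: code hash, outcome, duration."""
    def wrapper(code: str) -> dict:
        code_hash = hashlib.sha256(code.encode()).hexdigest()[:12]
        start = time.monotonic()
        result = execute_fn(code)
        elapsed = time.monotonic() - start
        logger.info(
            "executed code %s success=%s elapsed=%.3fs",
            code_hash, result.get("success"), elapsed,
        )
        return result
    return wrapper

# Stand-in executor for demonstration; any function with the same
# dict-returning signature (e.g. RestrictedExecutor.execute) would work.
@logged_execution
def fake_execute(code: str) -> dict:
    return {"success": True, "output": "ok"}

result = fake_execute("print('hello')")
```

Hashing the code rather than logging it verbatim keeps logs compact while still letting you correlate repeated executions of the same snippet.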
## Common Pitfalls
### Pitfall 1: Trusting Generated Code
**Problem**: LLM generates code with bugs
**Solution**: Always test and validate
### Pitfall 2: No Timeout
**Problem**: Infinite loops hang the system
**Solution**: Set execution timeouts
### Pitfall 3: Unrestricted Access
**Problem**: Code can access file system
**Solution**: Use proper sandboxing
### Pitfall 4: Poor Error Messages
**Problem**: User doesn't understand what went wrong
**Solution**: Parse and explain errors clearly
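Pitfall 4 can be addressed by mapping raw error text to plain-language hints. A minimal sketch in the spirit of `parse_error` above; the hint table is illustrative, not exhaustive:

```python
import re

def explain_error(error_message: str) -> str:
    """Turn a raw Python error message into a short, user-facing explanation."""
    # Plain-language hints for common error types (illustrative only)
    hints = {
        "NameError": "the code refers to a name that was never defined",
        "TypeError": "a value of the wrong type was used in an operation",
        "ZeroDivisionError": "the code divided by zero",
    }
    type_match = re.search(r"(\w+Error)", error_message)
    error_type = type_match.group(1) if type_match else "Error"
    line_match = re.search(r"line (\d+)", error_message)
    location = f" (line {line_match.group(1)})" if line_match else ""
    hint = hints.get(error_type, "something went wrong while running the code")
    return f"{error_type}{location}: {hint}."

raw = 'File "<string>", line 3\nNameError: name \'x\' is not defined'
print(explain_error(raw))
```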
## Next Steps
You now understand code execution for agents! Next, we'll explore data access and retrieval, including databases, APIs, and RAG systems.
Data Access & Retrieval
RAG (Retrieval Augmented Generation)
RAG combines retrieval with generation to provide accurate, grounded responses.
Why RAG?
Without RAG:
- LLM relies on training data (may be outdated)
- Can hallucinate facts
- No access to private/recent information
With RAG:
- Retrieves relevant documents first
- Grounds responses in actual data
- Works with private knowledge bases
- As current as the indexed documents
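The retrieve-then-generate pattern can be seen without any embedding model. The sketch below uses word overlap as a crude stand-in for the embedding similarity in the full pipeline that follows; `keyword_retrieve` is a hypothetical helper, not a library function:

```python
import re

def keyword_retrieve(query: str, documents: list, top_k: int = 2) -> list:
    """Rank documents by shared words with the query (a toy retriever)."""
    query_words = set(re.findall(r"\w+", query.lower()))
    scored = [
        (len(query_words & set(re.findall(r"\w+", doc.lower()))), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

docs = [
    "Python is popular for data science and AI.",
    "JavaScript is used for web development.",
    "Rust emphasizes memory safety.",
]
context = keyword_retrieve("What is Python used for?", docs, top_k=2)
# The retrieved context would then be prepended to the LLM prompt as grounding.
```

Swapping the overlap score for cosine similarity over embeddings gives the pipeline shown next.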
Basic RAG Pipeline
class SimpleRAG:
"""Basic RAG implementation"""
def __init__(self):
self.documents = []
self.embeddings = []
self.client = openai.OpenAI()
def add_document(self, text: str, metadata: dict = None):
"""Add document to knowledge base"""
# Create embedding
embedding = self.get_embedding(text)
self.documents.append({
"text": text,
"metadata": metadata or {},
"id": len(self.documents)
})
self.embeddings.append(embedding)
def get_embedding(self, text: str) -> list:
"""Get embedding for text"""
response = self.client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return response.data[0].embedding
def retrieve(self, query: str, top_k: int = 3) -> list:
"""Retrieve relevant documents"""
# Get query embedding
query_embedding = self.get_embedding(query)
# Calculate similarities
similarities = []
for i, doc_embedding in enumerate(self.embeddings):
similarity = self.cosine_similarity(query_embedding, doc_embedding)
similarities.append((i, similarity))
# Sort and get top k
similarities.sort(key=lambda x: x[1], reverse=True)
results = []
for i, score in similarities[:top_k]:
doc = self.documents[i].copy()
doc['score'] = score
results.append(doc)
return results
def query(self, question: str) -> str:
"""Answer question using RAG"""
# Retrieve relevant documents
docs = self.retrieve(question, top_k=3)
# Build context
context = "\n\n".join([
f"Document {i+1}:\n{doc['text']}"
for i, doc in enumerate(docs)
])
# Generate answer
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": "Answer questions based on the provided context. If the answer isn't in the context, say so."
},
{
"role": "user",
"content": f"Context:\n{context}\n\nQuestion: {question}"
}
]
)
return response.choices[0].message.content
def cosine_similarity(self, a: list, b: list) -> float:
"""Calculate cosine similarity"""
import numpy as np
a = np.array(a)
b = np.array(b)
return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Usage
rag = SimpleRAG()
# Add documents
rag.add_document("Python is a high-level programming language.")
rag.add_document("JavaScript is used for web development.")
rag.add_document("Python is popular for data science and AI.")
# Query
answer = rag.query("What is Python used for?")
print(answer)
Advanced RAG with LangChain
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI
class AdvancedRAG:
"""RAG using LangChain"""
def __init__(self, persist_directory="./chroma_db"):
self.embeddings = OpenAIEmbeddings()
self.text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200
)
self.vectorstore = None
self.persist_directory = persist_directory
def load_documents(self, documents: list):
"""Load and process documents"""
# Split documents into chunks
chunks = self.text_splitter.create_documents(documents)
# Create vector store
self.vectorstore = Chroma.from_documents(
documents=chunks,
embedding=self.embeddings,
persist_directory=self.persist_directory
)
def query(self, question: str) -> dict:
"""Query with source attribution"""
if not self.vectorstore:
return {"answer": "No documents loaded", "sources": []}
# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
llm=OpenAI(temperature=0),
chain_type="stuff",
retriever=self.vectorstore.as_retriever(search_kwargs={"k": 3}),
return_source_documents=True
)
# Query
result = qa_chain({"query": question})
return {
"answer": result["result"],
"sources": [doc.page_content for doc in result["source_documents"]]
}
Chunking Strategies
class DocumentChunker:
"""Different chunking strategies"""
def chunk_by_tokens(self, text: str, chunk_size: int = 512, overlap: int = 50) -> list:
"""Chunk by token count"""
import tiktoken
encoding = tiktoken.encoding_for_model("gpt-4")
tokens = encoding.encode(text)
chunks = []
start = 0
while start < len(tokens):
end = start + chunk_size
chunk_tokens = tokens[start:end]
chunk_text = encoding.decode(chunk_tokens)
chunks.append(chunk_text)
start = end - overlap
return chunks
def chunk_by_sentences(self, text: str, sentences_per_chunk: int = 5) -> list:
"""Chunk by sentences"""
import re
# Split into sentences
sentences = re.split(r'[.!?]+', text)
sentences = [s.strip() for s in sentences if s.strip()]
chunks = []
for i in range(0, len(sentences), sentences_per_chunk):
chunk = ". ".join(sentences[i:i+sentences_per_chunk]) + "."
chunks.append(chunk)
return chunks
def chunk_by_paragraphs(self, text: str) -> list:
"""Chunk by paragraphs"""
paragraphs = text.split('\n\n')
return [p.strip() for p in paragraphs if p.strip()]
def semantic_chunking(self, text: str, similarity_threshold: float = 0.7) -> list:
"""Chunk based on semantic similarity"""
sentences = self.split_sentences(text)
if not sentences:
return []
chunks = []
current_chunk = [sentences[0]]
for i in range(1, len(sentences)):
# Check similarity with current chunk
chunk_text = " ".join(current_chunk)
similarity = self.calculate_similarity(chunk_text, sentences[i])
if similarity >= similarity_threshold:
current_chunk.append(sentences[i])
else:
# Start new chunk
chunks.append(" ".join(current_chunk))
current_chunk = [sentences[i]]
# Add last chunk
if current_chunk:
chunks.append(" ".join(current_chunk))
return chunks
Database Queries
SQL Databases
import sqlite3
from typing import List, Dict
class SQLAgent:
"""Agent that can query SQL databases"""
def __init__(self, db_path: str):
self.db_path = db_path
self.client = openai.OpenAI()
def get_schema(self) -> str:
"""Get database schema"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
# Get all tables
cursor.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = cursor.fetchall()
schema = []
for table in tables:
table_name = table[0]
cursor.execute(f"PRAGMA table_info({table_name})")
columns = cursor.fetchall()
schema.append(f"Table: {table_name}")
for col in columns:
schema.append(f" - {col[1]} ({col[2]})")
conn.close()
return "\n".join(schema)
def natural_language_query(self, question: str) -> Dict:
"""Convert natural language to SQL and execute"""
# Generate SQL
sql = self.generate_sql(question)
# Execute SQL
results = self.execute_sql(sql)
# Format response
answer = self.format_results(question, results)
return {
"question": question,
"sql": sql,
"results": results,
"answer": answer
}
def generate_sql(self, question: str) -> str:
"""Generate SQL from natural language"""
schema = self.get_schema()
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": f"""You are a SQL expert. Convert natural language questions to SQL queries.
Database schema:
{schema}
Rules:
- Return only the SQL query, no explanations
- Use proper SQL syntax
- Be careful with column names
- Use appropriate JOINs when needed"""
},
{
"role": "user",
"content": question
}
],
temperature=0.1
)
sql = response.choices[0].message.content.strip()
# Remove markdown code blocks if present
sql = sql.replace("```sql", "").replace("```", "").strip()
return sql
def execute_sql(self, sql: str) -> List[Dict]:
"""Execute SQL query safely"""
# Validate query (read-only)
if not self.is_safe_query(sql):
raise ValueError("Only SELECT queries are allowed")
conn = sqlite3.connect(self.db_path)
conn.row_factory = sqlite3.Row
cursor = conn.cursor()
try:
cursor.execute(sql)
rows = cursor.fetchall()
# Convert to list of dicts
results = [dict(row) for row in rows]
conn.close()
return results
except Exception as e:
conn.close()
raise Exception(f"SQL execution error: {str(e)}")
def is_safe_query(self, sql: str) -> bool:
"""Check if query is safe (read-only)"""
sql_upper = sql.upper().strip()
# Only allow SELECT
if not sql_upper.startswith("SELECT"):
return False
# Disallow dangerous keywords
dangerous = ["DROP", "DELETE", "INSERT", "UPDATE", "ALTER", "CREATE"]
for keyword in dangerous:
if keyword in sql_upper:
return False
return True
def format_results(self, question: str, results: List[Dict]) -> str:
"""Format results as natural language"""
if not results:
return "No results found."
# Convert results to text
results_text = "\n".join([str(row) for row in results[:10]])
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "user",
"content": f"""Answer this question based on the query results:
Question: {question}
Results:
{results_text}
Provide a clear, natural language answer:"""
}
]
)
return response.choices[0].message.content
# Usage
agent = SQLAgent("company.db")
result = agent.natural_language_query("How many employees are in the sales department?")
print(result['answer'])
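One caveat: the substring check in `is_safe_query` also rejects harmless queries whose identifiers merely contain a keyword (a `created_at` column contains `CREATE`, for example). A word-boundary variant is one way to tighten it; this is a sketch, not a complete SQL sanitizer:

```python
import re

def is_safe_query_strict(sql: str) -> bool:
    """Read-only check using word boundaries, so identifiers like
    created_at or updated_by are not flagged as CREATE/UPDATE."""
    sql_upper = sql.upper().strip()
    # Only allow SELECT statements
    if not sql_upper.startswith("SELECT"):
        return False
    dangerous = ["DROP", "DELETE", "INSERT", "UPDATE", "ALTER", "CREATE"]
    for keyword in dangerous:
        if re.search(rf"\b{keyword}\b", sql_upper):
            return False
    return True
```

For defense in depth, pair this with a read-only database connection so that even a query that slips through cannot modify data.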
NoSQL Databases
from pymongo import MongoClient
class MongoDBAgent:
"""Agent for MongoDB queries"""
def __init__(self, connection_string: str, database: str):
self.client = MongoClient(connection_string)
self.db = self.client[database]
self.llm = openai.OpenAI()
def query(self, question: str, collection: str) -> dict:
"""Query MongoDB using natural language"""
# Generate MongoDB query
query_dict = self.generate_query(question, collection)
# Execute query
results = list(self.db[collection].find(query_dict).limit(10))
# Format response
answer = self.format_results(question, results)
return {
"question": question,
"query": query_dict,
"results": results,
"answer": answer
}
def generate_query(self, question: str, collection: str) -> dict:
"""Generate MongoDB query from natural language"""
# Get sample document
sample = self.db[collection].find_one()
response = self.llm.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": f"""Convert natural language to MongoDB query.
Collection: {collection}
Sample document: {sample}
Return only valid JSON for MongoDB find() query."""
},
{
"role": "user",
"content": question
}
],
temperature=0.1
)
import json
query_str = response.choices[0].message.content.strip()
return json.loads(query_str)
API Integrations
REST API Client
import requests
from typing import Optional
class APIAgent:
"""Agent that can call REST APIs"""
def __init__(self):
self.client = openai.OpenAI()
self.session = requests.Session()
def call_api(self,
url: str,
method: str = "GET",
headers: Optional[dict] = None,
params: Optional[dict] = None,
data: Optional[dict] = None) -> dict:
"""Make API call"""
try:
response = self.session.request(
method=method,
url=url,
headers=headers,
params=params,
json=data,
timeout=30
)
response.raise_for_status()
return {
"success": True,
"status_code": response.status_code,
"data": response.json() if response.content else None
}
except requests.exceptions.RequestException as e:
return {
"success": False,
"error": str(e)
}
def natural_language_api_call(self, request: str, api_spec: dict) -> dict:
"""Convert natural language to API call"""
# Generate API call parameters
params = self.generate_api_params(request, api_spec)
# Make API call
result = self.call_api(**params)
# Format response
if result['success']:
answer = self.format_api_response(request, result['data'])
return {
"request": request,
"api_call": params,
"response": result['data'],
"answer": answer
}
else:
return {
"request": request,
"error": result['error']
}
    def generate_api_params(self, request: str, api_spec: dict) -> dict:
        """Generate API parameters from natural language"""
        import json
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": f"""Convert natural language to API call parameters.
API Specification:
{json.dumps(api_spec, indent=2)}
Return JSON with: url, method, headers, params, data"""
},
{
"role": "user",
"content": request
}
],
temperature=0.1
)
import json
return json.loads(response.choices[0].message.content)
GraphQL Client
from gql import gql, Client
from gql.transport.requests import RequestsHTTPTransport
class GraphQLAgent:
"""Agent for GraphQL APIs"""
def __init__(self, endpoint: str):
transport = RequestsHTTPTransport(url=endpoint)
self.client = Client(transport=transport, fetch_schema_from_transport=True)
self.llm = openai.OpenAI()
def query(self, natural_language_query: str) -> dict:
"""Execute GraphQL query from natural language"""
# Generate GraphQL query
graphql_query = self.generate_graphql(natural_language_query)
# Execute query
query = gql(graphql_query)
result = self.client.execute(query)
# Format response
answer = self.format_results(natural_language_query, result)
return {
"question": natural_language_query,
"graphql": graphql_query,
"result": result,
"answer": answer
}
def generate_graphql(self, question: str) -> str:
"""Generate GraphQL query"""
schema = self.client.schema
response = self.llm.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": f"""Generate GraphQL query from natural language.
Schema: {schema}
Return only the GraphQL query."""
},
{
"role": "user",
"content": question
}
]
)
return response.choices[0].message.content.strip()
File System Operations
Safe File Access
import os
from pathlib import Path
class FileSystemAgent:
"""Agent with safe file system access"""
def __init__(self, allowed_directory: str):
self.allowed_directory = Path(allowed_directory).resolve()
def is_safe_path(self, path: str) -> bool:
"""Check if path is within allowed directory"""
try:
requested_path = (self.allowed_directory / path).resolve()
return requested_path.is_relative_to(self.allowed_directory)
        except (ValueError, OSError):
            return False
def read_file(self, path: str) -> dict:
"""Read file safely"""
if not self.is_safe_path(path):
return {"success": False, "error": "Access denied"}
try:
full_path = self.allowed_directory / path
with open(full_path, 'r') as f:
content = f.read()
return {
"success": True,
"content": content,
"size": len(content)
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
def list_files(self, path: str = ".") -> dict:
"""List files in directory"""
if not self.is_safe_path(path):
return {"success": False, "error": "Access denied"}
try:
full_path = self.allowed_directory / path
files = []
for item in full_path.iterdir():
files.append({
"name": item.name,
"type": "directory" if item.is_dir() else "file",
"size": item.stat().st_size if item.is_file() else None
})
return {
"success": True,
"files": files
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
def search_files(self, pattern: str, path: str = ".") -> dict:
"""Search for files matching pattern"""
if not self.is_safe_path(path):
return {"success": False, "error": "Access denied"}
try:
full_path = self.allowed_directory / path
matches = list(full_path.rglob(pattern))
results = [
{
"path": str(m.relative_to(self.allowed_directory)),
"name": m.name,
"size": m.stat().st_size if m.is_file() else None
}
for m in matches
]
return {
"success": True,
"matches": results
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
Complete Data Access Agent
class DataAccessAgent:
"""Unified agent for data access"""
def __init__(self):
self.rag = SimpleRAG()
self.sql_agent = None
self.api_agent = APIAgent()
self.fs_agent = None
self.client = openai.OpenAI()
def configure_sql(self, db_path: str):
"""Configure SQL access"""
self.sql_agent = SQLAgent(db_path)
def configure_filesystem(self, allowed_dir: str):
"""Configure file system access"""
self.fs_agent = FileSystemAgent(allowed_dir)
def query(self, question: str) -> str:
"""Answer question using appropriate data source"""
# Determine which data source to use
source = self.determine_source(question)
if source == "rag":
return self.rag.query(question)
elif source == "sql" and self.sql_agent:
result = self.sql_agent.natural_language_query(question)
return result['answer']
elif source == "api":
# Would need API spec
return "API access requires configuration"
elif source == "filesystem" and self.fs_agent:
# Would need to determine file operation
return "File system access requires specific operation"
else:
return "Unable to determine appropriate data source"
def determine_source(self, question: str) -> str:
"""Determine which data source to use"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "user",
"content": f"""Which data source should be used for this question?
Question: {question}
Options: rag, sql, api, filesystem
Answer with just the option:"""
}
],
temperature=0.1
)
return response.choices[0].message.content.strip().lower()
Best Practices
- Validate queries: Check SQL/API calls before execution
- Limit results: Don’t return huge datasets
- Cache responses: Avoid redundant queries
- Handle errors: Graceful failure handling
- Secure credentials: Never expose API keys
- Rate limiting: Respect API limits
- Chunk large documents: Better retrieval
- Use appropriate embeddings: Match your use case
- Monitor costs: Track API usage
- Test thoroughly: Verify data access works
Next Steps
You now understand data access and retrieval! Next, we’ll explore web interaction including browser automation and scraping.
Web Interaction
Browser Automation
Agents can interact with websites like humans do—clicking, typing, scrolling, and extracting information.
Why Browser Automation?
- Access dynamic content (JavaScript-rendered)
- Interact with web applications
- Fill forms and submit data
- Navigate multi-page workflows
- Handle authentication
Playwright Basics
from playwright.sync_api import sync_playwright
from typing import Optional
class BrowserAgent:
"""Agent with browser automation capabilities"""
def __init__(self, headless: bool = True):
self.headless = headless
self.playwright = None
self.browser = None
self.page = None
def start(self):
"""Start browser"""
self.playwright = sync_playwright().start()
self.browser = self.playwright.chromium.launch(headless=self.headless)
self.page = self.browser.new_page()
def stop(self):
"""Stop browser"""
if self.browser:
self.browser.close()
if self.playwright:
self.playwright.stop()
def navigate(self, url: str) -> dict:
"""Navigate to URL"""
try:
self.page.goto(url, wait_until="networkidle")
return {
"success": True,
"url": self.page.url,
"title": self.page.title()
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
def click(self, selector: str) -> dict:
"""Click element"""
try:
self.page.click(selector)
return {"success": True}
except Exception as e:
return {"success": False, "error": str(e)}
def type_text(self, selector: str, text: str) -> dict:
"""Type text into element"""
try:
self.page.fill(selector, text)
return {"success": True}
except Exception as e:
return {"success": False, "error": str(e)}
def get_text(self, selector: str) -> Optional[str]:
"""Get text from element"""
try:
return self.page.text_content(selector)
except:
return None
def screenshot(self, path: str = "screenshot.png") -> dict:
"""Take screenshot"""
try:
self.page.screenshot(path=path)
return {"success": True, "path": path}
except Exception as e:
return {"success": False, "error": str(e)}
def get_page_content(self) -> str:
"""Get full page HTML"""
return self.page.content()
# Usage
agent = BrowserAgent()
agent.start()
# Navigate
agent.navigate("https://example.com")
# Interact
agent.type_text("#search", "AI agents")
agent.click("button[type='submit']")
# Extract
results = agent.get_text(".results")
agent.stop()
Selenium Alternative
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
class SeleniumAgent:
"""Browser automation with Selenium"""
def __init__(self):
options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
self.driver = webdriver.Chrome(options=options)
self.wait = WebDriverWait(self.driver, 10)
def navigate(self, url: str):
"""Navigate to URL"""
self.driver.get(url)
def click(self, selector: str, by: By = By.CSS_SELECTOR):
"""Click element"""
element = self.wait.until(
EC.element_to_be_clickable((by, selector))
)
element.click()
def type_text(self, selector: str, text: str, by: By = By.CSS_SELECTOR):
"""Type text"""
element = self.wait.until(
EC.presence_of_element_located((by, selector))
)
element.clear()
element.send_keys(text)
def get_text(self, selector: str, by: By = By.CSS_SELECTOR) -> str:
"""Get element text"""
element = self.wait.until(
EC.presence_of_element_located((by, selector))
)
return element.text
def close(self):
"""Close browser"""
self.driver.quit()
Web Scraping
Extract structured data from websites.
BeautifulSoup Scraping
import requests
from bs4 import BeautifulSoup
from typing import List, Dict, Optional
class WebScraper:
"""Web scraping agent"""
def __init__(self):
self.session = requests.Session()
self.session.headers.update({
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'
})
def fetch_page(self, url: str) -> Optional[BeautifulSoup]:
"""Fetch and parse page"""
try:
response = self.session.get(url, timeout=10)
response.raise_for_status()
return BeautifulSoup(response.content, 'html.parser')
except Exception as e:
print(f"Error fetching {url}: {e}")
return None
def extract_links(self, url: str) -> List[str]:
"""Extract all links from page"""
soup = self.fetch_page(url)
if not soup:
return []
links = []
for a in soup.find_all('a', href=True):
href = a['href']
# Convert relative to absolute
if href.startswith('/'):
from urllib.parse import urljoin
href = urljoin(url, href)
links.append(href)
return links
def extract_text(self, url: str, selector: Optional[str] = None) -> str:
"""Extract text from page"""
soup = self.fetch_page(url)
if not soup:
return ""
if selector:
element = soup.select_one(selector)
return element.get_text(strip=True) if element else ""
else:
return soup.get_text(separator='\n', strip=True)
def extract_structured_data(self, url: str, schema: dict) -> List[Dict]:
"""Extract structured data based on schema"""
soup = self.fetch_page(url)
if not soup:
return []
results = []
# Find all items matching container selector
items = soup.select(schema['container'])
for item in items:
data = {}
for field, selector in schema['fields'].items():
element = item.select_one(selector)
if element:
data[field] = element.get_text(strip=True)
if data:
results.append(data)
return results
# Usage
scraper = WebScraper()
# Extract structured data
schema = {
'container': '.product',
'fields': {
'name': '.product-name',
'price': '.product-price',
'rating': '.product-rating'
}
}
products = scraper.extract_structured_data('https://example.com/products', schema)
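Scraping should also respect `robots.txt` and pace its requests; the standard library's `urllib.robotparser` covers the first part. A minimal sketch (the `PoliteFetcher` name and one-second default delay are assumptions, not from any library):

```python
import time
from urllib import robotparser
from urllib.parse import urlparse

class PoliteFetcher:
    """Check robots.txt and pace requests before scraping a site."""

    def __init__(self, delay_seconds: float = 1.0):
        self.delay_seconds = delay_seconds
        self.parsers = {}       # cached robots.txt parser per host
        self.last_request = 0.0

    def allowed(self, url: str, user_agent: str = "*") -> bool:
        """True if robots.txt permits fetching this URL."""
        parsed = urlparse(url)
        host = f"{parsed.scheme}://{parsed.netloc}"
        if host not in self.parsers:
            rp = robotparser.RobotFileParser(host + "/robots.txt")
            try:
                rp.read()
            except OSError:
                pass  # unreachable robots.txt: fall back to can_fetch's default
            self.parsers[host] = rp
        return self.parsers[host].can_fetch(user_agent, url)

    def wait(self):
        """Sleep just enough to keep at least delay_seconds between requests."""
        elapsed = time.monotonic() - self.last_request
        if elapsed < self.delay_seconds:
            time.sleep(self.delay_seconds - elapsed)
        self.last_request = time.monotonic()

fetcher = PoliteFetcher(delay_seconds=1.0)
# Before each self.session.get(url): fetcher.wait(), and skip URLs
# where fetcher.allowed(url) is False.
```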
Handling Dynamic Content
class DynamicScraper:
"""Scrape JavaScript-rendered content"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
def scrape_dynamic(self, url: str, wait_selector: str = None) -> str:
"""Scrape page with JavaScript"""
self.browser.navigate(url)
# Wait for content to load
if wait_selector:
self.browser.page.wait_for_selector(wait_selector)
else:
self.browser.page.wait_for_load_state("networkidle")
# Get rendered HTML
return self.browser.get_page_content()
def scrape_infinite_scroll(self, url: str, max_scrolls: int = 10) -> str:
"""Scrape infinite scroll pages"""
self.browser.navigate(url)
for _ in range(max_scrolls):
# Scroll to bottom
self.browser.page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
# Wait for new content
self.browser.page.wait_for_timeout(1000)
return self.browser.get_page_content()
def close(self):
"""Close browser"""
self.browser.stop()
Form Filling and Navigation
Automated Form Submission
class FormAgent:
"""Agent that can fill and submit forms"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
def fill_form(self, url: str, form_data: dict) -> dict:
"""Fill and submit form"""
try:
# Navigate to page
self.browser.navigate(url)
            # Fill fields (skip the reserved submit_button key, which names
            # the submit control rather than a field to type into)
            for selector, value in form_data.items():
                if selector == 'submit_button':
                    continue
                if isinstance(value, str):
                    self.browser.type_text(selector, value)
                elif value.get('type') == 'click':
                    self.browser.click(selector)
                elif value.get('type') == 'select':
                    self.browser.page.select_option(selector, value['value'])
# Submit form
submit_button = form_data.get('submit_button', 'button[type="submit"]')
self.browser.click(submit_button)
# Wait for response
self.browser.page.wait_for_load_state("networkidle")
return {
"success": True,
"url": self.browser.page.url,
"title": self.browser.page.title()
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
def close(self):
"""Close browser"""
self.browser.stop()
# Usage
agent = FormAgent()
form_data = {
'#name': 'John Doe',
'#email': 'john@example.com',
'#message': 'Hello from agent!',
'submit_button': '#submit-btn'
}
result = agent.fill_form('https://example.com/contact', form_data)
agent.close()
Multi-Step Navigation
class NavigationAgent:
"""Agent for multi-step web workflows"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
self.history = []
def execute_workflow(self, steps: List[dict]) -> dict:
"""Execute multi-step workflow"""
results = []
for i, step in enumerate(steps):
print(f"Step {i+1}: {step['action']}")
try:
if step['action'] == 'navigate':
result = self.browser.navigate(step['url'])
elif step['action'] == 'click':
result = self.browser.click(step['selector'])
elif step['action'] == 'type':
result = self.browser.type_text(step['selector'], step['text'])
elif step['action'] == 'wait':
self.browser.page.wait_for_timeout(step['duration'])
result = {"success": True}
elif step['action'] == 'extract':
text = self.browser.get_text(step['selector'])
result = {"success": True, "data": text}
elif step['action'] == 'screenshot':
result = self.browser.screenshot(step.get('path', f'step_{i}.png'))
else:
result = {"success": False, "error": "Unknown action"}
results.append({
"step": i + 1,
"action": step['action'],
"result": result
})
self.history.append({
"url": self.browser.page.url,
"title": self.browser.page.title()
})
if not result.get('success', False):
break
except Exception as e:
results.append({
"step": i + 1,
"action": step['action'],
"result": {"success": False, "error": str(e)}
})
break
return {
"completed": len(results),
"total": len(steps),
"results": results,
"history": self.history
}
def close(self):
"""Close browser"""
self.browser.stop()
# Usage
agent = NavigationAgent()
workflow = [
{"action": "navigate", "url": "https://example.com"},
{"action": "click", "selector": "#login-btn"},
{"action": "type", "selector": "#username", "text": "user@example.com"},
{"action": "type", "selector": "#password", "text": "password123"},
{"action": "click", "selector": "#submit"},
{"action": "wait", "duration": 2000},
{"action": "extract", "selector": ".welcome-message"},
{"action": "screenshot", "path": "logged-in.png"}
]
result = agent.execute_workflow(workflow)
agent.close()
Screenshot and Visual Understanding
Taking Screenshots
class ScreenshotAgent:
"""Agent for visual capture and analysis"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
self.client = openai.OpenAI()
def capture_and_analyze(self, url: str, question: str) -> dict:
"""Capture screenshot and analyze with vision model"""
# Navigate and capture
self.browser.navigate(url)
screenshot_path = "temp_screenshot.png"
self.browser.screenshot(screenshot_path)
# Analyze with vision model
import base64
with open(screenshot_path, "rb") as f:
image_data = base64.b64encode(f.read()).decode()
response = self.client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": question},
{
"type": "image_url",
"image_url": {
"url": f"data:image/png;base64,{image_data}"
}
}
]
}
],
max_tokens=500
)
return {
"url": url,
"question": question,
"analysis": response.choices[0].message.content,
"screenshot": screenshot_path
}
def compare_pages(self, url1: str, url2: str) -> dict:
"""Compare two pages visually"""
# Capture both
self.browser.navigate(url1)
self.browser.screenshot("page1.png")
self.browser.navigate(url2)
self.browser.screenshot("page2.png")
# Compare with vision model
question = "What are the main differences between these two pages?"
# Would need to send both images to vision model
# Implementation depends on specific vision API
return {
"url1": url1,
"url2": url2,
"screenshot1": "page1.png",
"screenshot2": "page2.png"
}
def close(self):
"""Close browser"""
self.browser.stop()
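The `compare_pages` method above leaves the two-image request unimplemented. One way to send both screenshots in a single vision request is sketched below; the `gpt-4o` model name and the message shape are assumptions based on the OpenAI chat completions vision format, not something fixed by this chapter:

```python
import base64

def compare_screenshots(path1: str, path2: str, question: str) -> str:
    """Send two screenshots in a single vision request and return the comparison."""
    import openai  # deferred so the helper can be defined without the SDK present

    def encode(path: str) -> str:
        with open(path, "rb") as f:
            return base64.b64encode(f.read()).decode()

    client = openai.OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption: any vision-capable chat model works here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode(path1)}"}},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{encode(path2)}"}},
            ],
        }],
        max_tokens=500,
    )
    return response.choices[0].message.content
```

Both images go in the same `content` array, so the model sees them side by side and can answer a comparison question directly.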
Element Detection
class ElementDetector:
"""Detect and locate elements on page"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
self.client = openai.OpenAI()
def find_element_by_description(self, url: str, description: str) -> Optional[str]:
"""Find element selector by natural language description"""
self.browser.navigate(url)
# Get page structure
elements = self.browser.page.evaluate("""
() => {
const elements = [];
document.querySelectorAll('button, a, input, select, textarea').forEach(el => {
elements.push({
tag: el.tagName,
text: el.textContent.trim(),
id: el.id,
class: el.className,
type: el.type
});
});
return elements;
}
""")
# Use LLM to match description to element
prompt = f"""Find the element matching this description: {description}
Available elements:
{json.dumps(elements, indent=2)}
Return the best CSS selector to target this element:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
return response.choices[0].message.content.strip()
def close(self):
"""Close browser"""
self.browser.stop()
Complete Web Interaction Agent
class WebAgent:
"""Complete web interaction agent"""
def __init__(self):
self.browser = BrowserAgent()
self.browser.start()
self.scraper = WebScraper()
self.client = openai.OpenAI()
def execute_task(self, task: str, url: str) -> str:
"""Execute web task from natural language"""
# Generate action plan
plan = self.generate_plan(task, url)
# Execute plan
results = []
for step in plan:
result = self.execute_step(step)
results.append(result)
# Summarize results
return self.summarize_results(task, results)
def generate_plan(self, task: str, url: str) -> List[dict]:
"""Generate action plan for task"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": """Generate a step-by-step plan for web automation.
Available actions:
- navigate: Go to URL
- click: Click element (provide selector)
- type: Type text (provide selector and text)
- extract: Extract text (provide selector)
- wait: Wait for duration (milliseconds)
- screenshot: Take screenshot
Return JSON array of steps."""
},
{
"role": "user",
"content": f"Task: {task}\nStarting URL: {url}"
}
],
temperature=0.2
)
import json
return json.loads(response.choices[0].message.content)
def execute_step(self, step: dict) -> dict:
"""Execute single step"""
action = step['action']
try:
if action == 'navigate':
return self.browser.navigate(step['url'])
elif action == 'click':
return self.browser.click(step['selector'])
elif action == 'type':
return self.browser.type_text(step['selector'], step['text'])
elif action == 'extract':
text = self.browser.get_text(step['selector'])
return {"success": True, "data": text}
elif action == 'wait':
self.browser.page.wait_for_timeout(step['duration'])
return {"success": True}
elif action == 'screenshot':
return self.browser.screenshot(step.get('path', 'screenshot.png'))
else:
return {"success": False, "error": f"Unknown action: {action}"}
except Exception as e:
return {"success": False, "error": str(e)}
def summarize_results(self, task: str, results: List[dict]) -> str:
"""Summarize execution results"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "user",
"content": f"""Summarize the results of this web automation task:
Task: {task}
Results:
{json.dumps(results, indent=2)}
Provide a clear summary of what was accomplished:"""
}
]
)
return response.choices[0].message.content
def close(self):
"""Close browser"""
self.browser.stop()
# Usage
agent = WebAgent()
result = agent.execute_task(
"Search for 'AI agents' on the website and extract the top 3 results",
"https://example.com"
)
print(result)
agent.close()
Best Practices
- Respect robots.txt: Check if scraping is allowed
- Rate limiting: Don’t overwhelm servers
- Use headless mode: Faster and less resource-intensive
- Handle timeouts: Set reasonable wait times
- Error recovery: Retry failed operations
- Clean up resources: Close browsers properly
- User agent: Identify your bot appropriately
- Cache responses: Avoid redundant requests
- Validate selectors: Check elements exist before interacting
- Monitor performance: Track execution time
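Several of these practices (robots.txt, rate limiting, user agent) can be bundled into one helper. A standard-library-only sketch; the bot name is a placeholder you should replace with your own:

```python
import time
import urllib.request
import urllib.robotparser
from urllib.parse import urlparse

class PoliteFetcher:
    """Fetch pages while honoring robots.txt and a minimum delay between requests."""

    def __init__(self, user_agent: str = "MyAgentBot/1.0", min_delay: float = 1.0):
        self.user_agent = user_agent   # identifies the bot to site operators
        self.min_delay = min_delay     # seconds to wait between fetches
        self.parsers = {}              # cached robots.txt parser per host
        self.last_fetch = 0.0

    def allowed(self, url: str) -> bool:
        parts = urlparse(url)
        host = f"{parts.scheme}://{parts.netloc}"
        if host not in self.parsers:
            rp = urllib.robotparser.RobotFileParser(f"{host}/robots.txt")
            try:
                rp.read()
            except OSError:
                rp.allow_all = True  # robots.txt unreachable: assume allowed
            self.parsers[host] = rp
        return self.parsers[host].can_fetch(self.user_agent, url)

    def fetch(self, url: str) -> dict:
        if not self.allowed(url):
            return {"success": False, "error": "Blocked by robots.txt"}
        wait = self.min_delay - (time.time() - self.last_fetch)
        if wait > 0:
            time.sleep(wait)  # simple politeness delay between requests
        self.last_fetch = time.time()
        req = urllib.request.Request(url, headers={"User-Agent": self.user_agent})
        with urllib.request.urlopen(req, timeout=15) as resp:
            return {"success": True, "status": resp.status,
                    "html": resp.read().decode("utf-8", "replace")}
```

Caching one robots.txt parser per host avoids re-fetching the rules on every request to the same site.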
Common Pitfalls
Pitfall 1: Stale Selectors
Problem: Element selectors change. Solution: Use more robust selectors (data attributes, ARIA labels).
Pitfall 2: Race Conditions
Problem: Clicking before an element is ready. Solution: Use explicit waits.
Pitfall 3: Memory Leaks
Problem: Not closing browsers. Solution: Always close in a finally block or use context managers.
Pitfall 4: Detection
Problem: Website blocks automated access. Solution: Use stealth plugins, rotate user agents, and add delays.
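Pitfalls 2 and 3 have mechanical fixes. Below is a sketch of a cleanup-safe context manager; the factory argument keeps it generic, and the commented usage assumes the `BrowserAgent` class from earlier in this chapter:

```python
from contextlib import contextmanager

@contextmanager
def managed_browser(agent_factory):
    """Guarantee cleanup (Pitfall 3): stop() runs even if a step raises."""
    agent = agent_factory()
    agent.start()
    try:
        yield agent
    finally:
        agent.stop()  # always runs, so no leaked browser processes

# Sketched usage against Pitfalls 1 and 2 (assumes BrowserAgent from this chapter):
#
# with managed_browser(BrowserAgent) as browser:
#     browser.navigate("https://example.com")
#     # Explicit wait before interacting (Pitfall 2)
#     browser.page.wait_for_selector('[data-testid="login-btn"]', timeout=5000)
#     # Data-attribute selector, more stable than CSS classes (Pitfall 1)
#     browser.click('[data-testid="login-btn"]')
```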
Next Steps
Chapter 4 (Agent Tools & Capabilities) is complete! You now understand code execution, data access, and web interaction. In Chapter 5, we’ll explore production-ready agents including reliability, testing, and monitoring.
Reliability & Safety
Module 5: Learning Objectives
By the end of this module, you will:
- ✓ Implement input validation and guardrails
- ✓ Design comprehensive testing strategies
- ✓ Set up monitoring and observability systems
- ✓ Handle failures gracefully with retries and fallbacks
- ✓ Measure and improve agent reliability
Input Validation and Sanitization
Never trust user input. Always validate and sanitize.
Input Validation
from typing import Optional
import re
class InputValidator:
"""Validate user inputs"""
def __init__(self):
self.max_input_length = 10000
self.max_file_size = 10 * 1024 * 1024 # 10MB
def validate_text_input(self, text: str) -> dict:
"""Validate text input"""
errors = []
# Check type
if not isinstance(text, str):
return {"valid": False, "errors": ["Input must be string"]}
# Check length
if len(text) > self.max_input_length:
errors.append(f"Input too long (max {self.max_input_length} chars)")
# Check for null bytes
if '\x00' in text:
errors.append("Invalid characters detected")
# Check for control characters
if any(ord(c) < 32 and c not in '\n\r\t' for c in text):
errors.append("Control characters not allowed")
return {
"valid": len(errors) == 0,
"errors": errors
}
def validate_url(self, url: str) -> dict:
"""Validate URL"""
if not isinstance(url, str):
return {"valid": False, "errors": ["URL must be string"]}
# Basic URL pattern
url_pattern = re.compile(
r'^https?://' # http:// or https://
r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+[A-Z]{2,6}\.?|' # domain
r'localhost|' # localhost
r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})' # IP
r'(?::\d+)?' # optional port
r'(?:/?|[/?]\S+)$', re.IGNORECASE)
if not url_pattern.match(url):
return {"valid": False, "errors": ["Invalid URL format"]}
# Check for dangerous protocols
if url.startswith(('file://', 'javascript:', 'data:')):
return {"valid": False, "errors": ["Unsafe URL protocol"]}
return {"valid": True, "errors": []}
def validate_file_path(self, path: str, allowed_extensions: list = None) -> dict:
"""Validate file path"""
errors = []
# Check for path traversal
if '..' in path or path.startswith('/'):
errors.append("Path traversal detected")
# Check extension
if allowed_extensions:
ext = path.split('.')[-1].lower()
if ext not in allowed_extensions:
errors.append(f"File type not allowed. Allowed: {allowed_extensions}")
return {
"valid": len(errors) == 0,
"errors": errors
}
def sanitize_text(self, text: str) -> str:
"""Sanitize text input"""
# Remove null bytes
text = text.replace('\x00', '')
# Remove control characters except newlines and tabs
text = ''.join(c for c in text if ord(c) >= 32 or c in '\n\r\t')
# Trim whitespace
text = text.strip()
# Limit length
if len(text) > self.max_input_length:
text = text[:self.max_input_length]
return text
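Note that the `'..' in path` substring check in `validate_file_path` can be defeated by creative path construction. A more robust sketch, using only the standard library, resolves the path and verifies it stays inside an allowed base directory:

```python
import os

def is_safe_path(base_dir: str, user_path: str) -> bool:
    """True only if user_path resolves to a location inside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, user_path))
    # commonpath catches ../ tricks that a substring check for '..' can miss
    return os.path.commonpath([base, target]) == base
```

`realpath` also resolves symlinks, so a link pointing outside the base directory is rejected too.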
SQL Injection Prevention
import sqlite3
class SafeDatabase:
"""Database access with SQL injection prevention"""
def __init__(self, db_path: str):
self.db_path = db_path
def query(self, sql: str, params: tuple = ()) -> list:
"""Execute query with parameterized statements"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
try:
# Always use parameterized queries
cursor.execute(sql, params)
results = cursor.fetchall()
conn.close()
return results
except Exception as e:
conn.close()
raise Exception(f"Query error: {str(e)}")
def safe_search(self, table: str, column: str, value: str) -> list:
"""Safe search with validation"""
# Validate table and column names (whitelist)
allowed_tables = ['users', 'products', 'orders']
allowed_columns = ['name', 'email', 'description', 'title']
if table not in allowed_tables:
raise ValueError(f"Invalid table: {table}")
if column not in allowed_columns:
raise ValueError(f"Invalid column: {column}")
# Use parameterized query
sql = f"SELECT * FROM {table} WHERE {column} LIKE ?"
return self.query(sql, (f"%{value}%",))
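To see concretely why the parameterized query in `safe_search` matters, here is a small self-contained demo against an in-memory SQLite database:

```python
import sqlite3

# Why parameterized queries matter: the classic ' OR '1'='1 payload is
# treated as a literal string value, not as SQL, so it matches nothing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

malicious = "' OR '1'='1"

# Unsafe string formatting leaks every row:
unsafe_sql = f"SELECT * FROM users WHERE name = '{malicious}'"
leaked = conn.execute(unsafe_sql).fetchall()

# Parameterized query: the payload matches nothing
safe = conn.execute("SELECT * FROM users WHERE name = ?", (malicious,)).fetchall()

print(len(leaked), len(safe))  # 2 0
```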
Output Guardrails
Ensure agent outputs are safe and appropriate.
Content Filtering
class OutputGuardrails:
"""Filter and validate agent outputs"""
def __init__(self):
self.client = openai.OpenAI()
self.blocked_patterns = [
r'\b\d{3}-\d{2}-\d{4}\b', # SSN
r'\b\d{16}\b', # Credit card
r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}', # Email (if needed)
]
def check_output(self, text: str) -> dict:
"""Check if output is safe"""
issues = []
# Check for PII
for pattern in self.blocked_patterns:
if re.search(pattern, text):
issues.append(f"Potential PII detected: {pattern}")
# Check for harmful content
if self.contains_harmful_content(text):
issues.append("Potentially harmful content detected")
# Check length
if len(text) > 50000:
issues.append("Output too long")
return {
"safe": len(issues) == 0,
"issues": issues
}
def contains_harmful_content(self, text: str) -> bool:
"""Check for harmful content using moderation API"""
try:
response = self.client.moderations.create(input=text)
result = response.results[0]
# Check if any category is flagged
return any([
result.categories.hate,
result.categories.violence,
result.categories.self_harm,
result.categories.sexual,
])
except Exception:
# Fail open: if the moderation call errors, don't block the output
return False
def redact_pii(self, text: str) -> str:
"""Redact PII from text"""
# Redact SSN
text = re.sub(r'\b\d{3}-\d{2}-\d{4}\b', '[REDACTED-SSN]', text)
# Redact credit cards
text = re.sub(r'\b\d{16}\b', '[REDACTED-CC]', text)
# Redact emails (if needed)
text = re.sub(
r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}',
'[REDACTED-EMAIL]',
text
)
return text
def filter_output(self, text: str) -> dict:
"""Filter and clean output"""
check = self.check_output(text)
if not check['safe']:
# Redact PII
text = self.redact_pii(text)
# Re-check
check = self.check_output(text)
return {
"text": text,
"safe": check['safe'],
"issues": check['issues']
}
Response Validation
class ResponseValidator:
"""Validate agent responses"""
def validate_response(self, response: str, expected_format: str = None) -> dict:
"""Validate response format and content"""
errors = []
# Check not empty
if not response or not response.strip():
errors.append("Empty response")
# Check format if specified
if expected_format == 'json':
try:
json.loads(response)
except json.JSONDecodeError:
errors.append("Invalid JSON format")
elif expected_format == 'markdown':
# Basic markdown validation
if not any(marker in response for marker in ['#', '*', '-', '`']):
errors.append("Not valid markdown")
# Check for refusal patterns
refusal_patterns = [
"I cannot", "I'm unable to", "I can't",
"I don't have access", "I'm not able to"
]
if any(pattern.lower() in response.lower() for pattern in refusal_patterns):
errors.append("Agent refused to complete task")
return {
"valid": len(errors) == 0,
"errors": errors
}
Rate Limiting and Cost Control
Prevent runaway costs and abuse.
Rate Limiter
import time
from collections import defaultdict
from threading import Lock
class RateLimiter:
"""Rate limit API calls"""
def __init__(self):
self.requests = defaultdict(list)
self.lock = Lock()
def check_rate_limit(self,
user_id: str,
max_requests: int = 100,
window_seconds: int = 3600) -> dict:
"""Check if user is within rate limit"""
with self.lock:
current_time = time.time()
# Remove old requests outside window
self.requests[user_id] = [
req_time for req_time in self.requests[user_id]
if current_time - req_time < window_seconds
]
# Check limit
if len(self.requests[user_id]) >= max_requests:
return {
"allowed": False,
"remaining": 0,
"reset_in": window_seconds - (current_time - self.requests[user_id][0])
}
# Add current request
self.requests[user_id].append(current_time)
return {
"allowed": True,
"remaining": max_requests - len(self.requests[user_id]),
"reset_in": window_seconds
}
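The sliding-window limiter above stores one timestamp per request, so its memory use grows with traffic. A token-bucket variant (a self-contained sketch, not a drop-in replacement for the `RateLimiter` API above) achieves the same protection in constant memory per user:

```python
import time

class TokenBucket:
    """Constant-memory limiter: refills `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, never past capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The bucket also tolerates short bursts up to `capacity`, which the strict sliding window does not distinguish from sustained load.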
Cost Tracker
class CostTracker:
"""Track and limit API costs"""
def __init__(self, max_cost_per_user: float = 10.0):
self.costs = defaultdict(float)
self.max_cost_per_user = max_cost_per_user
self.lock = Lock()
def estimate_cost(self, model: str, input_tokens: int, output_tokens: int) -> float:
"""Estimate cost for API call"""
# Pricing per 1K tokens (example rates)
pricing = {
'gpt-4': {'input': 0.03, 'output': 0.06},
'gpt-4-turbo': {'input': 0.01, 'output': 0.03},
'gpt-3.5-turbo': {'input': 0.0005, 'output': 0.0015},
}
if model not in pricing:
model = 'gpt-4' # Default to most expensive
cost = (
(input_tokens / 1000) * pricing[model]['input'] +
(output_tokens / 1000) * pricing[model]['output']
)
return cost
def check_budget(self, user_id: str, estimated_cost: float) -> dict:
"""Check if user has budget for request"""
with self.lock:
current_cost = self.costs[user_id]
if current_cost + estimated_cost > self.max_cost_per_user:
return {
"allowed": False,
"current_cost": current_cost,
"max_cost": self.max_cost_per_user,
"remaining": self.max_cost_per_user - current_cost
}
return {
"allowed": True,
"current_cost": current_cost,
"remaining": self.max_cost_per_user - current_cost - estimated_cost
}
def record_cost(self, user_id: str, cost: float):
"""Record actual cost"""
with self.lock:
self.costs[user_id] += cost
def reset_user_cost(self, user_id: str):
"""Reset user's cost (e.g., monthly)"""
with self.lock:
self.costs[user_id] = 0.0
Failure Modes and Fallbacks
Handle failures gracefully.
Retry Logic
import time
from functools import wraps
def retry_with_backoff(max_retries: int = 3, base_delay: float = 1.0):
"""Decorator for retry with exponential backoff"""
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
return func(*args, **kwargs)
except Exception as e:
if attempt == max_retries - 1:
raise
delay = base_delay * (2 ** attempt)
print(f"Attempt {attempt + 1} failed: {e}. Retrying in {delay}s...")
time.sleep(delay)
return wrapper
return decorator
# Usage
@retry_with_backoff(max_retries=3, base_delay=1.0)
def call_api(prompt: str) -> str:
"""API call with retry"""
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
return response.choices[0].message.content
Circuit Breaker
class CircuitBreaker:
"""Circuit breaker pattern for API calls"""
def __init__(self, failure_threshold: int = 5, timeout: int = 60):
self.failure_threshold = failure_threshold
self.timeout = timeout
self.failures = 0
self.last_failure_time = None
self.state = 'closed' # closed, open, half-open
def call(self, func, *args, **kwargs):
"""Execute function with circuit breaker"""
if self.state == 'open':
# Check if timeout has passed
if time.time() - self.last_failure_time > self.timeout:
self.state = 'half-open'
else:
raise Exception("Circuit breaker is OPEN")
try:
result = func(*args, **kwargs)
# Success - reset if in half-open
if self.state == 'half-open':
self.state = 'closed'
self.failures = 0
return result
except Exception as e:
self.failures += 1
self.last_failure_time = time.time()
if self.failures >= self.failure_threshold:
self.state = 'open'
raise e
Fallback Strategies
class FallbackAgent:
"""Agent with fallback strategies"""
def __init__(self):
self.primary_model = "gpt-4"
self.fallback_model = "gpt-3.5-turbo"
self.client = openai.OpenAI()
def generate_with_fallback(self, prompt: str) -> dict:
"""Try primary model, fallback to cheaper model if fails"""
try:
response = self.client.chat.completions.create(
model=self.primary_model,
messages=[{"role": "user", "content": prompt}],
timeout=30
)
return {
"success": True,
"response": response.choices[0].message.content,
"model": self.primary_model
}
except Exception as e:
print(f"Primary model failed: {e}. Trying fallback...")
try:
response = self.client.chat.completions.create(
model=self.fallback_model,
messages=[{"role": "user", "content": prompt}],
timeout=30
)
return {
"success": True,
"response": response.choices[0].message.content,
"model": self.fallback_model,
"fallback": True
}
except Exception as e2:
return {
"success": False,
"error": str(e2)
}
def execute_with_fallback(self, task: str, strategies: list) -> dict:
"""Try multiple strategies in order"""
for i, strategy in enumerate(strategies):
try:
result = strategy(task)
return {
"success": True,
"result": result,
"strategy": i
}
except Exception as e:
if i == len(strategies) - 1:
return {
"success": False,
"error": f"All strategies failed. Last error: {e}"
}
continue
Complete Safe Agent
class SafeAgent:
"""Production-ready agent with safety features"""
def __init__(self, user_id: str):
self.user_id = user_id
self.validator = InputValidator()
self.guardrails = OutputGuardrails()
self.rate_limiter = RateLimiter()
self.cost_tracker = CostTracker()
self.circuit_breaker = CircuitBreaker()
self.client = openai.OpenAI()
def process(self, user_input: str) -> dict:
"""Process user input safely"""
# 1. Validate input
validation = self.validator.validate_text_input(user_input)
if not validation['valid']:
return {
"success": False,
"error": "Invalid input",
"details": validation['errors']
}
# 2. Check rate limit
rate_check = self.rate_limiter.check_rate_limit(self.user_id)
if not rate_check['allowed']:
return {
"success": False,
"error": "Rate limit exceeded",
"reset_in": rate_check['reset_in']
}
# 3. Sanitize input
clean_input = self.validator.sanitize_text(user_input)
# 4. Estimate cost
estimated_tokens = len(clean_input.split()) * 1.3 # Rough estimate
estimated_cost = self.cost_tracker.estimate_cost(
'gpt-4',
int(estimated_tokens),
500 # Estimated output
)
# 5. Check budget
budget_check = self.cost_tracker.check_budget(self.user_id, estimated_cost)
if not budget_check['allowed']:
return {
"success": False,
"error": "Budget exceeded",
"remaining": budget_check['remaining']
}
# 6. Generate response with circuit breaker
try:
response = self.circuit_breaker.call(
self._generate_response,
clean_input
)
except Exception as e:
return {
"success": False,
"error": f"Generation failed: {str(e)}"
}
# 7. Validate output
filtered = self.guardrails.filter_output(response)
if not filtered['safe']:
return {
"success": False,
"error": "Output failed safety check",
"issues": filtered['issues']
}
# 8. Record actual cost
self.cost_tracker.record_cost(self.user_id, estimated_cost)
return {
"success": True,
"response": filtered['text'],
"cost": estimated_cost,
"remaining_budget": budget_check['remaining'] - estimated_cost
}
@retry_with_backoff(max_retries=3)
def _generate_response(self, prompt: str) -> str:
"""Generate response with retry"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{
"role": "system",
"content": "You are a helpful assistant. Never share personal information or harmful content."
},
{"role": "user", "content": prompt}
],
timeout=30
)
return response.choices[0].message.content
# Usage
agent = SafeAgent(user_id="user123")
result = agent.process("What is the capital of France?")
if result['success']:
print(result['response'])
else:
print(f"Error: {result['error']}")
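The word-count heuristic in step 4 of `SafeAgent.process` is only a rough proxy for token usage. A sketch of a more accurate counter, treating `tiktoken` as an optional dependency and falling back to the same heuristic when it is unavailable:

```python
def count_tokens(text: str, model: str = "gpt-4") -> int:
    """Token count for budgeting; exact when tiktoken is installed."""
    try:
        import tiktoken  # optional dependency; assumed installed for exact counts
        encoding = tiktoken.encoding_for_model(model)
        return len(encoding.encode(text))
    except Exception:
        # Fallback mirrors the rough word-based heuristic used above
        return int(len(text.split()) * 1.3)
```

Feeding exact counts into `CostTracker.estimate_cost` keeps budget checks from drifting on long or unusually tokenized inputs.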
Best Practices
- Validate everything: Never trust input
- Sanitize data: Clean before processing
- Rate limit: Prevent abuse
- Track costs: Monitor spending
- Filter outputs: Check for harmful content
- Implement retries: Handle transient failures
- Use circuit breakers: Prevent cascading failures
- Have fallbacks: Multiple strategies
- Log everything: Track for debugging
- Test failure modes: Ensure graceful degradation
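For the "Log everything" practice, structured JSON lines are much easier to search and aggregate than free-form prints. A minimal sketch with the standard `logging` module:

```python
import json
import logging
import time

logger = logging.getLogger("agent")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_event(event: str, **fields) -> str:
    """Emit one JSON line per agent event so logs are machine-searchable."""
    record = {"ts": time.time(), "event": event, **fields}
    line = json.dumps(record, default=str)
    logger.info(line)
    return line

# Example: one record per processed request
log_event("request_processed", user_id="user123", success=True, cost=0.0123)
```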
Next Steps
You now understand reliability and safety! Next, we’ll explore evaluation and testing to ensure your agents work correctly.
Evaluation & Testing
Agent Benchmarks
Measure agent performance systematically.
Creating Test Suites
from dataclasses import dataclass
from typing import List, Optional
import time
@dataclass
class TestCase:
"""Single test case"""
name: str
input: str
expected_output: Optional[str] = None
expected_behavior: Optional[str] = None
timeout: int = 30
@dataclass
class TestResult:
"""Test result"""
test_name: str
passed: bool
actual_output: str
expected_output: str
execution_time: float
error: Optional[str] = None
class AgentTestSuite:
"""Test suite for agents"""
def __init__(self, agent):
self.agent = agent
self.test_cases = []
self.results = []
def add_test(self, test_case: TestCase):
"""Add test case"""
self.test_cases.append(test_case)
def run_tests(self) -> dict:
"""Run all tests"""
self.results = []
for test in self.test_cases:
print(f"Running: {test.name}...")
result = self.run_single_test(test)
self.results.append(result)
return self.generate_report()
def run_single_test(self, test: TestCase) -> TestResult:
"""Run single test"""
start_time = time.time()
try:
# Execute agent
actual_output = self.agent.process(test.input)
execution_time = time.time() - start_time
# Check result
if test.expected_output:
passed = self.check_output_match(actual_output, test.expected_output)
elif test.expected_behavior:
passed = self.check_behavior(actual_output, test.expected_behavior)
else:
passed = True # Just check it doesn't crash
return TestResult(
test_name=test.name,
passed=passed,
actual_output=actual_output,
expected_output=test.expected_output or test.expected_behavior,
execution_time=execution_time
)
except Exception as e:
execution_time = time.time() - start_time
return TestResult(
test_name=test.name,
passed=False,
actual_output="",
expected_output=test.expected_output or test.expected_behavior,
execution_time=execution_time,
error=str(e)
)
def check_output_match(self, actual: str, expected: str) -> bool:
"""Check if output matches expected"""
# Exact match
if actual.strip() == expected.strip():
return True
# Contains expected
if expected.lower() in actual.lower():
return True
return False
def check_behavior(self, output: str, behavior: str) -> bool:
"""Check if output exhibits expected behavior"""
# Use LLM to judge
prompt = f"""Does this output exhibit the expected behavior?
Output: {output}
Expected behavior: {behavior}
Answer with just 'yes' or 'no':"""
response = openai.OpenAI().chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0
)
answer = response.choices[0].message.content.strip().lower()
return answer.startswith('yes')
def generate_report(self) -> dict:
"""Generate test report"""
total = len(self.results)
passed = sum(1 for r in self.results if r.passed)
failed = total - passed
avg_time = sum(r.execution_time for r in self.results) / total if total > 0 else 0
return {
"total": total,
"passed": passed,
"failed": failed,
"pass_rate": passed / total if total > 0 else 0,
"avg_execution_time": avg_time,
"results": self.results
}
# Usage
suite = AgentTestSuite(agent)
suite.add_test(TestCase(
name="Basic math",
input="What is 2 + 2?",
expected_output="4"
))
suite.add_test(TestCase(
name="Tool usage",
input="Search for information about Python",
expected_behavior="Uses search tool and provides relevant information"
))
report = suite.run_tests()
print(f"Pass rate: {report['pass_rate']:.1%}")
Standard Benchmarks
class StandardBenchmarks:
"""Common agent benchmarks"""
@staticmethod
def get_math_benchmark() -> List[TestCase]:
"""Math reasoning tests"""
return [
TestCase("Addition", "What is 123 + 456?", "579"),
TestCase("Multiplication", "What is 25 * 17?", "425"),
TestCase("Word problem", "If I have 3 apples and buy 2 more, how many do I have?", "5"),
TestCase("Percentage", "What is 15% of 200?", "30"),
]
@staticmethod
def get_reasoning_benchmark() -> List[TestCase]:
"""Logical reasoning tests"""
return [
TestCase(
"Deduction",
"All cats are animals. Fluffy is a cat. Is Fluffy an animal?",
expected_behavior="Correctly deduces that Fluffy is an animal"
),
TestCase(
"Planning",
"I need to make dinner. What steps should I take?",
expected_behavior="Provides logical sequence of steps"
),
]
@staticmethod
def get_tool_usage_benchmark() -> List[TestCase]:
"""Tool usage tests"""
return [
TestCase(
"Search",
"Find information about the Eiffel Tower",
expected_behavior="Uses search tool and provides facts"
),
TestCase(
"Calculation",
"Calculate the compound interest on $1000 at 5% for 3 years",
expected_behavior="Uses calculator tool"
),
]
Success Metrics
Define what success means for your agent.
Quantitative Metrics
class AgentMetrics:
"""Track agent performance metrics"""
def __init__(self):
self.metrics = {
"total_requests": 0,
"successful_requests": 0,
"failed_requests": 0,
"total_execution_time": 0,
"tool_calls": 0,
"tokens_used": 0,
"cost": 0.0
}
def record_request(self,
success: bool,
execution_time: float,
tool_calls: int = 0,
tokens: int = 0,
cost: float = 0.0):
"""Record request metrics"""
self.metrics["total_requests"] += 1
if success:
self.metrics["successful_requests"] += 1
else:
self.metrics["failed_requests"] += 1
self.metrics["total_execution_time"] += execution_time
self.metrics["tool_calls"] += tool_calls
self.metrics["tokens_used"] += tokens
self.metrics["cost"] += cost
def get_summary(self) -> dict:
"""Get metrics summary"""
total = self.metrics["total_requests"]
if total == 0:
return self.metrics
return {
**self.metrics,
"success_rate": self.metrics["successful_requests"] / total,
"avg_execution_time": self.metrics["total_execution_time"] / total,
"avg_tool_calls": self.metrics["tool_calls"] / total,
"avg_tokens": self.metrics["tokens_used"] / total,
"avg_cost": self.metrics["cost"] / total
}
def print_summary(self):
"""Print formatted summary"""
summary = self.get_summary()
print("Agent Performance Metrics")
print("=" * 40)
print(f"Total Requests: {summary['total_requests']}")
print(f"Success Rate: {summary['success_rate']:.1%}")
print(f"Avg Execution Time: {summary['avg_execution_time']:.2f}s")
print(f"Avg Tool Calls: {summary['avg_tool_calls']:.1f}")
print(f"Avg Tokens: {summary['avg_tokens']:.0f}")
print(f"Avg Cost: ${summary['avg_cost']:.4f}")
print(f"Total Cost: ${summary['cost']:.2f}")
Qualitative Metrics
class QualityEvaluator:
"""Evaluate response quality"""
def __init__(self):
self.client = openai.OpenAI()
def evaluate_response(self,
question: str,
response: str,
criteria: List[str] = None) -> dict:
"""Evaluate response quality"""
if criteria is None:
criteria = [
"Accuracy: Is the information correct?",
"Completeness: Does it fully answer the question?",
"Clarity: Is it easy to understand?",
"Relevance: Does it stay on topic?"
]
scores = {}
for criterion in criteria:
score = self.score_criterion(question, response, criterion)
criterion_name = criterion.split(':')[0]
scores[criterion_name] = score
return {
"scores": scores,
"average": sum(scores.values()) / len(scores),
"passed": all(score >= 3 for score in scores.values())
}
def score_criterion(self, question: str, response: str, criterion: str) -> int:
"""Score response on single criterion (1-5)"""
prompt = f"""Rate this response on the following criterion (1-5):
Question: {question}
Response: {response}
Criterion: {criterion}
Provide only a number from 1 (poor) to 5 (excellent):"""
result = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.1
)
try:
score = int(result.choices[0].message.content.strip())
return max(1, min(5, score)) # Clamp to 1-5
except ValueError:
return 3 # Default to middle score if the model returns non-numeric text
Unit and Integration Testing
Unit Tests for Components
import unittest
class TestAgentComponents(unittest.TestCase):
"""Unit tests for agent components"""
def setUp(self):
"""Set up test fixtures"""
self.agent = MyAgent()
def test_input_validation(self):
"""Test input validation"""
validator = InputValidator()
# Valid input
result = validator.validate_text_input("Hello world")
self.assertTrue(result['valid'])
# Invalid input (too long)
long_text = "x" * 20000
result = validator.validate_text_input(long_text)
self.assertFalse(result['valid'])
def test_tool_execution(self):
"""Test tool execution"""
result = self.agent.execute_tool("calculate", {"expression": "2 + 2"})
self.assertEqual(result, "4")
def test_memory_storage(self):
"""Test memory system"""
self.agent.memory.add("user_name", "Alice")
retrieved = self.agent.memory.get("user_name")
self.assertEqual(retrieved, "Alice")
def test_error_handling(self):
"""Test error handling"""
# Should not crash on invalid tool
result = self.agent.execute_tool("nonexistent_tool", {})
self.assertIn("error", result.lower())
def tearDown(self):
"""Clean up"""
pass
# Run tests
if __name__ == '__main__':
unittest.main()
Integration Tests
class TestAgentIntegration(unittest.TestCase):
"""Integration tests for full agent"""
def test_end_to_end_query(self):
"""Test complete query flow"""
agent = MyAgent()
response = agent.process("What is 2 + 2?")
self.assertIsNotNone(response)
self.assertIn("4", response)
def test_multi_step_task(self):
"""Test multi-step task execution"""
agent = MyAgent()
response = agent.process("Search for Python tutorials and summarize the top result")
# Should use search tool
self.assertTrue(agent.tool_used("search"))
# Should provide summary
self.assertGreater(len(response), 50)
def test_error_recovery(self):
"""Test error recovery"""
agent = MyAgent()
# Simulate tool failure with a helper that always raises
def failing_tool(query):
raise RuntimeError("Simulated tool failure")
agent.tools["search"] = failing_tool
response = agent.process("Search for something")
# Should handle gracefully
self.assertIsNotNone(response)
self.assertNotIn("Traceback", response)
def test_rate_limiting(self):
"""Test rate limiting"""
agent = MyAgent()
# Make many requests
for i in range(150):
response = agent.process(f"Request {i}")
# Should be rate limited
self.assertTrue(agent.was_rate_limited())
Property-Based Testing
from hypothesis import given, strategies as st
class TestAgentProperties(unittest.TestCase):
"""Property-based tests"""
@given(st.text(min_size=1, max_size=1000))
def test_agent_handles_any_text(self, text):
"""Agent should handle any text input without crashing"""
agent = MyAgent()
try:
response = agent.process(text)
# Should return something
self.assertIsNotNone(response)
except Exception as e:
# Should not crash
self.fail(f"Agent crashed on input: {text[:50]}... Error: {e}")
@given(st.integers(min_value=-1000, max_value=1000))
def test_calculator_tool(self, number):
"""Calculator should handle any integer"""
agent = MyAgent()
result = agent.execute_tool("calculate", {"expression": f"{number} + 1"})
expected = str(number + 1)
self.assertEqual(result, expected)
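If the hypothesis library isn't available, the same idea can be approximated with a hand-rolled random-input loop. This is a much weaker substitute (no shrinking, no smart strategies) and `safe_process` below is an invented stand-in for `agent.process`, shown only to illustrate the concept:

```python
import random
import string

def make_random_text(max_len: int = 1000) -> str:
    """Generate a random string of printable characters"""
    n = random.randint(1, max_len)
    return "".join(random.choice(string.printable) for _ in range(n))

def safe_process(text: str) -> str:
    # Stand-in for agent.process; a real agent would go here
    return text.strip() or "(empty)"

random.seed(0)  # deterministic, so failures are reproducible
for _ in range(100):
    result = safe_process(make_random_text())
    assert result is not None  # the property under test: never returns None

print("100 random inputs handled")
```

Unlike hypothesis, this loop won't minimize a failing input for you, so log the seed and the offending text when an assertion fires.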
Human Evaluation Frameworks
Collecting Human Feedback
class HumanEvaluator:
"""Collect human evaluations"""
def __init__(self):
self.evaluations = []
def request_evaluation(self,
question: str,
response: str,
evaluator_id: str) -> dict:
"""Request human evaluation"""
print(f"\n{'='*60}")
print(f"Question: {question}")
print(f"\nResponse: {response}")
print(f"\n{'='*60}")
# Collect ratings
ratings = {}
criteria = [
("accuracy", "Is the response accurate? (1-5)"),
("helpfulness", "Is the response helpful? (1-5)"),
("clarity", "Is the response clear? (1-5)"),
]
for key, prompt in criteria:
while True:
try:
score = int(input(f"{prompt}: "))
if 1 <= score <= 5:
ratings[key] = score
break
except ValueError:
pass
# Collect feedback
feedback = input("\nAdditional feedback (optional): ")
evaluation = {
"question": question,
"response": response,
"evaluator_id": evaluator_id,
"ratings": ratings,
"feedback": feedback,
"timestamp": time.time()
}
self.evaluations.append(evaluation)
return evaluation
def get_summary(self) -> dict:
"""Get evaluation summary"""
if not self.evaluations:
return {}
# Average ratings
avg_ratings = {}
for criterion in ["accuracy", "helpfulness", "clarity"]:
scores = [e["ratings"][criterion] for e in self.evaluations]
avg_ratings[criterion] = sum(scores) / len(scores)
return {
"total_evaluations": len(self.evaluations),
"average_ratings": avg_ratings,
"overall_score": sum(avg_ratings.values()) / len(avg_ratings)
}
A/B Testing
class ABTest:
"""A/B test different agent versions"""
def __init__(self, agent_a, agent_b):
self.agent_a = agent_a
self.agent_b = agent_b
self.results = {"a": [], "b": []}
def run_test(self, test_cases: List[str], evaluator) -> dict:
"""Run A/B test"""
for i, test_case in enumerate(test_cases):
# Alternate between agents
if i % 2 == 0:
agent = self.agent_a
variant = "a"
else:
agent = self.agent_b
variant = "b"
# Get response
response = agent.process(test_case)
# Evaluate
evaluation = evaluator.evaluate_response(test_case, response)
self.results[variant].append(evaluation)
return self.compare_results()
def compare_results(self) -> dict:
"""Compare A vs B"""
avg_a = sum(r["average"] for r in self.results["a"]) / len(self.results["a"])
avg_b = sum(r["average"] for r in self.results["b"]) / len(self.results["b"])
return {
"agent_a_score": avg_a,
"agent_b_score": avg_b,
"winner": "a" if avg_a > avg_b else "b",
"difference": abs(avg_a - avg_b)
}
Automated Testing Pipeline
class TestPipeline:
"""Automated testing pipeline"""
def __init__(self, agent):
self.agent = agent
self.test_suite = AgentTestSuite(agent)
self.metrics = AgentMetrics()
self.evaluator = QualityEvaluator()
def run_full_pipeline(self) -> dict:
"""Run complete test pipeline"""
results = {}
# 1. Unit tests
print("Running unit tests...")
results["unit_tests"] = self.run_unit_tests()
# 2. Integration tests
print("Running integration tests...")
results["integration_tests"] = self.run_integration_tests()
# 3. Benchmark tests
print("Running benchmarks...")
results["benchmarks"] = self.run_benchmarks()
# 4. Quality evaluation
print("Running quality evaluation...")
results["quality"] = self.run_quality_evaluation()
# 5. Performance metrics
print("Collecting performance metrics...")
results["performance"] = self.metrics.get_summary()
# 6. Generate report
report = self.generate_report(results)
return report
def run_unit_tests(self) -> dict:
"""Run unit tests"""
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(TestAgentComponents)
runner = unittest.TextTestRunner(verbosity=0)
result = runner.run(suite)
return {
"total": result.testsRun,
"passed": result.testsRun - len(result.failures) - len(result.errors),
"failed": len(result.failures) + len(result.errors)
}
def run_integration_tests(self) -> dict:
"""Run integration tests"""
loader = unittest.TestLoader()
suite = loader.loadTestsFromTestCase(TestAgentIntegration)
runner = unittest.TextTestRunner(verbosity=0)
result = runner.run(suite)
return {
"total": result.testsRun,
"passed": result.testsRun - len(result.failures) - len(result.errors),
"failed": len(result.failures) + len(result.errors)
}
def run_benchmarks(self) -> dict:
"""Run benchmark tests"""
# Add standard benchmarks
for test in StandardBenchmarks.get_math_benchmark():
self.test_suite.add_test(test)
for test in StandardBenchmarks.get_reasoning_benchmark():
self.test_suite.add_test(test)
return self.test_suite.run_tests()
def run_quality_evaluation(self) -> dict:
"""Run quality evaluation"""
test_cases = [
("What is Python?", "Python is a high-level programming language..."),
("How do I sort a list?", "You can use the sorted() function..."),
]
evaluations = []
for question, response in test_cases:
eval_result = self.evaluator.evaluate_response(question, response)
evaluations.append(eval_result)
avg_score = sum(e["average"] for e in evaluations) / len(evaluations)
return {
"evaluations": evaluations,
"average_score": avg_score
}
def generate_report(self, results: dict) -> dict:
"""Generate comprehensive report"""
return {
"timestamp": time.time(),
"summary": {
"unit_tests_passed": results["unit_tests"]["passed"],
"integration_tests_passed": results["integration_tests"]["passed"],
"benchmark_pass_rate": results["benchmarks"]["pass_rate"],
"quality_score": results["quality"]["average_score"],
"success_rate": results["performance"]["success_rate"]
},
"details": results
}
# Usage
pipeline = TestPipeline(agent)
report = pipeline.run_full_pipeline()
print("\nTest Report Summary")
print("=" * 40)
for key, value in report["summary"].items():
print(f"{key}: {value}")
Best Practices
- Test early and often: Continuous testing during development
- Automate testing: Run tests automatically on changes
- Use multiple metrics: Quantitative and qualitative
- Test edge cases: Unusual inputs, errors, limits
- Benchmark regularly: Track performance over time
- Get human feedback: Automated tests aren’t enough
- Test in production: Monitor real usage
- Version your tests: Track test changes
- Document failures: Learn from what breaks
- Iterate based on results: Use tests to improve
Next Steps
You now understand evaluation and testing! Next, we’ll explore monitoring and observability for production agents.
Monitoring & Observability
Logging and Tracing
Track what your agent is doing at every step.
Structured Logging
import logging
import json
from datetime import datetime
from typing import Any, Dict
class AgentLogger:
"""Structured logging for agents"""
def __init__(self, agent_id: str, log_file: str = "agent.log"):
self.agent_id = agent_id
self.logger = logging.getLogger(agent_id)
self.logger.setLevel(logging.INFO)
# File handler
handler = logging.FileHandler(log_file)
handler.setFormatter(logging.Formatter('%(message)s'))
self.logger.addHandler(handler)
# Console handler
console = logging.StreamHandler()
console.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
self.logger.addHandler(console)
def log_event(self,
event_type: str,
data: Dict[str, Any],
level: str = "info"):
"""Log structured event"""
log_entry = {
"timestamp": datetime.utcnow().isoformat(),
"agent_id": self.agent_id,
"event_type": event_type,
"data": data
}
log_message = json.dumps(log_entry)
if level == "info":
self.logger.info(log_message)
elif level == "warning":
self.logger.warning(log_message)
elif level == "error":
self.logger.error(log_message)
elif level == "debug":
self.logger.debug(log_message)
def log_request(self, user_id: str, input_text: str):
"""Log incoming request"""
self.log_event("request", {
"user_id": user_id,
"input": input_text[:200], # Truncate long inputs
"input_length": len(input_text)
})
def log_response(self, user_id: str, output_text: str, execution_time: float):
"""Log response"""
self.log_event("response", {
"user_id": user_id,
"output": output_text[:200],
"output_length": len(output_text),
"execution_time": execution_time
})
def log_tool_call(self, tool_name: str, parameters: dict, result: Any):
"""Log tool execution"""
self.log_event("tool_call", {
"tool": tool_name,
"parameters": parameters,
"result": str(result)[:200],
"success": result is not None
})
def log_error(self, error_type: str, error_message: str, context: dict = None):
"""Log error"""
self.log_event("error", {
"error_type": error_type,
"message": error_message,
"context": context or {}
}, level="error")
# Usage
logger = AgentLogger("agent-001")
logger.log_request("user123", "What is the weather?")
logger.log_tool_call("weather_api", {"location": "NYC"}, {"temp": 72})
logger.log_response("user123", "It's 72°F in NYC", 1.5)
Distributed Tracing
import uuid
from contextlib import contextmanager
from typing import Optional
class Tracer:
"""Distributed tracing for agent operations"""
def __init__(self):
self.traces = {}
self.current_trace = None
@contextmanager
def trace(self, operation_name: str, parent_id: Optional[str] = None):
"""Create trace span"""
span_id = str(uuid.uuid4())
trace_id = parent_id or str(uuid.uuid4())
span = {
"span_id": span_id,
"trace_id": trace_id,
"operation": operation_name,
"start_time": time.time(),
"parent_id": parent_id,
"children": [],
"metadata": {}
}
# Store current trace
previous_trace = self.current_trace
self.current_trace = span_id
self.traces[span_id] = span
try:
yield span
finally:
# End span
span["end_time"] = time.time()
span["duration"] = span["end_time"] - span["start_time"]
# Restore previous trace
self.current_trace = previous_trace
def add_metadata(self, key: str, value: Any):
"""Add metadata to current span"""
if self.current_trace:
self.traces[self.current_trace]["metadata"][key] = value
def get_trace(self, trace_id: str) -> dict:
"""Get full trace"""
spans = [s for s in self.traces.values() if s["trace_id"] == trace_id]
# Build tree
root = [s for s in spans if s["parent_id"] is None][0]
self._build_tree(root, spans)
return root
def _build_tree(self, node: dict, all_spans: list):
"""Build trace tree"""
children = [s for s in all_spans if s["parent_id"] == node["span_id"]]
node["children"] = children
for child in children:
self._build_tree(child, all_spans)
# Usage
tracer = Tracer()
with tracer.trace("agent_request") as trace:
tracer.add_metadata("user_id", "user123")
with tracer.trace("tool_call", parent_id=trace["span_id"]):
tracer.add_metadata("tool", "search")
# Execute tool
pass
with tracer.trace("generate_response", parent_id=trace["span_id"]):
# Generate response
pass
# View trace
full_trace = tracer.get_trace(trace["trace_id"])
Performance Metrics
Track agent performance in real-time.
Metrics Collector
from collections import defaultdict
from threading import Lock
import time
class MetricsCollector:
"""Collect and aggregate metrics"""
def __init__(self):
self.metrics = defaultdict(list)
self.counters = defaultdict(int)
self.lock = Lock()
def record_metric(self, name: str, value: float, tags: dict = None):
"""Record a metric value"""
with self.lock:
self.metrics[name].append({
"value": value,
"timestamp": time.time(),
"tags": tags or {}
})
def increment_counter(self, name: str, amount: int = 1):
"""Increment counter"""
with self.lock:
self.counters[name] += amount
def get_stats(self, name: str, window_seconds: int = 3600) -> dict:
"""Get statistics for metric"""
with self.lock:
current_time = time.time()
# Filter to time window
values = [
m["value"] for m in self.metrics[name]
if current_time - m["timestamp"] < window_seconds
]
if not values:
return {}
return {
"count": len(values),
"min": min(values),
"max": max(values),
"avg": sum(values) / len(values),
"p50": self._percentile(values, 50),
"p95": self._percentile(values, 95),
"p99": self._percentile(values, 99)
}
def _percentile(self, values: list, percentile: int) -> float:
"""Calculate percentile"""
sorted_values = sorted(values)
index = int(len(sorted_values) * percentile / 100)
return sorted_values[min(index, len(sorted_values) - 1)]
def get_counter(self, name: str) -> int:
"""Get counter value"""
with self.lock:
return self.counters[name]
def reset(self):
"""Reset all metrics"""
with self.lock:
self.metrics.clear()
self.counters.clear()
# Usage
metrics = MetricsCollector()
# Record metrics
metrics.record_metric("response_time", 1.5, {"user": "user123"})
metrics.record_metric("response_time", 2.1, {"user": "user456"})
metrics.increment_counter("total_requests")
metrics.increment_counter("successful_requests")
# Get stats
stats = metrics.get_stats("response_time")
print(f"Avg response time: {stats['avg']:.2f}s")
print(f"P95 response time: {stats['p95']:.2f}s")
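The percentile figures above come from a simple nearest-rank calculation (the same logic as `_percentile`). A standalone sketch, with invented response times, makes the behavior concrete:

```python
# Nearest-rank percentile, mirroring MetricsCollector._percentile
def percentile(values: list, pct: int) -> float:
    s = sorted(values)
    idx = int(len(s) * pct / 100)
    return s[min(idx, len(s) - 1)]

# Ten illustrative response times in seconds
times = [0.8, 1.1, 1.5, 2.0, 2.1, 2.4, 3.0, 3.2, 4.8, 9.5]
print(percentile(times, 50))  # 2.4
print(percentile(times, 95))  # 9.5
print(percentile(times, 99))  # 9.5
```

Note that with small samples a single outlier dominates the tail: here p95 and p99 are identical because one slow request sits at the top of the sorted list.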
Real-Time Dashboard
class MetricsDashboard:
"""Real-time metrics dashboard"""
def __init__(self, metrics_collector: MetricsCollector):
self.metrics = metrics_collector
def display(self):
"""Display current metrics"""
print("\n" + "="*60)
print("AGENT METRICS DASHBOARD")
print("="*60)
# Request metrics
total = self.metrics.get_counter("total_requests")
successful = self.metrics.get_counter("successful_requests")
failed = self.metrics.get_counter("failed_requests")
print(f"\nRequests:")
print(f" Total: {total}")
print(f" Successful: {successful}")
print(f" Failed: {failed}")
if total > 0:
print(f" Success Rate: {successful/total:.1%}")
# Response time
response_stats = self.metrics.get_stats("response_time")
if response_stats:
print(f"\nResponse Time:")
print(f" Average: {response_stats['avg']:.2f}s")
print(f" P50: {response_stats['p50']:.2f}s")
print(f" P95: {response_stats['p95']:.2f}s")
print(f" P99: {response_stats['p99']:.2f}s")
# Tool usage
tool_calls = self.metrics.get_counter("tool_calls")
print(f"\nTool Calls: {tool_calls}")
# Cost
total_cost = self.metrics.get_counter("total_cost_cents") / 100
print(f"\nTotal Cost: ${total_cost:.2f}")
print("="*60 + "\n")
Cost Tracking
Monitor spending in real-time.
Cost Monitor
class CostMonitor:
"""Monitor and alert on costs"""
def __init__(self, budget_limit: float = 100.0):
self.budget_limit = budget_limit
self.costs = defaultdict(float)
self.lock = Lock()
self.alerts = []
def record_cost(self,
user_id: str,
cost: float,
model: str,
tokens: int):
"""Record cost"""
with self.lock:
self.costs[user_id] += cost
# Check for alerts
if self.costs[user_id] > self.budget_limit * 0.8:
self.add_alert(
"warning",
f"User {user_id} at 80% of budget: ${self.costs[user_id]:.2f}"
)
if self.costs[user_id] > self.budget_limit:
self.add_alert(
"critical",
f"User {user_id} exceeded budget: ${self.costs[user_id]:.2f}"
)
def add_alert(self, level: str, message: str):
"""Add alert"""
alert = {
"level": level,
"message": message,
"timestamp": time.time()
}
self.alerts.append(alert)
# Log alert
if level == "critical":
logger.log_event("cost_alert", alert, level="error")
else:
logger.log_event("cost_alert", alert, level="warning")
def get_user_cost(self, user_id: str) -> dict:
"""Get user's cost"""
with self.lock:
cost = self.costs[user_id]
return {
"cost": cost,
"budget": self.budget_limit,
"remaining": self.budget_limit - cost,
"percentage": (cost / self.budget_limit) * 100
}
def get_total_cost(self) -> float:
"""Get total cost across all users"""
with self.lock:
return sum(self.costs.values())
def get_alerts(self, level: str = None) -> list:
"""Get alerts"""
if level:
return [a for a in self.alerts if a["level"] == level]
return self.alerts
User Feedback Loops
Collect and act on user feedback.
Feedback Collector
class FeedbackCollector:
"""Collect user feedback"""
def __init__(self):
self.feedback = []
self.ratings = defaultdict(list)
def collect_rating(self,
user_id: str,
interaction_id: str,
rating: int,
comment: str = ""):
"""Collect user rating (1-5)"""
feedback = {
"user_id": user_id,
"interaction_id": interaction_id,
"rating": rating,
"comment": comment,
"timestamp": time.time()
}
self.feedback.append(feedback)
self.ratings[user_id].append(rating)
# Log feedback
logger.log_event("user_feedback", feedback)
# Alert on low ratings
if rating <= 2:
logger.log_event("low_rating", feedback, level="warning")
def get_average_rating(self, user_id: str = None) -> float:
"""Get average rating"""
if user_id:
ratings = self.ratings[user_id]
else:
ratings = [f["rating"] for f in self.feedback]
if not ratings:
return 0.0
return sum(ratings) / len(ratings)
def get_recent_feedback(self, limit: int = 10) -> list:
"""Get recent feedback"""
return sorted(
self.feedback,
key=lambda x: x["timestamp"],
reverse=True
)[:limit]
def get_low_ratings(self, threshold: int = 2) -> list:
"""Get low-rated interactions"""
return [
f for f in self.feedback
if f["rating"] <= threshold
]
Feedback Analysis
class FeedbackAnalyzer:
"""Analyze feedback patterns"""
def __init__(self, feedback_collector: FeedbackCollector):
self.collector = feedback_collector
self.client = openai.OpenAI()
def analyze_trends(self) -> dict:
"""Analyze feedback trends"""
recent = self.collector.get_recent_feedback(limit=100)
if not recent:
return {}
# Calculate trends
ratings = [f["rating"] for f in recent]
return {
"average_rating": sum(ratings) / len(ratings),
"total_feedback": len(recent),
"rating_distribution": {
"5_star": sum(1 for r in ratings if r == 5),
"4_star": sum(1 for r in ratings if r == 4),
"3_star": sum(1 for r in ratings if r == 3),
"2_star": sum(1 for r in ratings if r == 2),
"1_star": sum(1 for r in ratings if r == 1),
}
}
def identify_issues(self) -> list:
"""Identify common issues from feedback"""
low_ratings = self.collector.get_low_ratings()
if not low_ratings:
return []
# Extract comments
comments = [f["comment"] for f in low_ratings if f["comment"]]
if not comments:
return []
# Use LLM to identify themes
prompt = f"""Analyze these negative feedback comments and identify common themes:
{chr(10).join(comments[:20])}
List the top 3 issues:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
return response.choices[0].message.content.split('\n')
Complete Monitoring System
class AgentMonitor:
"""Complete monitoring system"""
def __init__(self, agent_id: str):
self.agent_id = agent_id
self.logger = AgentLogger(agent_id)
self.tracer = Tracer()
self.metrics = MetricsCollector()
self.cost_monitor = CostMonitor()
self.feedback = FeedbackCollector()
def monitor_request(self, user_id: str, input_text: str):
"""Monitor incoming request"""
self.logger.log_request(user_id, input_text)
self.metrics.increment_counter("total_requests")
return {
"trace_id": str(uuid.uuid4()),
"start_time": time.time()
}
def monitor_response(self,
user_id: str,
output_text: str,
context: dict):
"""Monitor response"""
execution_time = time.time() - context["start_time"]
self.logger.log_response(user_id, output_text, execution_time)
self.metrics.record_metric("response_time", execution_time)
self.metrics.increment_counter("successful_requests")
def monitor_tool_call(self, tool_name: str, parameters: dict, result: Any):
"""Monitor tool execution"""
self.logger.log_tool_call(tool_name, parameters, result)
self.metrics.increment_counter("tool_calls")
self.metrics.increment_counter(f"tool_calls_{tool_name}")
def monitor_cost(self,
user_id: str,
model: str,
tokens: int,
cost: float):
"""Monitor cost"""
self.cost_monitor.record_cost(user_id, cost, model, tokens)
self.metrics.increment_counter("total_cost_cents", int(cost * 100))
def monitor_error(self, error_type: str, error_message: str, context: dict):
"""Monitor error"""
self.logger.log_error(error_type, error_message, context)
self.metrics.increment_counter("failed_requests")
self.metrics.increment_counter(f"error_{error_type}")
def get_health_status(self) -> dict:
"""Get system health status"""
total = self.metrics.get_counter("total_requests")
successful = self.metrics.get_counter("successful_requests")
failed = self.metrics.get_counter("failed_requests")
success_rate = successful / total if total > 0 else 0
response_stats = self.metrics.get_stats("response_time")
avg_response_time = response_stats.get("avg", 0) if response_stats else 0
# Determine health
if success_rate < 0.9 or avg_response_time > 10:
health = "unhealthy"
elif success_rate < 0.95 or avg_response_time > 5:
health = "degraded"
else:
health = "healthy"
return {
"status": health,
"success_rate": success_rate,
"avg_response_time": avg_response_time,
"total_requests": total,
"failed_requests": failed,
"total_cost": self.cost_monitor.get_total_cost()
}
def generate_report(self) -> dict:
"""Generate monitoring report"""
return {
"agent_id": self.agent_id,
"timestamp": time.time(),
"health": self.get_health_status(),
"metrics": {
"response_time": self.metrics.get_stats("response_time"),
"requests": {
"total": self.metrics.get_counter("total_requests"),
"successful": self.metrics.get_counter("successful_requests"),
"failed": self.metrics.get_counter("failed_requests")
},
"tool_calls": self.metrics.get_counter("tool_calls")
},
"cost": {
"total": self.cost_monitor.get_total_cost(),
"alerts": self.cost_monitor.get_alerts()
},
"feedback": {
"average_rating": self.feedback.get_average_rating(),
"recent": self.feedback.get_recent_feedback(limit=5)
}
}
# Usage
monitor = AgentMonitor("agent-001")
# Monitor request
context = monitor.monitor_request("user123", "What is Python?")
# Monitor tool call
monitor.monitor_tool_call("search", {"query": "Python"}, "Results...")
# Monitor cost
monitor.monitor_cost("user123", "gpt-4", 500, 0.015)
# Monitor response
monitor.monitor_response("user123", "Python is...", context)
# Get health status
health = monitor.get_health_status()
print(f"System health: {health['status']}")
# Generate report
report = monitor.generate_report()
Alerting
Set up alerts for critical issues.
Alert Manager
class AlertManager:
"""Manage alerts and notifications"""
def __init__(self):
self.alert_rules = []
self.active_alerts = []
def add_rule(self,
name: str,
condition: Callable,
severity: str,
message: str):
"""Add alert rule"""
self.alert_rules.append({
"name": name,
"condition": condition,
"severity": severity,
"message": message
})
def check_alerts(self, metrics: dict):
"""Check all alert rules"""
new_alerts = []
for rule in self.alert_rules:
if rule["condition"](metrics):
alert = {
"name": rule["name"],
"severity": rule["severity"],
"message": rule["message"],
"timestamp": time.time(),
"metrics": metrics
}
new_alerts.append(alert)
self.trigger_alert(alert)
self.active_alerts.extend(new_alerts)
return new_alerts
def trigger_alert(self, alert: dict):
"""Trigger alert notification"""
print(f"\n🚨 ALERT [{alert['severity']}]: {alert['name']}")
print(f" {alert['message']}")
# In production, send to:
# - Email
# - Slack
# - PagerDuty
# - etc.
def get_active_alerts(self, severity: str = None) -> list:
"""Get active alerts"""
if severity:
return [a for a in self.active_alerts if a["severity"] == severity]
return self.active_alerts
# Setup alerts
alerts = AlertManager()
# High error rate
alerts.add_rule(
name="High Error Rate",
condition=lambda m: m.get("success_rate", 1) < 0.9,
severity="critical",
message="Success rate below 90%"
)
# Slow response time
alerts.add_rule(
name="Slow Response Time",
condition=lambda m: m.get("avg_response_time", 0) > 5,
severity="warning",
message="Average response time above 5 seconds"
)
# High cost
alerts.add_rule(
name="High Cost",
condition=lambda m: m.get("total_cost", 0) > 50,
severity="warning",
message="Total cost exceeded $50"
)
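Because rule conditions are plain callables over a metrics snapshot, they can be exercised in isolation, without the full `AlertManager`. A minimal sketch with an invented snapshot:

```python
# Alert rules as (name, condition, severity) tuples over a metrics snapshot
rules = [
    ("High Error Rate", lambda m: m.get("success_rate", 1) < 0.9, "critical"),
    ("Slow Response Time", lambda m: m.get("avg_response_time", 0) > 5, "warning"),
    ("High Cost", lambda m: m.get("total_cost", 0) > 50, "warning"),
]

# Invented snapshot: low success rate, acceptable latency and cost
snapshot = {"success_rate": 0.85, "avg_response_time": 3.2, "total_cost": 12.0}

# Evaluate every rule and collect the ones that fire
fired = [(name, severity) for name, condition, severity in rules if condition(snapshot)]
print(fired)  # [('High Error Rate', 'critical')]
```

In production the snapshot would come from something like `monitor.get_health_status()`, fed to `check_alerts` on a timer.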
Best Practices
- Log everything: Requests, responses, errors, tool calls
- Use structured logging: JSON format for easy parsing
- Track key metrics: Response time, success rate, cost
- Set up alerts: Be notified of issues immediately
- Monitor costs: Track spending in real-time
- Collect feedback: Learn from users
- Create dashboards: Visualize metrics
- Trace requests: Follow execution flow
- Analyze trends: Look for patterns over time
- Act on insights: Use data to improve
Practice Exercises
Exercise 1: Add Circuit Breaker (Medium)
Task: Implement a circuit breaker that stops calling a failing tool.
Click to see solution
class CircuitBreaker:
def __init__(self, failure_threshold: int = 5):
self.failure_count = 0
self.threshold = failure_threshold
self.state = "closed" # closed, open, half-open
def call(self, func, *args):
if self.state == "open":
raise Exception("Circuit breaker is open")
try:
result = func(*args)
self.failure_count = 0
return result
except Exception:
self.failure_count += 1
if self.failure_count >= self.threshold:
self.state = "open"
raise
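A quick self-contained check of the breaker's behavior (the class is restated here so the snippet runs on its own): after `failure_threshold` consecutive failures the breaker opens and refuses further calls.

```python
class CircuitBreaker:
    """Restated from the exercise solution so this demo is self-contained"""
    def __init__(self, failure_threshold: int = 5):
        self.failure_count = 0
        self.threshold = failure_threshold
        self.state = "closed"  # closed, open, half-open

    def call(self, func, *args):
        if self.state == "open":
            raise RuntimeError("Circuit breaker is open")
        try:
            result = func(*args)
            self.failure_count = 0  # any success resets the count
            return result
        except Exception:
            self.failure_count += 1
            if self.failure_count >= self.threshold:
                self.state = "open"
            raise

def flaky_tool():
    raise ConnectionError("tool unavailable")

breaker = CircuitBreaker(failure_threshold=3)
for _ in range(3):
    try:
        breaker.call(flaky_tool)
    except ConnectionError:
        pass  # the breaker counts these failures

print(breaker.state)  # open
```

A fuller implementation would also add a cooldown timer that moves the breaker to "half-open" so the tool gets periodically retried.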
Exercise 2: Build a Metrics Dashboard (Hard)
Task: Create a real-time dashboard showing agent metrics.
Click to see solution
from fastapi import FastAPI
from prometheus_client import make_asgi_app
app = FastAPI()
# Mount Prometheus metrics
metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)
@app.get("/dashboard")
def dashboard():
    # Assumes requests_total / errors_total Counters and a durations list
    # are maintained elsewhere in the app
    return {
"requests_total": requests_total._value.get(),
"avg_duration": sum(durations) / len(durations),
"error_rate": errors_total._value.get() / requests_total._value.get()
}
✅ Chapter 5 Summary
You’ve learned production-ready practices:
- Reliability: Input validation, guardrails, retries, and fallbacks
- Testing: Unit tests, integration tests, benchmarks, and evaluation metrics
- Monitoring: Logging, tracing, metrics, alerts, and feedback loops
These practices ensure your agents are safe, reliable, and maintainable in production environments.
Next Steps
Chapter 5 (Production-Ready Agents) is complete! You now understand reliability, testing, and monitoring. You’re ready to build production-grade agents that are safe, tested, and observable.
Coding Agents
Module 6: Learning Objectives
By the end of this module, you will:
- ✓ Build coding agents that analyze and generate code
- ✓ Create research agents with multi-source verification
- ✓ Implement task automation with workflow orchestration
- ✓ Design specialized agents for specific domains
- ✓ Integrate advanced capabilities into focused agents
Introduction to Coding Agents
Coding agents are specialized AI systems that understand, generate, modify, and debug code. They’re among the most powerful and practical agent applications.
What Makes Coding Agents Special?
Unique Capabilities:
- Understand code semantics and structure
- Generate syntactically correct code
- Refactor and optimize existing code
- Debug and fix errors
- Write tests and documentation
- Work across multiple programming languages
Key Challenges:
- Code must be syntactically correct
- Logic must be sound
- Must handle edge cases
- Need to understand context and dependencies
- Security vulnerabilities must be avoided
Types of Coding Agents
- Code Generation Agents: Write new code from specifications
- Code Review Agents: Analyze and suggest improvements
- Debugging Agents: Find and fix bugs
- Refactoring Agents: Improve code structure
- Testing Agents: Generate and run tests
- Documentation Agents: Write comments and docs
Code Understanding and Generation
Understanding Code Structure
import ast
from typing import Dict, List, Any
class CodeAnalyzer:
"""Analyze code structure and semantics"""
def __init__(self):
self.client = openai.OpenAI()
def parse_python_code(self, code: str) -> Dict[str, Any]:
"""Parse Python code into AST"""
try:
tree = ast.parse(code)
analysis = {
"functions": [],
"classes": [],
"imports": [],
"variables": [],
"complexity": 0
}
for node in ast.walk(tree):
if isinstance(node, ast.FunctionDef):
analysis["functions"].append({
"name": node.name,
"args": [arg.arg for arg in node.args.args],
"line": node.lineno,
"docstring": ast.get_docstring(node)
})
elif isinstance(node, ast.ClassDef):
methods = [
n.name for n in node.body
if isinstance(n, ast.FunctionDef)
]
analysis["classes"].append({
"name": node.name,
"methods": methods,
"line": node.lineno,
"docstring": ast.get_docstring(node)
})
elif isinstance(node, ast.Import):
for alias in node.names:
analysis["imports"].append(alias.name)
elif isinstance(node, ast.ImportFrom):
module = node.module or ""
for alias in node.names:
analysis["imports"].append(f"{module}.{alias.name}")
return analysis
except SyntaxError as e:
return {
"error": "Syntax error",
"message": str(e),
"line": e.lineno
}
def analyze_complexity(self, code: str) -> Dict[str, Any]:
"""Analyze code complexity"""
try:
tree = ast.parse(code)
complexity = {
"cyclomatic": 1, # Base complexity
"lines_of_code": len(code.split('\n')),
"num_functions": 0,
"num_classes": 0,
"max_nesting": 0
}
for node in ast.walk(tree):
# Count decision points for cyclomatic complexity
if isinstance(node, (ast.If, ast.While, ast.For, ast.ExceptHandler)):
complexity["cyclomatic"] += 1
elif isinstance(node, ast.FunctionDef):
complexity["num_functions"] += 1
elif isinstance(node, ast.ClassDef):
complexity["num_classes"] += 1
return complexity
except Exception as e:
return {"error": str(e)}
def extract_dependencies(self, code: str) -> List[str]:
"""Extract external dependencies"""
try:
tree = ast.parse(code)
dependencies = set()
for node in ast.walk(tree):
if isinstance(node, ast.Import):
for alias in node.names:
# Get top-level package
pkg = alias.name.split('.')[0]
dependencies.add(pkg)
elif isinstance(node, ast.ImportFrom):
if node.module:
pkg = node.module.split('.')[0]
dependencies.add(pkg)
# Filter out standard library
stdlib = {'os', 'sys', 'json', 're', 'time', 'datetime', 'math'}
external = dependencies - stdlib
return sorted(external)
except Exception as e:
return []
def understand_code_intent(self, code: str) -> str:
"""Use LLM to understand what code does"""
prompt = f"""Analyze this code and explain what it does:
```python
{code}
```

Provide:
- High-level purpose
- Key functionality
- Input/output
- Any notable patterns or techniques

Explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return response.choices[0].message.content
# Usage
analyzer = CodeAnalyzer()

code = """
def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)

class Calculator:
    def add(self, a, b):
        return a + b
"""

analysis = analyzer.parse_python_code(code)
print(f"Functions: {[f['name'] for f in analysis['functions']]}")
print(f"Classes: {[c['name'] for c in analysis['classes']]}")

complexity = analyzer.analyze_complexity(code)
print(f"Cyclomatic complexity: {complexity['cyclomatic']}")

intent = analyzer.understand_code_intent(code)
print(f"Intent: {intent}")
Generating Code from Specifications
class CodeGenerator:
"""Generate code from natural language specifications"""
def __init__(self):
self.client = openai.OpenAI()
def generate_function(self,
description: str,
language: str = "python",
include_tests: bool = False) -> Dict[str, str]:
"""Generate function from description"""
prompt = f"""Generate a {language} function based on this description:
{description}
Requirements:
- Include type hints (if applicable)
- Add docstring with description, parameters, and return value
- Handle edge cases
- Include error handling
- Follow best practices
- Keep it simple and readable
{"Also generate unit tests for this function." if include_tests else ""}
Provide the code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
code = response.choices[0].message.content
# Extract code and tests
parts = self.extract_code_blocks(code)
return {
"code": parts.get("main", code),
"tests": parts.get("tests", "") if include_tests else None
}
def generate_class(self,
description: str,
methods: List[str] = None) -> str:
"""Generate class from description"""
methods_str = ""
if methods:
methods_str = "\nMethods to implement:\n" + "\n".join(f"- {m}" for m in methods)
prompt = f"""Generate a Python class based on this description:
{description}{methods_str}
Requirements:
- Include __init__ method
- Add docstrings for class and methods
- Use type hints
- Follow PEP 8 style guide
- Include example usage in docstring
Provide the code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code_blocks(response.choices[0].message.content)["main"]
def generate_from_signature(self, signature: str) -> str:
"""Generate function implementation from signature"""
prompt = f"""Implement this function:
```python
{signature}
pass
```
Provide a complete, working implementation with:
- Proper logic
- Error handling
- Edge case handling
- Comments for complex parts
Implementation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code_blocks(response.choices[0].message.content)["main"]
def extract_code_blocks(self, text: str) -> Dict[str, str]:
"""Extract code blocks from markdown"""
import re
# Find all code blocks
pattern = r'```(?:python)?\n(.*?)```'
blocks = re.findall(pattern, text, re.DOTALL)
if not blocks:
return {"main": text}
result = {"main": blocks[0]}
if len(blocks) > 1:
result["tests"] = blocks[1]
return result
# Usage
generator = CodeGenerator()

# Generate a function
result = generator.generate_function(
    "Create a function that calculates the factorial of a number",
    include_tests=True
)

print("Generated code:")
print(result["code"])

if result["tests"]:
    print("\nGenerated tests:")
    print(result["tests"])

# Generate a class
class_code = generator.generate_class(
    "A simple cache that stores key-value pairs with expiration",
    methods=["set", "get", "delete", "clear"]
)

print("\nGenerated class:")
print(class_code)
```
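The `extract_code_blocks` regex is easy to verify in isolation. A small standalone demo of the same pattern, run against a typical model reply:

```python
import re

def extract_code_blocks(text: str) -> dict:
    """Pull fenced code blocks out of a markdown-style response:
    first block is the main code, second (if any) is the tests."""
    blocks = re.findall(r'```(?:python)?\n(.*?)```', text, re.DOTALL)
    if not blocks:
        return {"main": text}
    result = {"main": blocks[0]}
    if len(blocks) > 1:
        result["tests"] = blocks[1]
    return result

reply = (
    "Here you go:\n"
    "```python\ndef add(a, b):\n    return a + b\n```\n"
    "Tests:\n"
    "```python\ndef test_add():\n    assert add(1, 2) == 3\n```\n"
)
parts = extract_code_blocks(reply)
print(parts["main"].strip())
print("tests" in parts)  # True
```

The non-greedy `(.*?)` with `re.DOTALL` is what keeps the two blocks from being swallowed into one match.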
## Refactoring and Optimization
### Automated Refactoring
```python
class RefactoringAgent:
"""Refactor and improve code quality"""
def __init__(self):
self.client = openai.OpenAI()
def refactor_for_readability(self, code: str) -> Dict[str, str]:
"""Improve code readability"""
prompt = f"""Refactor this code for better readability:
```python
{code}
```
Apply these improvements:
- Better variable names
- Extract complex expressions
- Add comments
- Simplify logic
- Follow PEP 8
Provide:
- Refactored code
- List of changes made
Response:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_response(response.choices[0].message.content)
def optimize_performance(self, code: str) -> Dict[str, str]:
"""Optimize code for performance"""
prompt = f"""Optimize this code for better performance:
{code}
Consider:
- Algorithm complexity
- Data structure choices
- Unnecessary operations
- Caching opportunities
- Memory usage
Provide optimized code with explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_response(response.choices[0].message.content)
def apply_design_pattern(self, code: str, pattern: str) -> Dict[str, str]:
"""Apply design pattern to code"""
prompt = f"""Refactor this code to use the {pattern} design pattern:
{code}
Explain:
- Why this pattern is appropriate
- How it improves the code
- What changed
Refactored code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_response(response.choices[0].message.content)
def extract_method(self, code: str, lines: tuple) -> Dict[str, str]:
"""Extract method refactoring"""
prompt = f"""Extract lines {lines[0]}-{lines[1]} into a separate method:
{code}
Provide:
- New method with good name
- Updated original code
- Method signature
Result:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.parse_response(response.choices[0].message.content)
def parse_response(self, text: str) -> Dict[str, str]:
    """Parse an LLM reply into code plus change notes.
    Minimal sketch: the first fenced block is taken as the code,
    and bullet lines are treated as the list of changes."""
    import re
    blocks = re.findall(r'```(?:python)?\n(.*?)```', text, re.DOTALL)
    changes = [line.strip('- ').strip() for line in text.split('\n')
               if line.strip().startswith('-')]
    return {"code": blocks[0] if blocks else text, "changes": changes}
# Usage
refactorer = RefactoringAgent()

# Improve readability
messy_code = """
def f(x,y,z):
    if x>0:
        if y>0:
            if z>0:
                return x+y+z
    return 0
"""

result = refactorer.refactor_for_readability(messy_code)
print("Refactored:", result["code"])

# Optimize performance
slow_code = """
def find_duplicates(items):
    duplicates = []
    for i in range(len(items)):
        for j in range(i+1, len(items)):
            if items[i] == items[j] and items[i] not in duplicates:
                duplicates.append(items[i])
    return duplicates
"""

result = refactorer.optimize_performance(slow_code)
print("Optimized:", result["code"])
```
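For reference, here is a hand-written optimization of the `find_duplicates` example, the kind of result the performance prompt is aiming for: set lookups replace the O(n²) nested scan with a single O(n) pass.

```python
def find_duplicates(items):
    """Return each duplicated value once, in order of first duplication."""
    seen = set()       # values encountered so far
    reported = set()   # duplicates already emitted
    duplicates = []
    for item in items:
        if item in seen and item not in reported:
            duplicates.append(item)
            reported.add(item)
        seen.add(item)
    return duplicates

print(find_duplicates([1, 2, 1, 3, 2, 1]))  # [1, 2]
```

Having a known-good target like this is also useful for checking the agent's output: run both versions on the same inputs and diff the results.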
## Test Generation
### Comprehensive Test Generation
```python
class TestGenerator:
"""Generate comprehensive unit tests"""
def __init__(self):
self.client = openai.OpenAI()
def generate_unit_tests(self, code: str, framework: str = "pytest") -> str:
"""Generate unit tests with full coverage"""
prompt = f"""Generate comprehensive {framework} tests for this code:
```python
{code}
```
Include tests for:
- Normal/happy path cases
- Edge cases (empty, None, boundaries)
- Error cases (invalid input, exceptions)
- Integration scenarios
- Fixtures and setup if needed
Use descriptive test names and add comments.
Tests:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def generate_property_tests(self, code: str) -> str:
"""Generate property-based tests using Hypothesis"""
prompt = f"""Generate property-based tests using Hypothesis for:
{code}
Create tests that verify properties like:
- Invariants
- Idempotence
- Commutativity
- Round-trip properties
Tests:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def generate_integration_tests(self, code: str, dependencies: List[str]) -> str:
"""Generate integration tests"""
deps_str = ", ".join(dependencies)
prompt = f"""Generate integration tests for this code that interacts with: {deps_str}
{code}
Include:
- Mocking external dependencies
- Testing interactions
- Setup and teardown
- Error scenarios
Tests:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def extract_code(self, text: str) -> str:
"""Extract code from markdown"""
import re
pattern = r'```(?:python)?\n(.*?)```'
matches = re.findall(pattern, text, re.DOTALL)
return matches[0] if matches else text
# Usage
test_gen = TestGenerator()

code_to_test = """
def divide(a: float, b: float) -> float:
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b
"""

tests = test_gen.generate_unit_tests(code_to_test)
print("Generated tests:")
print(tests)
```
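For the `divide` example, the generator is prompted to cover the happy path, edge cases, and error cases. Typical output might look like the following, hand-written here with plain asserts so it runs without pytest:

```python
def divide(a: float, b: float) -> float:
    if b == 0:
        raise ValueError("Cannot divide by zero")
    return a / b

# Happy path
assert divide(10, 2) == 5.0
# Negative operands
assert divide(-9, 3) == -3.0
# Error case: division by zero must raise ValueError
try:
    divide(1, 0)
except ValueError as e:
    assert "zero" in str(e)
else:
    raise AssertionError("expected ValueError")

print("all divide tests passed")
```

Running generated tests immediately, as above, closes the loop: a test suite that fails to import or execute is itself a signal to regenerate.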
## Debugging and Error Fixing
### Automated Debugging Agent
```python
class DebuggingAgent:
"""Find and fix bugs in code"""
def __init__(self):
self.client = openai.OpenAI()
self.sandbox = CodeExecutor() # From previous chapters
def debug_code(self, code: str, error_message: str = None) -> Dict:
"""Debug code and suggest fixes"""
# Try to execute and capture error if not provided
if not error_message:
result = self.sandbox.execute(code)
if not result["success"]:
error_message = result["output"]
prompt = f"""Debug this code:
```python
{code}
```
Error: {error_message}
Provide:
- Root cause analysis
- Fixed code
- Explanation of the fix
- How to prevent similar bugs
Response:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.parse_debug_response(response.choices[0].message.content)
def find_logical_errors(self, code: str, expected_behavior: str) -> Dict:
"""Find logical errors (code runs but wrong output)"""
prompt = f"""This code runs without errors but produces wrong results:
{code}
Expected behavior: {expected_behavior}
Analyze:
- What’s the logical error?
- Why does it produce wrong results?
- How to fix it?
- Test cases to verify the fix
Analysis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_debug_response(response.choices[0].message.content)
def suggest_improvements(self, code: str, issue: str) -> List[str]:
"""Suggest multiple ways to fix an issue"""
prompt = f"""Suggest 3 different ways to fix this issue:
Code:
{code}
Issue: {issue}
For each solution, provide:
- The fix
- Pros and cons
- When to use it
Solutions:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return self.parse_solutions(response.choices[0].message.content)
def iterative_fix(self, code: str, max_attempts: int = 3) -> Dict:
"""Iteratively fix code until it works"""
for attempt in range(max_attempts):
# Try to execute
result = self.sandbox.execute(code)
if result["success"]:
return {
"success": True,
"code": code,
"attempts": attempt + 1
}
# Try to fix
fix_result = self.debug_code(code, result["output"])
code = fix_result["fixed_code"]
return {
"success": False,
"code": code,
"attempts": max_attempts,
"last_error": result["output"]
}
def parse_debug_response(self, text: str) -> Dict:
    """Parse the model's debug reply. Minimal sketch: the first
    fenced block is taken as the fixed code; the prose is kept
    as root cause / explanation."""
    import re
    blocks = re.findall(r'```(?:python)?\n(.*?)```', text, re.DOTALL)
    return {
        "root_cause": text.split('\n')[0].strip(),
        "fixed_code": blocks[0] if blocks else text,
        "explanation": text
    }

def parse_solutions(self, text: str) -> List[str]:
    """Split a numbered list of solutions (minimal sketch)."""
    import re
    parts = re.split(r'\n(?=\d+\.)', text)
    return [p.strip() for p in parts if p.strip()]
# Usage
debugger = DebuggingAgent()

buggy_code = """
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    return total / len(numbers)

# This will crash on an empty list
result = calculate_average([])
"""

fix = debugger.debug_code(buggy_code)
print("Root cause:", fix["root_cause"])
print("Fixed code:", fix["fixed_code"])
```
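One plausible fix the agent could return for the `calculate_average` bug is shown below: the empty-list case is caught explicitly, turning an opaque `ZeroDivisionError` into a clear, documented error.

```python
def calculate_average(numbers):
    """Average of a list of numbers.

    Raises ValueError on an empty list instead of letting the
    division raise ZeroDivisionError.
    """
    if not numbers:
        raise ValueError("calculate_average() requires at least one number")
    return sum(numbers) / len(numbers)

print(calculate_average([2, 4, 6]))  # 4.0
try:
    calculate_average([])
except ValueError as e:
    print("caught:", e)
```

Whether to raise, return `0.0`, or return `None` here is a design decision; a good debugging agent should state the trade-off rather than silently pick one.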
## Repository-Level Operations
### Codebase Understanding
```python
from pathlib import Path
import json
class CodebaseAgent:
"""Understand and navigate entire codebases"""
def __init__(self, root_path: str):
self.root_path = Path(root_path)
self.index = {}
self.dependency_graph = {}
self.client = openai.OpenAI()
def index_codebase(self):
"""Index all Python files in codebase"""
print("Indexing codebase...")
for py_file in self.root_path.rglob("*.py"):
if "venv" in str(py_file) or ".git" in str(py_file):
continue
try:
with open(py_file) as f:
code = f.read()
analyzer = CodeAnalyzer()
analysis = analyzer.parse_python_code(code)
self.index[str(py_file.relative_to(self.root_path))] = {
"analysis": analysis,
"size": len(code),
"lines": len(code.split('\n'))
}
except Exception as e:
print(f"Error indexing {py_file}: {e}")
print(f"Indexed {len(self.index)} files")
def find_function_definition(self, function_name: str) -> List[Dict]:
"""Find where a function is defined"""
results = []
for file_path, data in self.index.items():
for func in data["analysis"].get("functions", []):
if func["name"] == function_name:
results.append({
"file": file_path,
"line": func["line"],
"signature": f"{func['name']}({', '.join(func['args'])})"
})
return results
def find_class_definition(self, class_name: str) -> List[Dict]:
"""Find where a class is defined"""
results = []
for file_path, data in self.index.items():
for cls in data["analysis"].get("classes", []):
if cls["name"] == class_name:
results.append({
"file": file_path,
"line": cls["line"],
"methods": cls["methods"]
})
return results
def find_usages(self, symbol: str) -> List[Dict]:
"""Find where a symbol is used"""
usages = []
for py_file in self.root_path.rglob("*.py"):
if "venv" in str(py_file):
continue
try:
with open(py_file) as f:
for i, line in enumerate(f, 1):
if symbol in line:
usages.append({
"file": str(py_file.relative_to(self.root_path)),
"line": i,
"content": line.strip()
})
except (OSError, UnicodeDecodeError):
    continue
return usages
def analyze_dependencies(self):
"""Build dependency graph"""
for file_path, data in self.index.items():
imports = data["analysis"].get("imports", [])
self.dependency_graph[file_path] = imports
def get_codebase_summary(self) -> Dict:
"""Get high-level codebase summary"""
total_files = len(self.index)
total_functions = sum(
len(data["analysis"].get("functions", []))
for data in self.index.values()
)
total_classes = sum(
len(data["analysis"].get("classes", []))
for data in self.index.values()
)
total_lines = sum(
data["lines"]
for data in self.index.values()
)
return {
"total_files": total_files,
"total_functions": total_functions,
"total_classes": total_classes,
"total_lines": total_lines,
"avg_lines_per_file": total_lines / total_files if total_files > 0 else 0
}
def explain_codebase(self) -> str:
"""Generate high-level explanation of codebase"""
summary = self.get_codebase_summary()
# Get file structure
files = list(self.index.keys())
prompt = f"""Explain this codebase structure:
Files: {len(files)}
Functions: {summary['total_functions']}
Classes: {summary['total_classes']}
Lines of code: {summary['total_lines']}
File structure:
{chr(10).join(files[:20])}
Provide:
1. What this codebase likely does
2. Main components/modules
3. Architecture pattern
4. Key areas of functionality
Explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return response.choices[0].message.content
# Usage
codebase = CodebaseAgent("./my_project")
codebase.index_codebase()
# Find function
results = codebase.find_function_definition("process_data")
print(f"Found in: {results}")
# Get summary
summary = codebase.get_codebase_summary()
print(f"Codebase: {summary['total_files']} files, {summary['total_lines']} lines")
# Explain codebase
explanation = codebase.explain_codebase()
print(explanation)
```
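Note that `find_usages` above matches raw substrings, so it will also hit comments and partial identifiers. A more precise alternative walks the AST and only reports real identifier references; a module-scope sketch:

```python
import ast

def find_name_references(code: str, symbol: str) -> list:
    """Return line numbers where `symbol` appears as an actual
    identifier (ast.Name) or attribute access, not just as text."""
    lines = []
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return lines
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == symbol:
            lines.append(node.lineno)
        elif isinstance(node, ast.Attribute) and node.attr == symbol:
            lines.append(node.lineno)
    return sorted(set(lines))

sample = "x = process_data(1)\n# process_data mentioned in a comment\ny = x\n"
print(find_name_references(sample, "process_data"))  # [1]
```

Substring search is still useful as a fast first pass over a large repo; the AST check then filters its candidates.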
## Complete Coding Agent System

```python
class CompleteCodingAgent:
"""Full-featured coding agent"""
def __init__(self):
self.analyzer = CodeAnalyzer()
self.generator = CodeGenerator()
self.refactorer = RefactoringAgent()
self.test_gen = TestGenerator()
self.debugger = DebuggingAgent()
self.client = openai.OpenAI()
def process_request(self, request: str, code: str = None, context: Dict = None) -> Dict:
"""Process any coding request"""
# Classify intent
intent = self.classify_intent(request)
if intent == "generate":
return self.handle_generation(request)
elif intent == "analyze":
return self.handle_analysis(code)
elif intent == "refactor":
return self.handle_refactoring(code, request)
elif intent == "test":
return self.handle_test_generation(code)
elif intent == "debug":
return self.handle_debugging(code, context)
elif intent == "explain":
return self.handle_explanation(code)
else:
return {"error": "Could not understand request"}
def handle_generation(self, request: str) -> Dict:
"""Handle code generation requests"""
result = self.generator.generate_function(request)
code = result["code"]
# Validate generated code
validation = self.analyzer.parse_python_code(code)
if "error" in validation:
# Try to fix
fixed = self.debugger.debug_code(code, validation["error"])
code = fixed["fixed_code"]
# Generate tests
tests = self.test_gen.generate_unit_tests(code)
return {
"type": "generation",
"code": code,
"tests": tests,
"validated": True
}
def handle_analysis(self, code: str) -> Dict:
"""Handle code analysis requests"""
# Parse structure
structure = self.analyzer.parse_python_code(code)
# Analyze complexity
complexity = self.analyzer.analyze_complexity(code)
# Get explanation
explanation = self.analyzer.understand_code_intent(code)
return {
"type": "analysis",
"structure": structure,
"complexity": complexity,
"explanation": explanation
}
def handle_refactoring(self, code: str, request: str) -> Dict:
"""Handle refactoring requests"""
if "performance" in request.lower():
result = self.refactorer.optimize_performance(code)
elif "readable" in request.lower():
result = self.refactorer.refactor_for_readability(code)
else:
result = self.refactorer.refactor_for_readability(code)
return {
"type": "refactoring",
"original": code,
"refactored": result["code"],
"changes": result.get("changes", [])
}
def handle_test_generation(self, code: str) -> Dict:
"""Handle test generation requests"""
unit_tests = self.test_gen.generate_unit_tests(code)
return {
"type": "tests",
"code": code,
"tests": unit_tests
}
def handle_debugging(self, code: str, context: Dict) -> Dict:
"""Handle debugging requests"""
error_msg = context.get("error") if context else None
result = self.debugger.debug_code(code, error_msg)
return {
"type": "debugging",
"original": code,
"fixed": result["fixed_code"],
"explanation": result.get("explanation", "")
}
def handle_explanation(self, code: str) -> Dict:
"""Handle code explanation requests"""
explanation = self.analyzer.understand_code_intent(code)
structure = self.analyzer.parse_python_code(code)
return {
"type": "explanation",
"explanation": explanation,
"structure": structure
}
def classify_intent(self, request: str) -> str:
"""Classify user intent"""
request_lower = request.lower()
keywords = {
"generate": ["generate", "create", "write", "implement"],
"analyze": ["analyze", "understand", "explain what"],
"refactor": ["refactor", "improve", "optimize", "clean"],
"test": ["test", "unittest", "pytest"],
"debug": ["debug", "fix", "error", "bug"],
"explain": ["explain", "what does", "how does"]
}
for intent, words in keywords.items():
if any(word in request_lower for word in words):
return intent
return "unknown"
# Usage
agent = CompleteCodingAgent()
# Generate code
result = agent.process_request("Create a function to validate email addresses")
print("Generated code:")
print(result["code"])
print("\nTests:")
print(result["tests"])
# Analyze code
code = """
def fibonacci(n):
if n <= 1:
return n
return fibonacci(n-1) + fibonacci(n-2)
"""
result = agent.process_request("Analyze this code", code=code)
print("\nComplexity:", result["complexity"])
print("Explanation:", result["explanation"])
# Refactor code
result = agent.process_request("Optimize this code for performance", code=code)
print("\nRefactored:")
print(result["refactored"])
```
## Best Practices for Coding Agents
### 1. Code Quality Checks
Always validate generated code:
- Syntax checking (AST parsing)
- Style checking (PEP 8, linting)
- Security scanning (bandit, safety)
- Type checking (mypy)
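The first two checks above can be sketched with the standard library alone; linting, security scanning, and type checking would layer external tools (ruff, bandit, mypy) on top:

```python
import ast

def syntax_check(code: str) -> dict:
    """Validate generated code before accepting it: parse with ast,
    then compile, which catches syntax errors cheaply."""
    try:
        tree = ast.parse(code)
        compile(tree, "<generated>", "exec")
        return {"valid": True, "error": None}
    except SyntaxError as e:
        return {"valid": False, "error": f"line {e.lineno}: {e.msg}"}

print(syntax_check("def ok():\n    return 1"))
print(syntax_check("def broken(:"))
```

A failed check should feed the error message straight back into the debugging loop rather than surfacing raw code to the user.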
### 2. Testing Strategy
- Generate tests alongside code
- Run tests automatically
- Achieve high coverage
- Include edge cases
### 3. Context Awareness
- Understand existing codebase
- Match coding style
- Respect conventions
- Consider dependencies
### 4. Iterative Improvement
- Start with simple solution
- Refine based on feedback
- Test incrementally
- Document changes
### 5. Security Considerations
- Validate all inputs
- Avoid SQL injection
- Check for XSS vulnerabilities
- Use secure libraries
- Never expose secrets
### 6. Performance Optimization
- Profile before optimizing
- Choose right algorithms
- Consider memory usage
- Cache when appropriate
- Benchmark improvements
### 7. Documentation
- Generate docstrings
- Add inline comments
- Create README files
- Document APIs
- Explain complex logic
### 8. Version Control
- Commit frequently
- Write clear messages
- Use branches
- Review changes
- Tag releases
### 9. Collaboration
- Follow team standards
- Request code reviews
- Share knowledge
- Document decisions
- Communicate changes
### 10. Continuous Learning
- Learn from mistakes
- Study good code
- Stay updated
- Experiment safely
- Share learnings
## Advanced Topics
### Multi-Language Support
```python
class MultiLanguageAgent:
"""Support multiple programming languages"""
def __init__(self):
self.client = openai.OpenAI()
self.supported_languages = ["python", "javascript", "java", "go", "rust"]
def generate_code(self, description: str, language: str) -> str:
"""Generate code in specified language"""
if language not in self.supported_languages:
raise ValueError(f"Unsupported language: {language}")
prompt = f"""Generate {language} code for:
{description}
Follow {language} best practices and conventions.
Code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return response.choices[0].message.content
def translate_code(self, code: str, from_lang: str, to_lang: str) -> str:
"""Translate code between languages"""
prompt = f"""Translate this {from_lang} code to {to_lang}:
```{from_lang}
{code}
```
Maintain:
- Same functionality
- Idiomatic {to_lang} style
- Best practices
{to_lang} code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return response.choices[0].message.content
```
### Code Review Agent
```python
class CodeReviewAgent:
"""Automated code review"""
def __init__(self):
self.client = openai.OpenAI()
def review_code(self, code: str) -> Dict:
"""Comprehensive code review"""
prompt = f"""Review this code:
```python
{code}
```
Provide feedback on:
- Code quality (readability, maintainability)
- Potential bugs or issues
- Performance concerns
- Security vulnerabilities
- Best practice violations
- Suggestions for improvement
Rate each category 1-5 and provide specific feedback.
Review:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_review(response.choices[0].message.content)
def suggest_improvements(self, code: str) -> List[Dict]:
"""Suggest specific improvements"""
review = self.review_code(code)
improvements = []
for issue in review.get("issues", []):
improvements.append({
"issue": issue,
"suggestion": self.generate_fix(code, issue),
"priority": self.assess_priority(issue)
})
return improvements
def parse_review(self, text: str) -> Dict:
    """Minimal parser (sketch): keep the raw review and treat
    bullet lines as issues. A production agent would request
    structured output instead."""
    issues = [line.strip('-• ').strip() for line in text.split('\n')
              if line.strip().startswith(('-', '•'))]
    return {"raw": text, "issues": issues}

def generate_fix(self, code: str, issue: str) -> str:
    """Sketch: ask the model for a fix for one specific issue."""
    response = self.client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content":
                   f"Fix this issue in the code below.\nIssue: {issue}\n\n{code}"}],
        temperature=0.2
    )
    return response.choices[0].message.content

def assess_priority(self, issue: str) -> str:
    """Sketch: crude keyword-based priority ranking."""
    high = ("security", "bug", "crash", "vulnerab", "injection")
    return "high" if any(k in issue.lower() for k in high) else "normal"
```
## Next Steps
You now have a comprehensive toolkit for building coding agents. Next, we'll explore research agents, which gather and synthesize information from multiple sources.
# Research Agents
## Introduction to Research Agents
Research agents are specialized AI systems that gather, analyze, and synthesize information from multiple sources to answer complex questions or investigate topics in depth.
### What Makes Research Agents Unique?
**Core Capabilities:**
- Multi-source information gathering
- Source credibility assessment
- Information synthesis and summarization
- Citation management
- Fact verification
- Deep topic exploration
**Key Challenges:**
- Information overload
- Source reliability
- Conflicting information
- Bias detection
- Citation accuracy
- Staying current
### Types of Research Agents
- Academic Research Agents: Literature reviews, paper analysis
- Market Research Agents: Competitive analysis, trends
- Investigative Agents: Deep dives, fact-checking
- News Aggregation Agents: Current events, monitoring
- Technical Research Agents: Documentation, specifications
## Information Gathering Strategies
### Multi-Source Search
```python
import os
from typing import List, Dict

import openai
import requests
from bs4 import BeautifulSoup
class MultiSourceSearcher:
"""Search across multiple sources"""
def __init__(self):
self.client = openai.OpenAI()
self.sources = {
"web": self.search_web,
"academic": self.search_academic,
"news": self.search_news,
"social": self.search_social
}
def search_all_sources(self, query: str, sources: List[str] = None) -> Dict:
"""Search across all specified sources"""
if sources is None:
sources = list(self.sources.keys())
results = {}
for source in sources:
if source in self.sources:
print(f"Searching {source}...")
results[source] = self.sources[source](query)
return results
def search_web(self, query: str) -> List[Dict]:
"""Search general web"""
# Using a search API (example with Google Custom Search)
api_key = os.getenv("GOOGLE_API_KEY")
search_engine_id = os.getenv("GOOGLE_SEARCH_ENGINE_ID")
url = "https://www.googleapis.com/customsearch/v1"
params = {
"key": api_key,
"cx": search_engine_id,
"q": query,
"num": 10
}
try:
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()
data = response.json()
results = []
for item in data.get("items", []):
results.append({
"title": item["title"],
"url": item["link"],
"snippet": item["snippet"],
"source": "web"
})
return results
except Exception as e:
print(f"Web search error: {e}")
return []
def search_academic(self, query: str) -> List[Dict]:
"""Search academic sources (arXiv, PubMed, etc.)"""
# Example with arXiv API
url = "http://export.arxiv.org/api/query"
params = {
"search_query": f"all:{query}",
"start": 0,
"max_results": 10
}
try:
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()
# Parse XML response
from xml.etree import ElementTree as ET
root = ET.fromstring(response.content)
results = []
for entry in root.findall("{http://www.w3.org/2005/Atom}entry"):
title = entry.find("{http://www.w3.org/2005/Atom}title").text
summary = entry.find("{http://www.w3.org/2005/Atom}summary").text
link = entry.find("{http://www.w3.org/2005/Atom}id").text
results.append({
"title": title.strip(),
"url": link,
"snippet": summary.strip()[:200],
"source": "academic"
})
return results
except Exception as e:
print(f"Academic search error: {e}")
return []
def search_news(self, query: str) -> List[Dict]:
"""Search news sources"""
# Example with News API
api_key = os.getenv("NEWS_API_KEY")
url = "https://newsapi.org/v2/everything"
params = {
"q": query,
"apiKey": api_key,
"pageSize": 10,
"sortBy": "relevancy"
}
try:
response = requests.get(url, params=params, timeout=10)
response.raise_for_status()
data = response.json()
results = []
for article in data.get("articles", []):
results.append({
"title": article["title"],
"url": article["url"],
"snippet": article["description"],
"source": "news",
"published": article.get("publishedAt")
})
return results
except Exception as e:
print(f"News search error: {e}")
return []
def search_social(self, query: str) -> List[Dict]:
"""Search social media (Twitter, Reddit, etc.)"""
# Example implementation for Reddit
url = "https://www.reddit.com/search.json"
params = {
"q": query,
"limit": 10,
"sort": "relevance"
}
headers = {"User-Agent": "ResearchAgent/1.0"}
try:
response = requests.get(url, params=params, headers=headers, timeout=10)
response.raise_for_status()
data = response.json()
results = []
for post in data["data"]["children"]:
post_data = post["data"]
results.append({
"title": post_data["title"],
"url": f"https://reddit.com{post_data['permalink']}",
"snippet": post_data.get("selftext", "")[:200],
"source": "social",
"score": post_data.get("score", 0)
})
return results
except Exception as e:
print(f"Social search error: {e}")
return []
# Usage
searcher = MultiSourceSearcher()
results = searcher.search_all_sources("artificial intelligence agents")
for source, items in results.items():
print(f"\n{source.upper()} Results: {len(items)}")
for item in items[:3]:
print(f" - {item['title']}")
```
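Before synthesis, per-source result lists usually need to be combined and de-duplicated, since the same article often surfaces on the web and in news searches. A simple URL-keyed merge (first occurrence wins):

```python
def merge_results(results_by_source: dict) -> list:
    """Flatten per-source result lists into one list, dropping
    items whose URL has already been seen."""
    seen = set()
    merged = []
    for items in results_by_source.values():
        for item in items:
            url = item.get("url")
            if url and url in seen:
                continue
            if url:
                seen.add(url)
            merged.append(item)
    return merged

demo = {
    "web": [{"title": "A", "url": "http://a"}, {"title": "B", "url": "http://b"}],
    "news": [{"title": "A again", "url": "http://a"}],
}
print([r["title"] for r in merge_results(demo)])  # ['A', 'B']
```

A fuzzier merge (normalized titles, canonicalized URLs) catches more duplicates, but exact-URL matching is a safe baseline.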
### Deep Content Extraction
```python
class ContentExtractor:
"""Extract and process content from sources"""
def __init__(self):
self.client = openai.OpenAI()
def extract_from_url(self, url: str) -> Dict:
"""Extract main content from URL"""
try:
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
# Remove script and style elements
for script in soup(["script", "style"]):
script.decompose()
# Get text
text = soup.get_text(separator='\n', strip=True)
# Extract metadata
title = soup.find('title')
title_text = title.string if title else ""
meta_desc = soup.find('meta', attrs={'name': 'description'})
description = meta_desc.get('content', '') if meta_desc else ""
return {
"url": url,
"title": title_text,
"description": description,
"content": text[:10000], # Limit content
"word_count": len(text.split())
}
except Exception as e:
return {
"url": url,
"error": str(e)
}
def extract_key_points(self, content: str) -> List[str]:
"""Extract key points from content"""
prompt = f"""Extract the key points from this content:
{content[:4000]}
Provide 5-7 bullet points of the most important information:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
points = response.choices[0].message.content.strip().split('\n')
return [p.strip('- ').strip() for p in points if p.strip()]
def extract_quotes(self, content: str, topic: str) -> List[Dict]:
"""Extract relevant quotes"""
prompt = f"""Find relevant quotes about "{topic}" from this content:
{content[:4000]}
Provide 3-5 direct quotes with context:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
# Parse quotes
quotes_text = response.choices[0].message.content
# Simple parsing - in production, use more robust method
quotes = []
for line in quotes_text.split('\n'):
if line.strip().startswith('"'):
quotes.append({"quote": line.strip(), "context": ""})
return quotes
# Usage
extractor = ContentExtractor()
# Extract content
content = extractor.extract_from_url("https://example.com/article")
print(f"Title: {content['title']}")
print(f"Words: {content['word_count']}")
# Extract key points
key_points = extractor.extract_key_points(content['content'])
for point in key_points:
print(f" • {point}")
```
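The post-processing that `extract_key_points` applies to the model reply, stripping bullet markers and blank lines, is worth testing on its own, since models vary their list style. A standalone version of that normalization:

```python
def parse_bullet_points(text: str) -> list:
    """Normalize an LLM bullet list into clean strings,
    tolerating '-', '•', and '*' markers and blank lines."""
    points = []
    for line in text.split('\n'):
        stripped = line.strip().lstrip('-•* ').strip()
        if stripped:
            points.append(stripped)
    return points

reply = "- First key point\n• Second key point\n\n* Third key point"
print(parse_bullet_points(reply))
```

If numbered lists also need handling, add a regex pass for leading `1.`-style prefixes before the marker strip.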
## Source Verification
### Credibility Assessment
```python
class SourceVerifier:
"""Verify source credibility and reliability"""
def __init__(self):
self.client = openai.OpenAI()
self.trusted_domains = {
"academic": [".edu", ".gov", "arxiv.org", "pubmed.gov"],
"news": ["reuters.com", "apnews.com", "bbc.com"],
"tech": ["github.com", "stackoverflow.com"]
}
def assess_credibility(self, url: str, content: str = None) -> Dict:
"""Assess source credibility"""
from urllib.parse import urlparse
domain = urlparse(url).netloc
# Check against trusted domains
trust_level = "unknown"
for category, domains in self.trusted_domains.items():
if any(trusted in domain for trusted in domains):
trust_level = "high"
break
# Analyze content if provided
content_score = None
if content:
content_score = self.analyze_content_quality(content)
return {
"url": url,
"domain": domain,
"trust_level": trust_level,
"content_quality": content_score,
"is_trusted": trust_level == "high"
}
def analyze_content_quality(self, content: str) -> Dict:
"""Analyze content quality indicators"""
prompt = f"""Analyze the quality and credibility of this content:
{content[:2000]}
Rate (1-5) on:
1. Factual accuracy (based on claims made)
2. Objectivity (bias level)
3. Citation quality (references provided)
4. Writing quality (clarity, professionalism)
5. Depth of analysis
Provide scores and brief explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_quality_scores(response.choices[0].message.content)
def cross_reference(self, claim: str, sources: List[Dict]) -> Dict:
"""Cross-reference a claim across sources"""
confirmations = 0
contradictions = 0
for source in sources:
result = self.check_claim_in_source(claim, source.get("content", ""))
if result == "confirms":
confirmations += 1
elif result == "contradicts":
contradictions += 1
return {
"claim": claim,
"confirmations": confirmations,
"contradictions": contradictions,
"confidence": confirmations / len(sources) if sources else 0
}
def check_claim_in_source(self, claim: str, content: str) -> str:
"""Check if source confirms, contradicts, or is neutral on claim"""
prompt = f"""Does this content confirm, contradict, or neither regarding this claim?
Claim: {claim}
Content: {content[:1000]}
Answer with just: confirms, contradicts, or neutral"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.1
)
return response.choices[0].message.content.strip().lower()
# Usage
verifier = SourceVerifier()
# Assess credibility
credibility = verifier.assess_credibility(
"https://arxiv.org/abs/2023.12345",
"This paper presents..."
)
print(f"Trust level: {credibility['trust_level']}")
# Cross-reference claim
claim = "AI agents can autonomously complete complex tasks"
sources = [
{"content": "Research shows AI agents are capable of..."},
{"content": "Studies indicate autonomous agents can..."}
]
verification = verifier.cross_reference(claim, sources)
print(f"Confidence: {verification['confidence']:.0%}")
Synthesis and Summarization
Information Synthesis
class InformationSynthesizer:
"""Synthesize information from multiple sources"""
def __init__(self):
self.client = openai.OpenAI()
def synthesize_sources(self,
query: str,
sources: List[Dict],
style: str = "comprehensive") -> str:
"""Synthesize information from multiple sources"""
# Prepare source summaries
source_texts = []
for i, source in enumerate(sources[:10], 1): # Limit to 10 sources
source_texts.append(f"""
Source {i}: {source.get('title', 'Unknown')}
URL: {source.get('url', 'N/A')}
Content: {source.get('snippet', source.get('content', ''))[:500]}
""")
sources_combined = "\n---\n".join(source_texts)
style_instructions = {
"comprehensive": "Provide a detailed, thorough analysis",
"concise": "Provide a brief, focused summary",
"academic": "Use formal, academic tone with citations",
"casual": "Use conversational, accessible language"
}
prompt = f"""Synthesize information about: {query}
Sources:
{sources_combined}
{style_instructions.get(style, style_instructions['comprehensive'])}.
Requirements:
- Integrate information from multiple sources
- Identify common themes and patterns
- Note any contradictions
- Cite sources [1], [2], etc.
- Provide balanced perspective
Synthesis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4,
max_tokens=2000
)
synthesis = response.choices[0].message.content
# Add source list
source_list = "\n\nSources:\n"
for i, source in enumerate(sources[:10], 1):
source_list += f"[{i}] {source.get('title', 'Unknown')} - {source.get('url', 'N/A')}\n"
return synthesis + source_list
def identify_themes(self, sources: List[Dict]) -> List[Dict]:
"""Identify common themes across sources"""
# Combine content
combined_content = "\n\n".join([
s.get('snippet', s.get('content', ''))[:500]
for s in sources[:20]
])
prompt = f"""Identify the main themes in these sources:
{combined_content}
List 5-7 key themes with:
- Theme name
- Brief description
- How many sources mention it
Themes:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_themes(response.choices[0].message.content)
def find_contradictions(self, sources: List[Dict]) -> List[Dict]:
"""Find contradictions between sources"""
contradictions = []
# Compare sources pairwise (simplified)
for i in range(min(5, len(sources))):
for j in range(i+1, min(5, len(sources))):
source_a = sources[i]
source_b = sources[j]
prompt = f"""Do these sources contradict each other?
Source A: {source_a.get('snippet', '')[:300]}
Source B: {source_b.get('snippet', '')[:300]}
If yes, explain the contradiction. If no, say "no contradiction".
Analysis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
result = response.choices[0].message.content
if "no contradiction" not in result.lower():
contradictions.append({
"source_a": source_a.get('title'),
"source_b": source_b.get('title'),
"contradiction": result
})
return contradictions
# Usage
synthesizer = InformationSynthesizer()
sources = [
{"title": "AI Agents Overview", "url": "...", "snippet": "AI agents are..."},
{"title": "Agent Architectures", "url": "...", "snippet": "Modern agents use..."},
# ... more sources
]
# Synthesize
synthesis = synthesizer.synthesize_sources(
"What are AI agents?",
sources,
style="comprehensive"
)
print(synthesis)
# Identify themes
themes = synthesizer.identify_themes(sources)
for theme in themes:
print(f"Theme: {theme}")
Citation Management
Automatic Citation Generation
class CitationManager:
"""Manage citations and references"""
def __init__(self):
self.citations = []
self.citation_style = "APA" # APA, MLA, Chicago
def add_citation(self, source: Dict) -> int:
"""Add source and return citation number"""
self.citations.append(source)
return len(self.citations)
def format_citation(self, source: Dict, style: str = None) -> str:
"""Format citation in specified style"""
style = style or self.citation_style
if style == "APA":
return self.format_apa(source)
elif style == "MLA":
return self.format_mla(source)
elif style == "Chicago":
return self.format_chicago(source)
else:
return self.format_simple(source)
def format_apa(self, source: Dict) -> str:
"""Format in APA style"""
author = source.get('author', 'Unknown')
year = source.get('year', 'n.d.')
title = source.get('title', 'Untitled')
url = source.get('url', '')
return f"{author} ({year}). {title}. {url}"
def format_mla(self, source: Dict) -> str:
"""Format in MLA style"""
author = source.get('author', 'Unknown')
title = source.get('title', 'Untitled')
website = source.get('website', 'Web')
url = source.get('url', '')
return f'{author}. "{title}." {website}. {url}.'
def format_chicago(self, source: Dict) -> str:
"""Format in Chicago style"""
author = source.get('author', 'Unknown')
title = source.get('title', 'Untitled')
url = source.get('url', '')
return f'{author}. "{title}." {url}.'
def format_simple(self, source: Dict) -> str:
"""Simple format"""
title = source.get('title', 'Untitled')
url = source.get('url', '')
return f"{title} - {url}"
def generate_bibliography(self) -> str:
"""Generate full bibliography"""
bibliography = "References:\n\n"
for i, source in enumerate(self.citations, 1):
citation = self.format_citation(source)
bibliography += f"{i}. {citation}\n"
return bibliography
def inline_cite(self, text: str, citation_num: int) -> str:
"""Add inline citation to text"""
return f"{text} [{citation_num}]"
# Usage
citations = CitationManager()
# Add sources
source1 = {
"author": "Smith, J.",
"year": "2023",
"title": "Understanding AI Agents",
"url": "https://example.com/article"
}
cite_num = citations.add_citation(source1)
# Use in text
text = citations.inline_cite("AI agents are autonomous systems", cite_num)
print(text) # "AI agents are autonomous systems [1]"
# Generate bibliography
print(citations.generate_bibliography())
Complete Research Agent
class ResearchAgent:
"""Complete research agent system"""
def __init__(self):
self.searcher = MultiSourceSearcher()
self.extractor = ContentExtractor()
self.verifier = SourceVerifier()
self.synthesizer = InformationSynthesizer()
self.citations = CitationManager()
self.client = openai.OpenAI()
def research(self,
query: str,
depth: str = "medium",
sources: List[str] = None) -> Dict:
"""Conduct comprehensive research"""
print(f"🔍 Researching: {query}\n")
# 1. Search multiple sources
print("📚 Gathering sources...")
search_results = self.searcher.search_all_sources(query, sources)
all_sources = []
for source_type, results in search_results.items():
all_sources.extend(results)
print(f"Found {len(all_sources)} sources\n")
# 2. Extract and verify content
print("📖 Extracting content...")
verified_sources = []
for source in all_sources[:20]: # Limit processing
# Extract content
if 'content' not in source:
content_data = self.extractor.extract_from_url(source['url'])
source['content'] = content_data.get('content', source.get('snippet', ''))
# Verify credibility
credibility = self.verifier.assess_credibility(
source['url'],
source.get('content', '')
)
if credibility['trust_level'] != 'low':  # keep unless explicitly low-trust
source['credibility'] = credibility
verified_sources.append(source)
# Add citation
cite_num = self.citations.add_citation(source)
source['citation_num'] = cite_num
print(f"Verified {len(verified_sources)} sources\n")
# 3. Synthesize information
print("✍️ Synthesizing findings...")
synthesis = self.synthesizer.synthesize_sources(
query,
verified_sources,
style="comprehensive" if depth == "deep" else "concise"
)
# 4. Identify themes
themes = self.synthesizer.identify_themes(verified_sources)
# 5. Find contradictions
contradictions = self.synthesizer.find_contradictions(verified_sources)
# 6. Generate bibliography
bibliography = self.citations.generate_bibliography()
return {
"query": query,
"synthesis": synthesis,
"themes": themes,
"contradictions": contradictions,
"sources": verified_sources,
"bibliography": bibliography,
"source_count": len(verified_sources)
}
def deep_dive(self, topic: str, subtopics: List[str] = None) -> Dict:
"""Deep research on topic with subtopics"""
if not subtopics:
# Generate subtopics
subtopics = self.generate_subtopics(topic)
results = {
"topic": topic,
"subtopics": {}
}
for subtopic in subtopics:
print(f"\n📌 Researching subtopic: {subtopic}")
result = self.research(f"{topic}: {subtopic}", depth="medium")
results["subtopics"][subtopic] = result
# Create overall synthesis
print("\n🔗 Creating overall synthesis...")
overall = self.synthesize_deep_dive(topic, results["subtopics"])
results["overall_synthesis"] = overall
return results
def generate_subtopics(self, topic: str) -> List[str]:
"""Generate relevant subtopics"""
prompt = f"""Generate 5 key subtopics for researching: {topic}
Subtopics should:
- Cover different aspects
- Be specific and focused
- Be researchable
List:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
subtopics = response.choices[0].message.content.strip().split('\n')
return [s.strip('- 0123456789.').strip() for s in subtopics if s.strip()]
def synthesize_deep_dive(self, topic: str, subtopic_results: Dict) -> str:
"""Synthesize results from deep dive"""
# Combine all syntheses
combined = f"# Comprehensive Research: {topic}\n\n"
for subtopic, result in subtopic_results.items():
combined += f"## {subtopic}\n\n"
combined += result['synthesis'] + "\n\n"
return combined
def fact_check(self, claim: str) -> Dict:
"""Fact-check a specific claim"""
print(f"🔎 Fact-checking: {claim}\n")
# Search for information about the claim
results = self.research(claim, depth="medium")
# Cross-reference
verification = self.verifier.cross_reference(
claim,
results['sources']
)
# Determine verdict
if verification['confidence'] > 0.7:
verdict = "Likely True"
elif verification['confidence'] < 0.3:
verdict = "Likely False"
else:
verdict = "Unclear/Mixed Evidence"
return {
"claim": claim,
"verdict": verdict,
"confidence": verification['confidence'],
"confirmations": verification['confirmations'],
"contradictions": verification['contradictions'],
"sources": results['sources'][:5],
"explanation": results['synthesis']
}
# Usage
agent = ResearchAgent()
# Basic research
result = agent.research("What are the latest developments in AI agents?")
print(result['synthesis'])
print(f"\nSources: {result['source_count']}")
# Deep dive
deep_result = agent.deep_dive(
"AI Agent Architectures",
subtopics=["ReAct Pattern", "Memory Systems", "Tool Use"]
)
# Fact check
fact_result = agent.fact_check("AI agents can autonomously write production code")
print(f"Verdict: {fact_result['verdict']}")
print(f"Confidence: {fact_result['confidence']:.0%}")
Best Practices
- Multi-source verification: Never rely on a single source
- Assess credibility: Check source reliability
- Cite properly: Always attribute information
- Check recency: Ensure information is current
- Cross-reference: Verify claims across sources
- Note contradictions: Highlight conflicting information
- Maintain objectivity: Present balanced view
- Track sources: Keep detailed records
- Update regularly: Refresh research periodically
- Human review: Critical research needs expert review
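Several of these practices can be enforced mechanically. As one sketch, the recency check can be a small filter over the source dicts used throughout this chapter; the `published` ISO-date field is an assumption here, since the search results shown above do not carry one:

```python
from datetime import datetime, timedelta

def filter_recent_sources(sources, max_age_days=365):
    """Split sources into recent and undated buckets.

    Sources older than the window are dropped; sources without a
    'published' date are kept separately, since missing metadata
    is not proof of staleness.
    """
    cutoff = datetime.now() - timedelta(days=max_age_days)
    recent, undated = [], []
    for source in sources:
        published = source.get("published")  # assumed ISO date, e.g. "2023-06-01"
        if published is None:
            undated.append(source)
            continue
        if datetime.fromisoformat(published) >= cutoff:
            recent.append(source)
    return recent, undated
```

Undated sources can then be routed to the credibility checks above rather than silently trusted.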
Next Steps
You now have comprehensive knowledge of research agents! Next, we’ll explore task automation agents that handle repetitive workflows.
Task Automation Agents
Introduction to Task Automation
Task automation agents handle repetitive workflows, orchestrate complex processes, and integrate with existing tools to save time and reduce errors.
What Makes Automation Agents Special?
Core Capabilities:
- Workflow orchestration
- Event-driven triggers
- Integration with multiple tools
- Scheduled operations
- Error handling and recovery
- State management across tasks
Key Benefits:
- Eliminate repetitive work
- Reduce human error
- 24/7 operation
- Consistent execution
- Scalable processing
- Audit trails
Types of Automation Agents
- Workflow Agents: Multi-step process automation
- Scheduling Agents: Time-based task execution
- Integration Agents: Connect different systems
- Monitoring Agents: Watch and respond to events
- Data Processing Agents: ETL and transformation
Workflow Orchestration
Building a Workflow Engine
from dataclasses import dataclass
from typing import List, Dict, Callable, Any
from enum import Enum
import time
class TaskStatus(Enum):
PENDING = "pending"
RUNNING = "running"
COMPLETED = "completed"
FAILED = "failed"
SKIPPED = "skipped"
@dataclass
class Task:
"""Single task in workflow"""
id: str
name: str
action: Callable
params: Dict[str, Any]
dependencies: List[str] = None
retry_count: int = 3
timeout: int = 300
status: TaskStatus = TaskStatus.PENDING
result: Any = None
error: str = None
class WorkflowEngine:
"""Orchestrate complex workflows"""
def __init__(self):
self.tasks = {}
self.execution_log = []
def add_task(self, task: Task):
"""Add task to workflow"""
self.tasks[task.id] = task
def execute_workflow(self) -> Dict:
"""Execute all tasks respecting dependencies"""
print("🚀 Starting workflow execution\n")
completed = set()
failed = set()
while len(completed) + len(failed) < len(self.tasks):
# Find tasks ready to execute
ready_tasks = self.get_ready_tasks(completed, failed)
if not ready_tasks:
# Check if we're stuck
pending = [t for t in self.tasks.values() if t.status == TaskStatus.PENDING]
if pending:
print("⚠️ Workflow stuck - circular dependencies or all tasks failed")
break
else:
break
# Execute ready tasks
for task in ready_tasks:
result = self.execute_task(task)
if result['success']:
completed.add(task.id)
else:
failed.add(task.id)
return self.generate_report(completed, failed)
def get_ready_tasks(self, completed: set, failed: set) -> List[Task]:
"""Get tasks ready to execute"""
ready = []
for task in self.tasks.values():
if task.status != TaskStatus.PENDING:
continue
# Check dependencies
if task.dependencies:
deps_met = all(dep in completed for dep in task.dependencies)
deps_failed = any(dep in failed for dep in task.dependencies)
if deps_failed:
task.status = TaskStatus.SKIPPED
task.error = "Dependency failed"
continue
if not deps_met:
continue
ready.append(task)
return ready
def execute_task(self, task: Task) -> Dict:
"""Execute single task with retry logic"""
print(f"▶️ Executing: {task.name}")
task.status = TaskStatus.RUNNING
for attempt in range(task.retry_count):
try:
# Execute task action
start_time = time.time()
result = task.action(**task.params)
execution_time = time.time() - start_time
# Success
task.status = TaskStatus.COMPLETED
task.result = result
log_entry = {
"task_id": task.id,
"task_name": task.name,
"status": "success",
"execution_time": execution_time,
"attempt": attempt + 1
}
self.execution_log.append(log_entry)
print(f"✅ Completed: {task.name} ({execution_time:.2f}s)\n")
return {"success": True, "result": result}
except Exception as e:
error_msg = str(e)
print(f"❌ Attempt {attempt + 1} failed: {error_msg}")
if attempt < task.retry_count - 1:
wait_time = 2 ** attempt # Exponential backoff
print(f"⏳ Retrying in {wait_time}s...")
time.sleep(wait_time)
else:
# Final failure
task.status = TaskStatus.FAILED
task.error = error_msg
log_entry = {
"task_id": task.id,
"task_name": task.name,
"status": "failed",
"error": error_msg,
"attempts": task.retry_count
}
self.execution_log.append(log_entry)
print(f"💥 Failed: {task.name}\n")
return {"success": False, "error": error_msg}
def generate_report(self, completed: set, failed: set) -> Dict:
"""Generate execution report"""
total = len(self.tasks)
skipped = sum(1 for t in self.tasks.values() if t.status == TaskStatus.SKIPPED)
report = {
"total_tasks": total,
"completed": len(completed),
"failed": len(failed),
"skipped": skipped,
"success_rate": len(completed) / total if total > 0 else 0,
"execution_log": self.execution_log
}
print("=" * 50)
print("WORKFLOW EXECUTION REPORT")
print("=" * 50)
print(f"Total Tasks: {total}")
print(f"Completed: {len(completed)}")
print(f"Failed: {len(failed)}")
print(f"Skipped: {skipped}")
print(f"Success Rate: {report['success_rate']:.1%}")
print("=" * 50)
return report
# Usage
workflow = WorkflowEngine()
# Define tasks
def fetch_data(source):
print(f" Fetching from {source}...")
time.sleep(1)
return {"data": f"Data from {source}"}
def process_data(data):
print(f" Processing data...")
time.sleep(1)
return {"processed": True}
def save_results(data):
print(f" Saving results...")
time.sleep(1)
return {"saved": True}
# Add tasks
workflow.add_task(Task(
id="fetch",
name="Fetch Data",
action=fetch_data,
params={"source": "API"}
))
workflow.add_task(Task(
id="process",
name="Process Data",
action=process_data,
params={"data": {}},
dependencies=["fetch"]
))
workflow.add_task(Task(
id="save",
name="Save Results",
action=save_results,
params={"data": {}},
dependencies=["process"]
))
# Execute
report = workflow.execute_workflow()
Parallel Workflow Execution
import asyncio
from concurrent.futures import ThreadPoolExecutor
class ParallelWorkflowEngine(WorkflowEngine):
"""Execute independent tasks in parallel"""
def __init__(self, max_workers: int = 4):
super().__init__()
self.max_workers = max_workers
self.executor = ThreadPoolExecutor(max_workers=max_workers)
async def execute_workflow_async(self) -> Dict:
"""Execute workflow with parallel execution"""
print("🚀 Starting parallel workflow execution\n")
completed = set()
failed = set()
while len(completed) + len(failed) < len(self.tasks):
# Get ready tasks
ready_tasks = self.get_ready_tasks(completed, failed)
if not ready_tasks:
break
# Execute tasks in parallel
tasks_futures = [
self.execute_task_async(task)
for task in ready_tasks
]
results = await asyncio.gather(*tasks_futures)
# Update completed/failed
for task, result in zip(ready_tasks, results):
if result['success']:
completed.add(task.id)
else:
failed.add(task.id)
return self.generate_report(completed, failed)
async def execute_task_async(self, task: Task) -> Dict:
"""Execute task asynchronously"""
loop = asyncio.get_running_loop()
return await loop.run_in_executor(
self.executor,
self.execute_task,
task
)
# Usage
async def main():
workflow = ParallelWorkflowEngine(max_workers=3)
# Add independent tasks that can run in parallel
for i in range(5):
workflow.add_task(Task(
id=f"task_{i}",
name=f"Task {i}",
action=lambda x: time.sleep(1) or f"Result {x}",  # sleep(1) returns None, so the f-string is returned
params={"x": i}
))
report = await workflow.execute_workflow_async()
# Run
# asyncio.run(main())
Scheduled Operations
Task Scheduler
from datetime import datetime, timedelta
import schedule
import threading
class TaskScheduler:
"""Schedule tasks to run at specific times"""
def __init__(self):
self.scheduled_tasks = []
self.running = False
self.thread = None
def schedule_task(self,
task: Callable,
schedule_type: str,
time_spec: str = None,
**kwargs):
"""Schedule a task"""
if schedule_type == "daily":
job = schedule.every().day.at(time_spec).do(task, **kwargs)
elif schedule_type == "hourly":
job = schedule.every().hour.do(task, **kwargs)
elif schedule_type == "interval":
minutes = int(time_spec)
job = schedule.every(minutes).minutes.do(task, **kwargs)
elif schedule_type == "weekly":
day, at_time = time_spec.split()  # e.g. "monday 09:00"; avoid shadowing the time module
job = getattr(schedule.every(), day.lower()).at(at_time).do(task, **kwargs)
else:
raise ValueError(f"Unknown schedule type: {schedule_type}")
self.scheduled_tasks.append({
"job": job,
"task": task.__name__,
"schedule": schedule_type,
"time_spec": time_spec
})
print(f"📅 Scheduled: {task.__name__} - {schedule_type} {time_spec or ''}")
def start(self):
"""Start scheduler"""
self.running = True
self.thread = threading.Thread(target=self._run_scheduler)
self.thread.daemon = True
self.thread.start()
print("🕐 Scheduler started")
def stop(self):
"""Stop scheduler"""
self.running = False
if self.thread:
self.thread.join()
print("🛑 Scheduler stopped")
def _run_scheduler(self):
"""Run scheduler loop"""
while self.running:
schedule.run_pending()
time.sleep(1)
def list_scheduled_tasks(self) -> List[Dict]:
"""List all scheduled tasks"""
return self.scheduled_tasks
# Usage
scheduler = TaskScheduler()
def backup_database():
print(f"💾 Running database backup at {datetime.now()}")
# Backup logic here
def send_report():
print(f"📊 Sending daily report at {datetime.now()}")
# Report logic here
def cleanup_temp_files():
print(f"🧹 Cleaning temp files at {datetime.now()}")
# Cleanup logic here
# Schedule tasks
scheduler.schedule_task(backup_database, "daily", "02:00")
scheduler.schedule_task(send_report, "daily", "09:00")
scheduler.schedule_task(cleanup_temp_files, "interval", "60") # Every hour
# Start scheduler
scheduler.start()
# Keep running
# try:
# while True:
# time.sleep(1)
# except KeyboardInterrupt:
# scheduler.stop()
Cron-Style Scheduling
from crontab import CronTab
class CronScheduler:
"""Cron-style task scheduling"""
def __init__(self):
self.cron = CronTab(user=True)
def add_cron_job(self,
command: str,
schedule: str,
comment: str = None):
"""Add cron job
Schedule format: "minute hour day month weekday"
Examples:
- "0 2 * * *" - Daily at 2 AM
- "*/15 * * * *" - Every 15 minutes
- "0 9 * * 1-5" - Weekdays at 9 AM
"""
job = self.cron.new(command=command, comment=comment)
job.setall(schedule)
self.cron.write()
print(f"✅ Added cron job: {comment or command}")
print(f" Schedule: {schedule}")
def list_jobs(self) -> List[Dict]:
"""List all cron jobs"""
jobs = []
for job in self.cron:
jobs.append({
"command": job.command,
"schedule": str(job.slices),
"comment": job.comment,
"enabled": job.is_enabled()
})
return jobs
def remove_job(self, comment: str):
"""Remove job by comment"""
self.cron.remove_all(comment=comment)
self.cron.write()
print(f"🗑️ Removed job: {comment}")
# Usage
# cron = CronScheduler()
# cron.add_cron_job(
# "python /path/to/backup.py",
# "0 2 * * *",
# "Daily backup"
# )
Event-Driven Triggers
Event Listener System
from typing import Callable, Dict, List
from queue import Queue
import threading
class EventType(Enum):
FILE_CREATED = "file_created"
FILE_MODIFIED = "file_modified"
FILE_DELETED = "file_deleted"
API_CALL = "api_call"
THRESHOLD_EXCEEDED = "threshold_exceeded"
ERROR_OCCURRED = "error_occurred"
@dataclass
class Event:
"""Event data"""
type: EventType
data: Dict[str, Any]
timestamp: float = None
def __post_init__(self):
if self.timestamp is None:
self.timestamp = time.time()
class EventDrivenAgent:
"""Agent that responds to events"""
def __init__(self):
self.handlers = {}
self.event_queue = Queue()
self.running = False
self.thread = None
def register_handler(self, event_type: EventType, handler: Callable):
"""Register event handler"""
if event_type not in self.handlers:
self.handlers[event_type] = []
self.handlers[event_type].append(handler)
print(f"📝 Registered handler for {event_type.value}")
def emit_event(self, event: Event):
"""Emit an event"""
self.event_queue.put(event)
def start(self):
"""Start event processing"""
self.running = True
self.thread = threading.Thread(target=self._process_events)
self.thread.daemon = True
self.thread.start()
print("🎯 Event processor started")
def stop(self):
"""Stop event processing"""
self.running = False
if self.thread:
self.thread.join()
print("🛑 Event processor stopped")
def _process_events(self):
"""Process events from queue"""
while self.running:
try:
event = self.event_queue.get(timeout=1)
self._handle_event(event)
except Exception:
# Typically queue.Empty when no event arrives within the timeout
continue
def _handle_event(self, event: Event):
"""Handle single event"""
print(f"⚡ Event: {event.type.value}")
handlers = self.handlers.get(event.type, [])
for handler in handlers:
try:
handler(event)
except Exception as e:
print(f"❌ Handler error: {e}")
# Usage
agent = EventDrivenAgent()
# Register handlers
def on_file_created(event: Event):
print(f" 📄 File created: {event.data['filename']}")
# Process new file
def on_threshold_exceeded(event: Event):
print(f" ⚠️ Threshold exceeded: {event.data['metric']} = {event.data['value']}")
# Send alert
def on_error(event: Event):
print(f" 💥 Error occurred: {event.data['error']}")
# Log and notify
agent.register_handler(EventType.FILE_CREATED, on_file_created)
agent.register_handler(EventType.THRESHOLD_EXCEEDED, on_threshold_exceeded)
agent.register_handler(EventType.ERROR_OCCURRED, on_error)
# Start processing
agent.start()
# Emit events
agent.emit_event(Event(
type=EventType.FILE_CREATED,
data={"filename": "data.csv"}
))
agent.emit_event(Event(
type=EventType.THRESHOLD_EXCEEDED,
data={"metric": "cpu_usage", "value": 95}
))
File System Watcher
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
class FileWatcher(FileSystemEventHandler):
"""Watch file system for changes"""
def __init__(self, agent: EventDrivenAgent):
self.agent = agent
def on_created(self, event):
"""File created"""
if not event.is_directory:
self.agent.emit_event(Event(
type=EventType.FILE_CREATED,
data={"path": event.src_path}
))
def on_modified(self, event):
"""File modified"""
if not event.is_directory:
self.agent.emit_event(Event(
type=EventType.FILE_MODIFIED,
data={"path": event.src_path}
))
def on_deleted(self, event):
"""File deleted"""
if not event.is_directory:
self.agent.emit_event(Event(
type=EventType.FILE_DELETED,
data={"path": event.src_path}
))
def start_file_watcher(path: str, agent: EventDrivenAgent):
"""Start watching directory"""
event_handler = FileWatcher(agent)
observer = Observer()
observer.schedule(event_handler, path, recursive=True)
observer.start()
print(f"👁️ Watching: {path}")
return observer
# Usage
# observer = start_file_watcher("/path/to/watch", agent)
Integration with Existing Tools
Tool Integration Framework
class ToolIntegration:
"""Integrate with external tools"""
def __init__(self):
self.tools = {}
def register_tool(self, name: str, connector: Callable):
"""Register tool connector"""
self.tools[name] = connector
print(f"🔌 Registered tool: {name}")
def execute_tool(self, name: str, action: str, **params) -> Dict:
"""Execute tool action"""
if name not in self.tools:
return {"success": False, "error": f"Tool not found: {name}"}
try:
result = self.tools[name](action, **params)
return {"success": True, "result": result}
except Exception as e:
return {"success": False, "error": str(e)}
# Example integrations
def slack_connector(action: str, **params):
"""Slack integration"""
if action == "send_message":
channel = params.get("channel")
message = params.get("message")
# Send to Slack API
print(f"📱 Slack: Sending to {channel}: {message}")
return {"sent": True}
elif action == "get_messages":
channel = params.get("channel")
# Get from Slack API
return {"messages": []}
def email_connector(action: str, **params):
"""Email integration"""
if action == "send":
to = params.get("to")
subject = params.get("subject")
body = params.get("body")
# Send email
print(f"📧 Email: Sending to {to}")
return {"sent": True}
def database_connector(action: str, **params):
"""Database integration"""
if action == "query":
sql = params.get("sql")
# Execute query
print(f"🗄️ Database: Executing query")
return {"rows": []}
elif action == "insert":
table = params.get("table")
data = params.get("data")
# Insert data
return {"inserted": True}
# Setup
integrations = ToolIntegration()
integrations.register_tool("slack", slack_connector)
integrations.register_tool("email", email_connector)
integrations.register_tool("database", database_connector)
# Use
integrations.execute_tool(
"slack",
"send_message",
channel="#general",
message="Task completed!"
)
Complete Automation Agent
class AutomationAgent:
"""Complete task automation agent"""
def __init__(self):
self.workflow_engine = WorkflowEngine()
self.scheduler = TaskScheduler()
self.event_agent = EventDrivenAgent()
self.integrations = ToolIntegration()
self.client = openai.OpenAI()
def create_automation(self, description: str) -> Dict:
"""Create automation from natural language"""
# Parse description to understand automation
automation_spec = self.parse_automation_description(description)
# Create workflow
workflow_id = self.create_workflow(automation_spec)
# Setup triggers
if automation_spec.get("trigger_type") == "schedule":
self.setup_scheduled_trigger(workflow_id, automation_spec)
elif automation_spec.get("trigger_type") == "event":
self.setup_event_trigger(workflow_id, automation_spec)
return {
"workflow_id": workflow_id,
"automation_spec": automation_spec,
"status": "active"
}
def parse_automation_description(self, description: str) -> Dict:
"""Parse natural language automation description"""
prompt = f"""Parse this automation request into a structured specification:
"{description}"
Provide JSON with:
- trigger_type: "schedule" or "event"
- trigger_spec: for "schedule", an object like {{"type": "daily", "time": "09:00"}}; for "event", {{"event": "FILE_CREATED"}}
- steps: list of actions to perform
- integrations: tools needed
Specification:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
import json
# May raise JSONDecodeError if the model wraps the JSON in prose;
# requesting response_format={"type": "json_object"} makes this more reliable
return json.loads(response.choices[0].message.content)
def create_workflow(self, spec: Dict) -> str:
"""Create workflow from specification"""
workflow_id = f"workflow_{int(time.time())}"
for i, step in enumerate(spec.get("steps", [])):
task = Task(
id=f"{workflow_id}_step_{i}",
name=step.get("name"),
action=self.create_action_from_spec(step),
params=step.get("params", {}),
dependencies=step.get("dependencies", [])
)
self.workflow_engine.add_task(task)
return workflow_id
def create_action_from_spec(self, step_spec: Dict) -> Callable:
"""Create executable action from step specification"""
action_type = step_spec.get("action_type")
if action_type == "api_call":
def action(**params):
return self.integrations.execute_tool(
step_spec["tool"],
step_spec["action"],
**params
)
return action
elif action_type == "data_processing":
def action(**params):
# Process data
return {"processed": True}
return action
else:
def action(**params):
print(f"Executing: {step_spec.get('name')}")
return {"done": True}
return action
def setup_scheduled_trigger(self, workflow_id: str, spec: Dict):
"""Setup scheduled trigger for workflow"""
def run_workflow():
print(f"🔄 Running scheduled workflow: {workflow_id}")
self.workflow_engine.execute_workflow()
self.scheduler.schedule_task(
run_workflow,
spec["trigger_spec"]["type"],
spec["trigger_spec"]["time"]
)
def setup_event_trigger(self, workflow_id: str, spec: Dict):
"""Setup event trigger for workflow"""
event_type = EventType[spec["trigger_spec"]["event"]]
def on_event(event: Event):
print(f"🎯 Event triggered workflow: {workflow_id}")
self.workflow_engine.execute_workflow()
self.event_agent.register_handler(event_type, on_event)
# Usage
agent = AutomationAgent()
# Create automation from description
automation = agent.create_automation("""
Every day at 9 AM:
1. Fetch data from the API
2. Process and analyze the data
3. Generate a report
4. Send the report via email to team@company.com
""")
print(f"Created automation: {automation['workflow_id']}")
Best Practices
- Idempotency: Tasks should be safely re-runnable
- Error handling: Always handle failures gracefully
- Logging: Track all automation executions
- Monitoring: Alert on failures
- Testing: Test workflows before production
- Documentation: Document automation logic
- Versioning: Track automation changes
- Rollback: Ability to revert changes
- Rate limiting: Don’t overwhelm systems
- Security: Secure credentials and access
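To make the idempotency point concrete, here is a minimal sketch of a run-at-most-once guard; the in-memory `processed_keys` dict stands in for durable storage such as a database table keyed by workflow and task id:

```python
class IdempotentRunner:
    """Run each action at most once per idempotency key.

    Re-running a workflow after a crash then replays only the
    tasks that have not already completed.
    """

    def __init__(self):
        self.processed_keys = {}  # key -> cached result (stand-in for durable storage)

    def run(self, key, action, *args, **kwargs):
        if key in self.processed_keys:
            # Replay: return the cached result without repeating side effects
            return self.processed_keys[key]
        result = action(*args, **kwargs)
        self.processed_keys[key] = result
        return result
```

Pairing a key such as `f"{workflow_id}:{task_id}"` with this guard makes retry logic safe to combine with external side effects like sending email.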
Practice Exercises
Exercise 1: Email Automation Agent (Medium)
Task: Build an agent that processes emails and takes actions.
Solution:
class EmailAgent:
def process_email(self, email: Dict) -> Dict:
# Classify email
category = self.classify(email["subject"])
# Route based on category
if category == "urgent":
return self.escalate(email)
elif category == "question":
return self.auto_respond(email)
else:
return self.archive(email)
Exercise 2: Workflow Orchestrator (Hard)
Task: Create an orchestrator that manages complex multi-step workflows.
Solution:
class WorkflowOrchestrator:
def execute_workflow(self, workflow: Dict) -> Dict:
results = {}
for step in workflow["steps"]:
if self.check_conditions(step, results):
result = self.execute_step(step)
results[step["id"]] = result
return results
✅ Chapter 6 Summary
You’ve mastered specialized agent types:
- Coding Agents: Analyze, generate, refactor, and test code
- Research Agents: Multi-source search, verification, and synthesis
- Automation Agents: Workflow orchestration, scheduling, and event-driven tasks
These specialized agents demonstrate how to focus agent capabilities on specific domains for maximum effectiveness.
Next Steps
Chapter 6 (Specialized Agent Types) is complete! You now have deep knowledge of coding agents, research agents, and task automation agents. These specialized agents form the foundation for building powerful, domain-specific AI systems.
Agent Learning & Adaptation
Module 7: Learning Objectives
By the end of this module, you will:
- ✓ Implement few-shot and RLHF learning strategies
- ✓ Build multimodal agents processing vision and audio
- ✓ Master LangChain, LangGraph, and other frameworks
- ✓ Design custom agentic frameworks
- ✓ Enable continuous learning and adaptation
Introduction to Agent Learning
Learning and adaptation enable agents to improve over time, personalize to users, and handle new situations without explicit reprogramming.
Why Learning Matters
Benefits:
- Improved performance over time
- Personalization to individual users
- Adaptation to changing environments
- Reduced need for manual updates
- Discovery of better strategies
Challenges:
- Avoiding catastrophic forgetting
- Balancing exploration vs exploitation
- Ensuring safe learning
- Managing computational costs
- Maintaining consistency
Types of Learning
- Few-Shot Learning: Learn from minimal examples
- Reinforcement Learning: Learn from feedback
- Continuous Learning: Ongoing improvement
- Transfer Learning: Apply knowledge to new domains
- Meta-Learning: Learn how to learn
Few-Shot Learning
In-Context Learning
from typing import List, Dict
import openai
class FewShotLearner:
"""Learn from few examples in context"""
def __init__(self):
self.client = openai.OpenAI()
self.examples = []
def add_example(self, input_text: str, output_text: str, explanation: str = None):
"""Add training example"""
example = {
"input": input_text,
"output": output_text,
"explanation": explanation
}
self.examples.append(example)
print(f"✅ Added example: {input_text[:50]}...")
def learn_from_examples(self, examples: List[Dict]):
"""Batch add examples"""
for ex in examples:
self.add_example(ex["input"], ex["output"], ex.get("explanation"))
def predict(self, input_text: str, temperature: float = 0.3) -> str:
"""Make prediction using learned examples"""
# Build prompt with examples
prompt = self.build_few_shot_prompt(input_text)
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=temperature
)
return response.choices[0].message.content
def build_few_shot_prompt(self, input_text: str) -> str:
"""Build prompt with examples"""
prompt = "Learn from these examples:\n\n"
for i, example in enumerate(self.examples, 1):
prompt += f"Example {i}:\n"
prompt += f"Input: {example['input']}\n"
prompt += f"Output: {example['output']}\n"
if example.get('explanation'):
prompt += f"Why: {example['explanation']}\n"
prompt += "\n"
prompt += f"Now apply what you learned:\n"
prompt += f"Input: {input_text}\n"
prompt += f"Output:"
return prompt
def evaluate(self, test_cases: List[Dict]) -> Dict:
"""Evaluate performance on test cases"""
correct = 0
total = len(test_cases)
for test in test_cases:
prediction = self.predict(test["input"])
expected = test["output"]
# Simple exact match (can be more sophisticated)
if prediction.strip().lower() == expected.strip().lower():
correct += 1
accuracy = correct / total if total > 0 else 0
return {
"accuracy": accuracy,
"correct": correct,
"total": total
}
# Usage
learner = FewShotLearner()
# Teach sentiment analysis
learner.add_example(
"This product is amazing!",
"positive",
"Enthusiastic language indicates positive sentiment"
)
learner.add_example(
"Terrible experience, very disappointed",
"negative",
"Words like 'terrible' and 'disappointed' indicate negative sentiment"
)
learner.add_example(
"It's okay, nothing special",
"neutral",
"Lukewarm language indicates neutral sentiment"
)
# Test
result = learner.predict("I love this so much!")
print(f"Prediction: {result}")
# Evaluate
test_cases = [
{"input": "Best purchase ever!", "output": "positive"},
{"input": "Waste of money", "output": "negative"},
{"input": "It works fine", "output": "neutral"}
]
evaluation = learner.evaluate(test_cases)
print(f"Accuracy: {evaluation['accuracy']:.1%}")
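The prompt format that `build_few_shot_prompt` produces can be exercised offline, without an API key. The standalone sketch below mirrors that structure so the format itself can be unit-tested:

```python
def build_few_shot_prompt(examples, input_text):
    """Assemble a few-shot prompt: worked examples first, then the new input."""
    prompt = "Learn from these examples:\n\n"
    for i, ex in enumerate(examples, 1):
        prompt += f"Example {i}:\nInput: {ex['input']}\nOutput: {ex['output']}\n"
        if ex.get("explanation"):
            prompt += f"Why: {ex['explanation']}\n"
        prompt += "\n"
    prompt += f"Now apply what you learned:\nInput: {input_text}\nOutput:"
    return prompt

examples = [
    {"input": "This product is amazing!", "output": "positive"},
    {"input": "Terrible experience", "output": "negative"},
]
prompt = build_few_shot_prompt(examples, "I love this so much!")
print("Example 2:" in prompt)      # → True
print(prompt.endswith("Output:"))  # → True
```

Ending the prompt at `Output:` matters: it cues the model to complete the label rather than restate the examples.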
Dynamic Example Selection
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
class AdaptiveFewShotLearner(FewShotLearner):
"""Select most relevant examples dynamically"""
def __init__(self, max_examples: int = 5):
super().__init__()
self.max_examples = max_examples
self.example_embeddings = []
def add_example(self, input_text: str, output_text: str, explanation: str = None):
"""Add example with embedding"""
super().add_example(input_text, output_text, explanation)
# Get embedding
embedding = self.get_embedding(input_text)
self.example_embeddings.append(embedding)
def get_embedding(self, text: str) -> np.ndarray:
"""Get text embedding"""
response = self.client.embeddings.create(
model="text-embedding-3-small",
input=text
)
return np.array(response.data[0].embedding)
def select_relevant_examples(self, input_text: str) -> List[Dict]:
"""Select most relevant examples for input"""
if not self.examples:
return []
# Get input embedding
input_embedding = self.get_embedding(input_text)
# Calculate similarities
similarities = []
for i, example_embedding in enumerate(self.example_embeddings):
similarity = cosine_similarity(
input_embedding.reshape(1, -1),
example_embedding.reshape(1, -1)
)[0][0]
similarities.append((i, similarity))
# Sort by similarity
similarities.sort(key=lambda x: x[1], reverse=True)
# Select top examples
selected_indices = [idx for idx, _ in similarities[:self.max_examples]]
selected_examples = [self.examples[i] for i in selected_indices]
return selected_examples
def predict(self, input_text: str, temperature: float = 0.3) -> str:
"""Predict using most relevant examples"""
# Select relevant examples
relevant_examples = self.select_relevant_examples(input_text)
# Temporarily use only relevant examples
original_examples = self.examples
self.examples = relevant_examples
# Make prediction
result = super().predict(input_text, temperature)
# Restore all examples
self.examples = original_examples
return result
# Usage
adaptive_learner = AdaptiveFewShotLearner(max_examples=3)
# Add many examples
examples = [
("Great product!", "positive"),
("Horrible quality", "negative"),
("Works as expected", "neutral"),
("Absolutely love it!", "positive"),
("Complete waste", "negative"),
("It's fine", "neutral"),
]
for inp, out in examples:
adaptive_learner.add_example(inp, out)
# Predict - will use most relevant examples
result = adaptive_learner.predict("This is fantastic!")
print(f"Prediction: {result}")
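The similarity ranking at the heart of `select_relevant_examples` can be verified with toy vectors, no embedding API required. A sketch using only NumPy (the 2-D "embeddings" are stand-ins for real embedding vectors):

```python
import numpy as np

def select_top_k(example_vecs, query_vec, k=2):
    """Rank stored example embeddings by cosine similarity to the query."""
    A = np.asarray(example_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    # Cosine similarity = dot product of L2-normalized vectors
    A_norm = A / np.linalg.norm(A, axis=1, keepdims=True)
    q_norm = q / np.linalg.norm(q)
    sims = A_norm @ q_norm
    # Indices of the k most similar examples, best first
    return np.argsort(sims)[::-1][:k].tolist()

# Two vectors near the query direction, one orthogonal to it
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(select_top_k(vecs, [1.0, 0.05], k=2))  # → [0, 1]
```

The same normalization trick computes all similarities in one matrix product, which scales better than the per-example `cosine_similarity` calls above.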
Reinforcement Learning from Feedback
Human Feedback Collection
from dataclasses import dataclass
from typing import Optional
import time
@dataclass
class Feedback:
"""User feedback on agent response"""
response_id: str
rating: int # 1-5
comment: Optional[str] = None
timestamp: float = None
def __post_init__(self):
if self.timestamp is None:
self.timestamp = time.time()
class FeedbackCollector:
"""Collect and manage user feedback"""
def __init__(self):
self.feedback_history = []
self.response_cache = {}
def record_response(self, response_id: str, prompt: str, response: str):
"""Record agent response"""
self.response_cache[response_id] = {
"prompt": prompt,
"response": response,
"timestamp": time.time()
}
def collect_feedback(self, response_id: str, rating: int, comment: str = None) -> Feedback:
"""Collect feedback on response"""
feedback = Feedback(
response_id=response_id,
rating=rating,
comment=comment
)
self.feedback_history.append(feedback)
print(f"📝 Feedback recorded: {rating}/5")
return feedback
def get_average_rating(self) -> float:
"""Get average rating"""
if not self.feedback_history:
return 0.0
total = sum(f.rating for f in self.feedback_history)
return total / len(self.feedback_history)
def get_positive_examples(self, threshold: int = 4) -> List[Dict]:
"""Get highly-rated examples"""
positive = []
for feedback in self.feedback_history:
if feedback.rating >= threshold:
response_data = self.response_cache.get(feedback.response_id)
if response_data:
positive.append({
"prompt": response_data["prompt"],
"response": response_data["response"],
"rating": feedback.rating
})
return positive
def get_negative_examples(self, threshold: int = 2) -> List[Dict]:
"""Get poorly-rated examples"""
negative = []
for feedback in self.feedback_history:
if feedback.rating <= threshold:
response_data = self.response_cache.get(feedback.response_id)
if response_data:
negative.append({
"prompt": response_data["prompt"],
"response": response_data["response"],
"rating": feedback.rating,
"comment": feedback.comment
})
return negative
# Usage
collector = FeedbackCollector()
# Record response
response_id = "resp_001"
collector.record_response(
response_id,
"What is Python?",
"Python is a programming language..."
)
# Collect feedback
collector.collect_feedback(response_id, 5, "Very helpful!")
# Get positive examples for learning
positive_examples = collector.get_positive_examples()
print(f"Positive examples: {len(positive_examples)}")
Learning from Feedback
class RLHFAgent:
"""Agent that learns from human feedback"""
def __init__(self):
self.client = openai.OpenAI()
self.feedback_collector = FeedbackCollector()
self.learner = AdaptiveFewShotLearner()
def respond(self, prompt: str, response_id: str = None) -> str:
"""Generate response"""
if response_id is None:
response_id = f"resp_{int(time.time())}"
# Use learned examples
positive_examples = self.feedback_collector.get_positive_examples()
# Build prompt with positive examples
enhanced_prompt = self.build_prompt_with_examples(prompt, positive_examples)
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": enhanced_prompt}],
temperature=0.7
)
response_text = response.choices[0].message.content
# Record for feedback
self.feedback_collector.record_response(response_id, prompt, response_text)
return response_text
def build_prompt_with_examples(self, prompt: str, examples: List[Dict]) -> str:
"""Build prompt incorporating learned examples"""
if not examples:
return prompt
enhanced = "Here are examples of good responses:\n\n"
for ex in examples[:5]: # Use top 5
enhanced += f"Q: {ex['prompt']}\n"
enhanced += f"A: {ex['response']}\n\n"
enhanced += f"Now respond to:\nQ: {prompt}\nA:"
return enhanced
def learn_from_feedback(self, response_id: str, rating: int, comment: str = None):
"""Learn from user feedback"""
feedback = self.feedback_collector.collect_feedback(response_id, rating, comment)
# If positive, add to examples
if rating >= 4:
response_data = self.feedback_collector.response_cache.get(response_id)
if response_data:
self.learner.add_example(
response_data["prompt"],
response_data["response"],
f"User rated {rating}/5"
)
print("✅ Learned from positive feedback")
# If negative, analyze and improve
elif rating <= 2:
self.analyze_negative_feedback(response_id, comment)
def analyze_negative_feedback(self, response_id: str, comment: str):
"""Analyze negative feedback to improve"""
response_data = self.feedback_collector.response_cache.get(response_id)
if not response_data:
return
prompt = f"""Analyze this negative feedback:
Original prompt: {response_data['prompt']}
Response: {response_data['response']}
User feedback: {comment}
What went wrong and how to improve?"""
analysis = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}]
)
print(f"📊 Analysis: {analysis.choices[0].message.content[:200]}...")
def get_performance_metrics(self) -> Dict:
"""Get learning performance metrics"""
avg_rating = self.feedback_collector.get_average_rating()
total_feedback = len(self.feedback_collector.feedback_history)
positive_count = len(self.feedback_collector.get_positive_examples())
return {
"average_rating": avg_rating,
"total_feedback": total_feedback,
"positive_examples": positive_count,
"learned_examples": len(self.learner.examples)
}
# Usage
agent = RLHFAgent()
# Interact and learn
response_id = "resp_001"
response = agent.respond("Explain machine learning", response_id)
print(f"Response: {response}")
# User provides feedback
agent.learn_from_feedback(response_id, 5, "Clear and concise!")
# Check improvement
metrics = agent.get_performance_metrics()
print(f"Metrics: {metrics}")
Continuous Learning
Online Learning System
class ContinuousLearner:
"""Agent that continuously learns from interactions"""
def __init__(self, memory_size: int = 1000):
self.client = openai.OpenAI()
self.memory_size = memory_size
self.interaction_history = []
self.performance_history = []
def interact(self, prompt: str) -> Dict:
"""Interact and learn"""
# Generate response
response = self.generate_response(prompt)
# Record interaction
interaction = {
"prompt": prompt,
"response": response,
"timestamp": time.time()
}
self.interaction_history.append(interaction)
# Trim history if too large
if len(self.interaction_history) > self.memory_size:
self.interaction_history = self.interaction_history[-self.memory_size:]
return {
"response": response,
"interaction_id": len(self.interaction_history) - 1
}
def generate_response(self, prompt: str) -> str:
"""Generate response using learned knowledge"""
# Get relevant past interactions
relevant = self.get_relevant_interactions(prompt)
# Build context
context = self.build_context(relevant)
# Generate
messages = [
{"role": "system", "content": context},
{"role": "user", "content": prompt}
]
response = self.client.chat.completions.create(
model="gpt-4",
messages=messages,
temperature=0.7
)
return response.choices[0].message.content
def get_relevant_interactions(self, prompt: str, top_k: int = 5) -> List[Dict]:
"""Get relevant past interactions"""
if not self.interaction_history:
return []
# Simple keyword matching (can use embeddings for better results)
prompt_words = set(prompt.lower().split())
scored = []
for interaction in self.interaction_history:
interaction_words = set(interaction["prompt"].lower().split())
overlap = len(prompt_words & interaction_words)
scored.append((interaction, overlap))
scored.sort(key=lambda x: x[1], reverse=True)
return [interaction for interaction, _ in scored[:top_k]]
def build_context(self, relevant_interactions: List[Dict]) -> str:
"""Build context from relevant interactions"""
if not relevant_interactions:
return "You are a helpful assistant."
context = "You are a helpful assistant. Here are relevant past interactions:\n\n"
for interaction in relevant_interactions:
context += f"Q: {interaction['prompt']}\n"
context += f"A: {interaction['response']}\n\n"
context += "Use this knowledge to inform your response."
return context
def update_from_feedback(self, interaction_id: int, feedback: Dict):
"""Update based on feedback"""
if interaction_id >= len(self.interaction_history):
return
interaction = self.interaction_history[interaction_id]
interaction["feedback"] = feedback
# Track performance
self.performance_history.append({
"timestamp": time.time(),
"rating": feedback.get("rating", 0)
})
def get_learning_curve(self) -> List[float]:
"""Get performance over time"""
if not self.performance_history:
return []
# Calculate moving average
window = 10
curve = []
for i in range(len(self.performance_history)):
start = max(0, i - window + 1)
window_ratings = [
p["rating"] for p in self.performance_history[start:i+1]
]
avg = sum(window_ratings) / len(window_ratings)
curve.append(avg)
return curve
# Usage
learner = ContinuousLearner()
# Continuous interaction
for i in range(10):
result = learner.interact(f"Question {i}: What is AI?")
print(f"Response {i}: {result['response'][:50]}...")
# Simulate feedback
learner.update_from_feedback(result["interaction_id"], {"rating": 4})
# Check learning curve
curve = learner.get_learning_curve()
print(f"Learning curve: {curve}")
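The trailing-window smoothing inside `get_learning_curve` is easy to get off by one, so it is worth testing in isolation. A standalone version of the same computation:

```python
def moving_average(ratings, window=3):
    """Smooth a rating sequence with a trailing moving average.

    Each point averages the current rating with up to window-1
    predecessors, so early points use a shorter effective window.
    """
    curve = []
    for i in range(len(ratings)):
        start = max(0, i - window + 1)
        chunk = ratings[start:i + 1]
        curve.append(sum(chunk) / len(chunk))
    return curve

print(moving_average([2, 4, 4, 5], window=2))  # → [2.0, 3.0, 4.0, 4.5]
```

A rising curve here is the signal that feedback-driven learning is actually improving responses over time.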
Fine-Tuning for Specific Tasks
Preparing Training Data
class FineTuningDataPrep:
"""Prepare data for fine-tuning"""
def __init__(self):
self.training_data = []
def add_training_example(self,
system_message: str,
user_message: str,
assistant_message: str):
"""Add training example"""
example = {
"messages": [
{"role": "system", "content": system_message},
{"role": "user", "content": user_message},
{"role": "assistant", "content": assistant_message}
]
}
self.training_data.append(example)
def load_from_feedback(self, feedback_collector: FeedbackCollector, min_rating: int = 4):
"""Load training data from positive feedback"""
positive_examples = feedback_collector.get_positive_examples(threshold=min_rating)
for example in positive_examples:
self.add_training_example(
"You are a helpful assistant.",
example["prompt"],
example["response"]
)
print(f"Loaded {len(positive_examples)} training examples")
def export_jsonl(self, filename: str):
"""Export to JSONL format for fine-tuning"""
import json
with open(filename, 'w') as f:
for example in self.training_data:
f.write(json.dumps(example) + '\n')
print(f"Exported {len(self.training_data)} examples to {filename}")
def validate_data(self) -> Dict:
"""Validate training data quality"""
if not self.training_data:
return {"valid": False, "error": "No training data"}
issues = []
for i, example in enumerate(self.training_data):
# Check structure
if "messages" not in example:
issues.append(f"Example {i}: Missing 'messages' field")
continue
messages = example["messages"]
# Check message count
if len(messages) < 2:
issues.append(f"Example {i}: Too few messages")
# Check roles
roles = [m["role"] for m in messages]
if "user" not in roles or "assistant" not in roles:
issues.append(f"Example {i}: Missing required roles")
return {
"valid": len(issues) == 0,
"total_examples": len(self.training_data),
"issues": issues
}
# Usage
prep = FineTuningDataPrep()
# Add examples
prep.add_training_example(
"You are a Python expert.",
"How do I sort a list?",
"Use the sorted() function or list.sort() method..."
)
# Validate
validation = prep.validate_data()
print(f"Valid: {validation['valid']}")
# Export
prep.export_jsonl("training_data.jsonl")
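The JSONL export and validation steps can be checked together without touching the filesystem. This sketch serializes chat-format examples to a string and validates that each line parses and carries the user/assistant roles fine-tuning requires:

```python
import io
import json

def export_jsonl(examples):
    """Serialize chat-format training examples as one JSON object per line."""
    buf = io.StringIO()
    for ex in examples:
        buf.write(json.dumps(ex) + "\n")
    return buf.getvalue()

def validate_jsonl(text):
    """Return indices of lines missing the required user/assistant roles."""
    issues = []
    for i, line in enumerate(text.splitlines()):
        ex = json.loads(line)  # raises if a line is not valid JSON
        roles = {m["role"] for m in ex.get("messages", [])}
        if not {"user", "assistant"} <= roles:
            issues.append(i)
    return issues

data = [{"messages": [
    {"role": "system", "content": "You are a Python expert."},
    {"role": "user", "content": "How do I sort a list?"},
    {"role": "assistant", "content": "Use sorted() or list.sort()."},
]}]
text = export_jsonl(data)
print(validate_jsonl(text))  # → []
```

Validating before upload is cheap insurance: a single malformed line can fail an entire fine-tuning job.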
Transfer Learning
Domain Adaptation
class DomainAdapter:
"""Adapt agent to new domain"""
def __init__(self, base_agent):
self.base_agent = base_agent
self.domain_examples = []
self.client = openai.OpenAI()
def add_domain_knowledge(self, domain: str, examples: List[Dict]):
"""Add domain-specific examples"""
self.domain_examples.extend(examples)
print(f"Added {len(examples)} examples for domain: {domain}")
def adapt_response(self, prompt: str, domain: str) -> str:
"""Generate domain-adapted response"""
# Get domain examples
domain_context = self.build_domain_context(domain)
# Generate with domain context
messages = [
{"role": "system", "content": domain_context},
{"role": "user", "content": prompt}
]
response = self.client.chat.completions.create(
model="gpt-4",
messages=messages,
temperature=0.5
)
return response.choices[0].message.content
def build_domain_context(self, domain: str) -> str:
"""Build context for specific domain"""
context = f"You are an expert in {domain}.\n\n"
context += "Domain-specific examples:\n\n"
# Filter examples for this domain
relevant = [ex for ex in self.domain_examples if ex.get("domain") == domain]
for ex in relevant[:5]:
context += f"Q: {ex['input']}\n"
context += f"A: {ex['output']}\n\n"
return context
# Usage
adapter = DomainAdapter(base_agent=None)
# Add medical domain knowledge
medical_examples = [
{
"domain": "medical",
"input": "What is hypertension?",
"output": "Hypertension is high blood pressure..."
}
]
adapter.add_domain_knowledge("medical", medical_examples)
# Adapt to medical domain
response = adapter.adapt_response(
"Explain diabetes",
domain="medical"
)
print(response)
Meta-Learning
Learning to Learn
class MetaLearner:
"""Learn how to learn new tasks quickly"""
def __init__(self):
self.client = openai.OpenAI()
self.task_history = []
self.learning_strategies = []
def learn_new_task(self, task_description: str, examples: List[Dict]) -> Dict:
"""Learn a new task"""
print(f"📚 Learning new task: {task_description}")
# Analyze task
task_analysis = self.analyze_task(task_description, examples)
# Select learning strategy
strategy = self.select_strategy(task_analysis)
# Apply strategy
learned_model = self.apply_strategy(strategy, examples)
# Record
self.task_history.append({
"description": task_description,
"analysis": task_analysis,
"strategy": strategy,
"examples_count": len(examples)
})
return {
"task": task_description,
"strategy": strategy,
"model": learned_model
}
def analyze_task(self, description: str, examples: List[Dict]) -> Dict:
"""Analyze task characteristics"""
prompt = f"""Analyze this learning task:
Task: {description}
Examples: {len(examples)}
Sample: {examples[0] if examples else 'None'}
Determine:
1. Task type (classification, generation, etc.)
2. Complexity (simple, medium, complex)
3. Required examples (few, many)
4. Best learning approach
Analysis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
# Parse analysis (simplified)
return {
"type": "classification",
"complexity": "medium",
"analysis": response.choices[0].message.content
}
def select_strategy(self, task_analysis: Dict) -> str:
"""Select learning strategy based on task"""
complexity = task_analysis.get("complexity", "medium")
if complexity == "simple":
return "few-shot"
elif complexity == "medium":
return "adaptive-few-shot"
else:
return "fine-tuning"
def apply_strategy(self, strategy: str, examples: List[Dict]) -> Any:
"""Apply selected learning strategy"""
if strategy == "few-shot":
learner = FewShotLearner()
for ex in examples:
learner.add_example(ex["input"], ex["output"])
return learner
elif strategy == "adaptive-few-shot":
learner = AdaptiveFewShotLearner()
for ex in examples:
learner.add_example(ex["input"], ex["output"])
return learner
else:
# Would implement fine-tuning
return None
def get_learning_insights(self) -> Dict:
"""Get insights from learning history"""
if not self.task_history:
return {}
strategies_used = {}
for task in self.task_history:
strategy = task["strategy"]
strategies_used[strategy] = strategies_used.get(strategy, 0) + 1
return {
"total_tasks_learned": len(self.task_history),
"strategies_used": strategies_used,
"avg_examples_per_task": sum(t["examples_count"] for t in self.task_history) / len(self.task_history)
}
# Usage
meta_learner = MetaLearner()
# Learn multiple tasks
tasks = [
{
"description": "Sentiment analysis",
"examples": [
{"input": "Great!", "output": "positive"},
{"input": "Terrible", "output": "negative"}
]
},
{
"description": "Language detection",
"examples": [
{"input": "Hello", "output": "English"},
{"input": "Bonjour", "output": "French"}
]
}
]
for task in tasks:
result = meta_learner.learn_new_task(task["description"], task["examples"])
print(f"Learned using: {result['strategy']}")
# Get insights
insights = meta_learner.get_learning_insights()
print(f"Insights: {insights}")
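The strategy-selection rule above keys only on complexity; a natural refinement also weighs how much data is available, since fine-tuning needs volume that few-shot does not. A sketch of that variant (the `n_examples < 50` threshold is an illustrative assumption, not an established cutoff):

```python
def select_strategy(complexity: str, n_examples: int) -> str:
    """Pick a learning approach from task complexity and data volume."""
    if complexity == "simple":
        return "few-shot"
    if complexity == "medium" or n_examples < 50:
        # Too little data to fine-tune safely: stay in-context
        return "adaptive-few-shot"
    return "fine-tuning"  # complex task with enough examples

print(select_strategy("simple", 5))     # → few-shot
print(select_strategy("complex", 10))   # → adaptive-few-shot
print(select_strategy("complex", 200))  # → fine-tuning
```

Falling back to in-context learning when data is scarce avoids the overfitting risk called out in the best practices below.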
Best Practices
- Start simple: Begin with few-shot learning
- Collect feedback: Continuously gather user input
- Monitor performance: Track learning metrics
- Avoid overfitting: Don’t memorize, generalize
- Safe learning: Validate before deploying
- Incremental updates: Small, frequent improvements
- A/B testing: Compare learned vs baseline
- Human oversight: Review learned behaviors
- Version control: Track model versions
- Rollback capability: Revert if performance degrades
Next Steps
You now understand agent learning and adaptation in depth! Next, we’ll explore multimodal agents that work with images, audio, and other modalities.
Multimodal Agents
Introduction to Multimodal AI
Multimodal agents can process and generate multiple types of data: text, images, audio, video, and more. This enables richer interactions and broader capabilities.
Why Multimodal Matters
Benefits:
- Richer understanding of context
- More natural interactions
- Broader range of tasks
- Better accessibility
- Cross-modal reasoning
Challenges:
- Increased complexity
- Higher computational costs
- Data alignment across modalities
- Quality control
- Privacy concerns
Modalities
- Vision: Images, videos, screenshots
- Audio: Speech, music, sounds
- Text: Natural language
- Documents: PDFs, spreadsheets
- Structured Data: Tables, graphs
Vision and Image Understanding
Image Analysis
import base64
from pathlib import Path
import openai
class VisionAgent:
"""Agent with vision capabilities"""
def __init__(self):
self.client = openai.OpenAI()
def analyze_image(self, image_path: str, question: str = None) -> str:
"""Analyze image and answer questions"""
# Read and encode image
with open(image_path, "rb") as image_file:
image_data = base64.b64encode(image_file.read()).decode('utf-8')
# Determine image type
ext = Path(image_path).suffix.lower()
mime_type = {
'.jpg': 'image/jpeg',
'.jpeg': 'image/jpeg',
'.png': 'image/png',
'.gif': 'image/gif',
'.webp': 'image/webp'
}.get(ext, 'image/jpeg')
# Build prompt
if question:
prompt = question
else:
prompt = "Describe this image in detail."
# Call vision model
response = self.client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": prompt},
{
"type": "image_url",
"image_url": {
"url": f"data:{mime_type};base64,{image_data}"
}
}
]
}
],
max_tokens=500
)
return response.choices[0].message.content
def extract_text_from_image(self, image_path: str) -> str:
"""Extract text from image (OCR)"""
return self.analyze_image(
image_path,
"Extract all text from this image. Provide the text exactly as it appears."
)
def describe_scene(self, image_path: str) -> Dict:
"""Get detailed scene description"""
description = self.analyze_image(
image_path,
"""Describe this image in detail:
1. Main subjects
2. Setting/location
3. Actions/activities
4. Colors and mood
5. Notable details"""
)
return {"description": description}
def identify_objects(self, image_path: str) -> List[str]:
"""Identify objects in image"""
result = self.analyze_image(
image_path,
"List all objects visible in this image, one per line."
)
# Parse list
objects = [line.strip('- ').strip() for line in result.split('\n') if line.strip()]
return objects
def compare_images(self, image1_path: str, image2_path: str) -> str:
"""Compare two images"""
# Encode both images
images_data = []
for path in [image1_path, image2_path]:
with open(path, "rb") as f:
data = base64.b64encode(f.read()).decode('utf-8')
images_data.append(data)
# Compare
response = self.client.chat.completions.create(
model="gpt-4-vision-preview",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "Compare these two images. What are the similarities and differences?"},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{images_data[0]}"}
},
{
"type": "image_url",
"image_url": {"url": f"data:image/jpeg;base64,{images_data[1]}"}
}
]
}
],
max_tokens=500
)
return response.choices[0].message.content
def answer_visual_question(self, image_path: str, question: str) -> str:
"""Answer specific question about image"""
return self.analyze_image(image_path, question)
# Usage
vision_agent = VisionAgent()
# Analyze image
description = vision_agent.analyze_image("photo.jpg")
print(f"Description: {description}")
# Extract text (OCR)
text = vision_agent.extract_text_from_image("document.jpg")
print(f"Extracted text: {text}")
# Identify objects
objects = vision_agent.identify_objects("scene.jpg")
print(f"Objects: {objects}")
# Answer question
answer = vision_agent.answer_visual_question(
"chart.jpg",
"What is the trend shown in this chart?"
)
print(f"Answer: {answer}")
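The base64 data-URL construction that `analyze_image` performs is pure string work and can be tested without an image file or API call:

```python
import base64
from pathlib import Path

# Extension -> MIME type mapping for common image formats
MIME_TYPES = {".jpg": "image/jpeg", ".jpeg": "image/jpeg",
              ".png": "image/png", ".gif": "image/gif", ".webp": "image/webp"}

def to_data_url(filename: str, raw_bytes: bytes) -> str:
    """Encode image bytes as the data: URL form used in image_url content."""
    mime = MIME_TYPES.get(Path(filename).suffix.lower(), "image/jpeg")
    b64 = base64.b64encode(raw_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"

url = to_data_url("photo.png", b"\x89PNG")
print(url.startswith("data:image/png;base64,"))  # → True
```

Getting the MIME prefix right matters: some backends reject a data URL whose declared type does not match the payload.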
Image Generation
class ImageGenerator:
"""Generate images from text"""
def __init__(self):
self.client = openai.OpenAI()
def generate_image(self,
prompt: str,
size: str = "1024x1024",
quality: str = "standard",
n: int = 1) -> List[str]:
"""Generate image from text prompt"""
response = self.client.images.generate(
model="dall-e-3",
prompt=prompt,
size=size,
quality=quality,
n=n
)
# Get URLs
image_urls = [img.url for img in response.data]
return image_urls
def edit_image(self,
image_path: str,
mask_path: str,
prompt: str) -> str:
"""Edit image using mask"""
response = self.client.images.edit(
image=open(image_path, "rb"),
mask=open(mask_path, "rb"),
prompt=prompt,
n=1,
size="1024x1024"
)
return response.data[0].url
def create_variation(self, image_path: str, n: int = 1) -> List[str]:
"""Create variations of image"""
response = self.client.images.create_variation(
image=open(image_path, "rb"),
n=n,
size="1024x1024"
)
return [img.url for img in response.data]
# Usage
generator = ImageGenerator()
# Generate image
urls = generator.generate_image(
"A futuristic AI agent helping humans",
quality="hd"
)
print(f"Generated: {urls[0]}")
# Create variations
variations = generator.create_variation("original.png", n=3)
print(f"Created {len(variations)} variations")
Audio Processing
Speech Recognition
class AudioAgent:
"""Agent with audio capabilities"""
def __init__(self):
self.client = openai.OpenAI()
def transcribe_audio(self, audio_path: str, language: str = None) -> Dict:
"""Transcribe audio to text"""
with open(audio_path, "rb") as audio_file:
transcript = self.client.audio.transcriptions.create(
model="whisper-1",
file=audio_file,
language=language,
response_format="verbose_json"
)
return {
"text": transcript.text,
"language": transcript.language,
"duration": transcript.duration,
"segments": transcript.segments if hasattr(transcript, 'segments') else []
}
def translate_audio(self, audio_path: str) -> str:
"""Translate audio to English"""
with open(audio_path, "rb") as audio_file:
translation = self.client.audio.translations.create(
model="whisper-1",
file=audio_file
)
return translation.text
def transcribe_with_timestamps(self, audio_path: str) -> List[Dict]:
"""Transcribe with word-level timestamps"""
result = self.transcribe_audio(audio_path)
segments = []
for segment in result.get("segments", []):
segments.append({
"start": segment.get("start"),
"end": segment.get("end"),
"text": segment.get("text")
})
return segments
# Usage
audio_agent = AudioAgent()
# Transcribe
result = audio_agent.transcribe_audio("speech.mp3")
print(f"Transcription: {result['text']}")
print(f"Language: {result['language']}")
# Translate
translation = audio_agent.translate_audio("french_audio.mp3")
print(f"Translation: {translation}")
# With timestamps
segments = audio_agent.transcribe_with_timestamps("interview.mp3")
for seg in segments:
print(f"[{seg['start']:.2f}s - {seg['end']:.2f}s]: {seg['text']}")
Text-to-Speech
class TextToSpeech:
"""Convert text to speech"""
def __init__(self):
self.client = openai.OpenAI()
def synthesize_speech(self,
text: str,
voice: str = "alloy",
model: str = "tts-1",
output_path: str = "speech.mp3") -> str:
"""Convert text to speech
Voices: alloy, echo, fable, onyx, nova, shimmer
Models: tts-1 (faster), tts-1-hd (higher quality)
"""
response = self.client.audio.speech.create(
model=model,
voice=voice,
input=text
)
# Save to file
response.stream_to_file(output_path)
return output_path
def synthesize_long_text(self,
text: str,
voice: str = "alloy",
chunk_size: int = 4000) -> List[str]:
"""Synthesize long text in chunks"""
# Split into chunks
chunks = [text[i:i+chunk_size] for i in range(0, len(text), chunk_size)]
output_files = []
for i, chunk in enumerate(chunks):
output_path = f"speech_part_{i}.mp3"
self.synthesize_speech(chunk, voice, output_path=output_path)
output_files.append(output_path)
return output_files
# Usage
tts = TextToSpeech()
# Synthesize
audio_file = tts.synthesize_speech(
"Hello! I am an AI agent with voice capabilities.",
voice="nova"
)
print(f"Generated audio: {audio_file}")
Document Parsing
PDF Processing
import PyPDF2
from typing import List, Dict
class DocumentAgent:
"""Process various document types"""
def __init__(self):
self.client = openai.OpenAI()
self.vision_agent = VisionAgent()
def extract_text_from_pdf(self, pdf_path: str) -> Dict:
"""Extract text from PDF"""
with open(pdf_path, 'rb') as file:
pdf_reader = PyPDF2.PdfReader(file)
text_by_page = []
for page_num, page in enumerate(pdf_reader.pages):
text = page.extract_text()
text_by_page.append({
"page": page_num + 1,
"text": text
})
full_text = "\n\n".join([p["text"] for p in text_by_page])
return {
"num_pages": len(pdf_reader.pages),
"pages": text_by_page,
"full_text": full_text
}
def analyze_pdf_with_vision(self, pdf_path: str) -> List[Dict]:
"""Analyze PDF pages as images"""
# Convert PDF pages to images (requires pdf2image)
from pdf2image import convert_from_path
images = convert_from_path(pdf_path)
analyses = []
for i, image in enumerate(images):
# Save temporarily
temp_path = f"temp_page_{i}.jpg"
image.save(temp_path, 'JPEG')
# Analyze with vision
analysis = self.vision_agent.analyze_image(temp_path)
analyses.append({
"page": i + 1,
"analysis": analysis
})
# Clean up
import os
os.remove(temp_path)
return analyses
def extract_tables_from_pdf(self, pdf_path: str) -> List[Dict]:
"""Extract tables from PDF"""
# Using tabula-py for table extraction
import tabula
tables = tabula.read_pdf(pdf_path, pages='all', multiple_tables=True)
extracted = []
for i, table in enumerate(tables):
extracted.append({
"table_num": i + 1,
"data": table.to_dict('records'),
"shape": table.shape
})
return extracted
def summarize_document(self, text: str, max_length: int = 500) -> str:
"""Summarize document"""
truncated = text[:10000]  # Limit input to stay within the context window
prompt = f"""Summarize this document in {max_length} words or less:
{truncated}
Summary:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return response.choices[0].message.content
def answer_document_question(self, text: str, question: str) -> str:
"""Answer question about document"""
prompt = f"""Based on this document, answer the question:
Document:
{text[:8000]}
Question: {question}
Answer:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return response.choices[0].message.content
# Usage
doc_agent = DocumentAgent()
# Extract text
result = doc_agent.extract_text_from_pdf("document.pdf")
print(f"Pages: {result['num_pages']}")
print(f"First page: {result['pages'][0]['text'][:200]}...")
# Summarize
summary = doc_agent.summarize_document(result['full_text'])
print(f"Summary: {summary}")
# Answer question
answer = doc_agent.answer_document_question(
result['full_text'],
"What are the main conclusions?"
)
print(f"Answer: {answer}")
Cross-Modal Reasoning
Multimodal Understanding
class MultimodalAgent:
"""Agent that reasons across modalities"""
def __init__(self):
self.client = openai.OpenAI()
self.vision = VisionAgent()
self.audio = AudioAgent()
self.document = DocumentAgent()
def analyze_multimodal_input(self, inputs: Dict) -> str:
"""Analyze multiple types of input together"""
context = "Analyzing multimodal input:\n\n"
# Process each modality
if "image" in inputs:
image_analysis = self.vision.analyze_image(inputs["image"])
context += f"Image: {image_analysis}\n\n"
if "audio" in inputs:
audio_transcript = self.audio.transcribe_audio(inputs["audio"])
context += f"Audio: {audio_transcript['text']}\n\n"
if "text" in inputs:
context += f"Text: {inputs['text']}\n\n"
if "document" in inputs:
doc_content = self.document.extract_text_from_pdf(inputs["document"])
context += f"Document: {doc_content['full_text'][:1000]}...\n\n"
# Synthesize understanding
prompt = f"""{context}
Based on all this information, provide a comprehensive analysis:
1. Key themes across all modalities
2. How the different inputs relate to each other
3. Overall insights
Analysis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content
def generate_multimodal_response(self,
query: str,
include_image: bool = False,
include_audio: bool = False) -> Dict:
"""Generate response in multiple modalities"""
# Generate text response
text_response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": query}]
).choices[0].message.content
result = {"text": text_response}
# Generate image if requested
if include_image:
# Extract visual description from text
image_prompt = self.extract_visual_description(text_response)
generator = ImageGenerator()
image_url = generator.generate_image(image_prompt)[0]
result["image"] = image_url
# Generate audio if requested
if include_audio:
tts = TextToSpeech()
audio_file = tts.synthesize_speech(text_response)
result["audio"] = audio_file
return result
def extract_visual_description(self, text: str) -> str:
"""Extract visual description for image generation"""
prompt = f"""From this text, create a detailed visual description suitable for image generation:
{text}
Visual description:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content
def create_presentation(self, topic: str, num_slides: int = 5) -> List[Dict]:
"""Create multimodal presentation"""
# Generate outline
outline_prompt = f"Create a {num_slides}-slide presentation outline about: {topic}"
outline_response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": outline_prompt}]
)
outline = outline_response.choices[0].message.content
# Generate each slide
slides = []
generator = ImageGenerator()
tts = TextToSpeech()
for i in range(num_slides):
# Generate slide content
slide_prompt = f"""Create content for slide {i+1} of presentation about {topic}.
Outline: {outline}
Provide:
1. Title
2. Key points (3-5 bullets)
3. Visual description for image
Slide content:"""
slide_response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": slide_prompt}]
)
slide_content = slide_response.choices[0].message.content
# Generate image
visual_desc = self.extract_visual_description(slide_content)
image_url = generator.generate_image(visual_desc)[0]
# Generate narration audio
audio_file = tts.synthesize_speech(
slide_content,
output_path=f"slide_{i+1}_narration.mp3"
)
slides.append({
"slide_num": i + 1,
"content": slide_content,
"image": image_url,
"audio": audio_file
})
return slides
# Usage
multimodal_agent = MultimodalAgent()
# Analyze multimodal input
analysis = multimodal_agent.analyze_multimodal_input({
"image": "chart.jpg",
"text": "This shows our quarterly results",
"audio": "explanation.mp3"
})
print(f"Analysis: {analysis}")
# Generate multimodal response
response = multimodal_agent.generate_multimodal_response(
"Explain quantum computing",
include_image=True,
include_audio=True
)
print(f"Text: {response['text']}")
print(f"Image: {response['image']}")
print(f"Audio: {response['audio']}")
# Create presentation
slides = multimodal_agent.create_presentation("AI Agents", num_slides=3)
for slide in slides:
print(f"Slide {slide['slide_num']}: {slide['content'][:100]}...")
Best Practices
- Choose the right modality: Use the one best suited to the task
- Quality control: Validate outputs across modalities
- Accessibility: Provide alternatives (captions, transcripts)
- Privacy: Handle sensitive data carefully
- Cost management: Multimodal processing can be expensive
- Caching: Reuse processed results
- Error handling: Each modality can fail in different ways
- User preferences: Let users choose their modalities
- Testing: Test across all modalities
- Performance: Optimize processing pipelines
Next Steps
You now understand multimodal agents in depth! Next, we’ll explore agentic frameworks that help build complex agent systems.
Agentic Frameworks
Introduction to Agent Frameworks
Frameworks provide pre-built components, patterns, and tools for building agents faster and more reliably. They handle common challenges so you can focus on your specific use case.
Why Use Frameworks?
Benefits:
- Faster development
- Battle-tested patterns
- Community support
- Built-in best practices
- Easier maintenance
- Rich ecosystem
Trade-offs:
- Learning curve
- Framework lock-in
- Less control
- Overhead
- Version dependencies
Popular Frameworks
- LangChain: Comprehensive, modular
- LangGraph: State machines for agents
- AutoGPT: Autonomous agents
- CrewAI: Multi-agent collaboration
- AutoGen: Conversational agents
LangChain and LangGraph
LangChain Basics
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.agents import Tool, AgentExecutor, create_react_agent
from langchain.memory import ConversationBufferMemory
class LangChainAgent:
"""Agent built with LangChain"""
def __init__(self):
self.llm = OpenAI(temperature=0.7)
self.memory = ConversationBufferMemory()
self.tools = self._create_tools()
self.agent = self._create_agent()
def _create_tools(self) -> List[Tool]:
"""Create agent tools"""
def search_tool(query: str) -> str:
"""Search for information"""
return f"Search results for: {query}"
def calculator_tool(expression: str) -> str:
"""Calculate mathematical expression"""
# Note: eval is unsafe on untrusted input; use a proper math parser in production
try:
return str(eval(expression))
except Exception:
return "Error in calculation"
tools = [
Tool(
name="Search",
func=search_tool,
description="Search for information. Input should be a search query."
),
Tool(
name="Calculator",
func=calculator_tool,
description="Calculate mathematical expressions. Input should be a math expression."
)
]
return tools
def _create_agent(self):
"""Create ReAct agent"""
prompt = PromptTemplate.from_template("""
Answer the following question using available tools.
Tools:
{tools}
Tool names: {tool_names}
Question: {input}
{agent_scratchpad}
""")
agent = create_react_agent(
llm=self.llm,
tools=self.tools,
prompt=prompt
)
agent_executor = AgentExecutor(
agent=agent,
tools=self.tools,
memory=self.memory,
verbose=True,
max_iterations=5
)
return agent_executor
def run(self, query: str) -> str:
"""Run agent"""
result = self.agent.invoke({"input": query})
return result["output"]
# Usage
agent = LangChainAgent()
response = agent.run("What is 25 * 17?")
print(response)
LangChain Chains
from langchain.chains import SequentialChain, TransformChain
from langchain.chains.llm import LLMChain
class ChainedAgent:
"""Agent using LangChain chains"""
def __init__(self):
self.llm = OpenAI(temperature=0.5)
def create_research_chain(self):
"""Create multi-step research chain"""
# Step 1: Generate search queries
query_prompt = PromptTemplate(
input_variables=["topic"],
template="Generate 3 search queries to research: {topic}\n\nQueries:"
)
query_chain = LLMChain(llm=self.llm, prompt=query_prompt, output_key="queries")
# Step 2: Search (simplified)
def search_transform(inputs: dict) -> dict:
queries = inputs["queries"].split('\n')
results = [f"Results for: {q}" for q in queries if q.strip()]
return {"search_results": "\n".join(results)}
search_chain = TransformChain(
input_variables=["queries"],
output_variables=["search_results"],
transform=search_transform
)
# Step 3: Synthesize
synthesis_prompt = PromptTemplate(
input_variables=["topic", "search_results"],
template="""Synthesize information about {topic} from these results:
{search_results}
Summary:"""
)
synthesis_chain = LLMChain(llm=self.llm, prompt=synthesis_prompt, output_key="summary")
# Combine into sequential chain
overall_chain = SequentialChain(
chains=[query_chain, search_chain, synthesis_chain],
input_variables=["topic"],
output_variables=["summary"],
verbose=True
)
return overall_chain
def research(self, topic: str) -> str:
"""Conduct research using chain"""
chain = self.create_research_chain()
result = chain({"topic": topic})
return result["summary"]
# Usage
chained_agent = ChainedAgent()
summary = chained_agent.research("AI agent architectures")
print(summary)
LangGraph State Machines
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator
class AgentState(TypedDict):
"""State for agent"""
messages: Annotated[list, operator.add]
current_step: str
data: dict
class LangGraphAgent:
"""Agent using LangGraph state machine"""
def __init__(self):
self.llm = OpenAI()
self.graph = self._build_graph()
def _build_graph(self):
"""Build state machine graph"""
workflow = StateGraph(AgentState)
# Define nodes (states)
workflow.add_node("start", self.start_node)
workflow.add_node("research", self.research_node)
workflow.add_node("analyze", self.analyze_node)
workflow.add_node("respond", self.respond_node)
# Define edges (transitions)
workflow.set_entry_point("start")
workflow.add_edge("start", "research")
workflow.add_edge("research", "analyze")
workflow.add_edge("analyze", "respond")
workflow.add_edge("respond", END)
return workflow.compile()
def start_node(self, state: AgentState) -> AgentState:
"""Initial state"""
print("📍 Starting...")
state["current_step"] = "start"
return state
def research_node(self, state: AgentState) -> AgentState:
"""Research state"""
print("🔍 Researching...")
# Simulate research
query = state["messages"][-1] if state["messages"] else ""
state["data"]["research_results"] = f"Research results for: {query}"
state["current_step"] = "research"
return state
def analyze_node(self, state: AgentState) -> AgentState:
"""Analysis state"""
print("📊 Analyzing...")
results = state["data"].get("research_results", "")
state["data"]["analysis"] = f"Analysis of: {results}"
state["current_step"] = "analyze"
return state
def respond_node(self, state: AgentState) -> AgentState:
"""Response state"""
print("💬 Responding...")
analysis = state["data"].get("analysis", "")
response = f"Based on analysis: {analysis}"
state["messages"].append(response)
state["current_step"] = "respond"
return state
def run(self, query: str) -> str:
"""Run agent through state machine"""
initial_state = {
"messages": [query],
"current_step": "init",
"data": {}
}
final_state = self.graph.invoke(initial_state)
return final_state["messages"][-1]
# Usage
langgraph_agent = LangGraphAgent()
response = langgraph_agent.run("Explain quantum computing")
print(response)
AutoGPT and BabyAGI
AutoGPT Pattern
class AutoGPTAgent:
"""Autonomous agent inspired by AutoGPT"""
def __init__(self, objective: str):
self.objective = objective
self.client = openai.OpenAI()
self.task_list = []
self.completed_tasks = []
self.memory = []
def run(self, max_iterations: int = 10):
"""Run autonomous agent"""
print(f"🎯 Objective: {self.objective}\n")
# Generate initial tasks
self.task_list = self.generate_tasks(self.objective)
for iteration in range(max_iterations):
if not self.task_list:
print("✅ All tasks completed!")
break
# Get next task
current_task = self.task_list.pop(0)
print(f"\n📋 Task {iteration + 1}: {current_task}")
# Execute task
result = self.execute_task(current_task)
print(f"✓ Result: {result[:200]}...")
# Store in memory
self.memory.append({
"task": current_task,
"result": result
})
self.completed_tasks.append(current_task)
# Generate new tasks based on result
new_tasks = self.generate_new_tasks(current_task, result)
self.task_list.extend(new_tasks)
# Prioritize tasks
self.task_list = self.prioritize_tasks(self.task_list)
return self.summarize_results()
def generate_tasks(self, objective: str) -> List[str]:
"""Generate initial task list"""
prompt = f"""Given this objective: {objective}
Break it down into 3-5 specific, actionable tasks.
List them in order of execution.
Tasks:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
tasks_text = response.choices[0].message.content
tasks = [t.strip('0123456789.- ').strip() for t in tasks_text.split('\n') if t.strip()]
return tasks
def execute_task(self, task: str) -> str:
"""Execute a single task"""
# Build context from memory
context = self.build_context()
prompt = f"""Objective: {self.objective}
Previous tasks completed:
{context}
Current task: {task}
Execute this task and provide the result:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.7
)
return response.choices[0].message.content
def generate_new_tasks(self, completed_task: str, result: str) -> List[str]:
"""Generate new tasks based on result"""
prompt = f"""Objective: {self.objective}
Completed task: {completed_task}
Result: {result}
Based on this result, what new tasks (if any) should be added?
Only suggest tasks that help achieve the objective.
New tasks (or "none"):"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
tasks_text = response.choices[0].message.content
if "none" in tasks_text.lower():
return []
tasks = [t.strip('0123456789.- ').strip() for t in tasks_text.split('\n') if t.strip()]
return tasks
def prioritize_tasks(self, tasks: List[str]) -> List[str]:
"""Prioritize task list"""
if not tasks:
return []
prompt = f"""Objective: {self.objective}
Tasks to prioritize:
{chr(10).join([f"{i+1}. {t}" for i, t in enumerate(tasks)])}
Reorder these tasks by priority (most important first).
Return just the task list in order.
Prioritized tasks:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
prioritized_text = response.choices[0].message.content
prioritized = [t.strip('0123456789.- ').strip() for t in prioritized_text.split('\n') if t.strip()]
return prioritized
def build_context(self) -> str:
"""Build context from memory"""
if not self.memory:
return "None"
context = []
for item in self.memory[-5:]: # Last 5 tasks
context.append(f"- {item['task']}: {item['result'][:100]}...")
return "\n".join(context)
def summarize_results(self) -> str:
"""Summarize all results"""
prompt = f"""Objective: {self.objective}
Completed tasks and results:
{chr(10).join([f"{i+1}. {m['task']}: {m['result']}" for i, m in enumerate(self.memory)])}
Provide a comprehensive summary of what was accomplished:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content
# Usage
autogpt = AutoGPTAgent("Research and summarize the top 3 AI agent frameworks")
summary = autogpt.run(max_iterations=5)
print(f"\n📝 Final Summary:\n{summary}")
CrewAI and AutoGen
Multi-Agent Collaboration
class Agent:
"""Individual agent in crew"""
def __init__(self, role: str, goal: str, backstory: str):
self.role = role
self.goal = goal
self.backstory = backstory
self.client = openai.OpenAI()
def execute_task(self, task: str, context: str = "") -> str:
"""Execute task as this agent"""
prompt = f"""You are a {self.role}.
Your goal: {self.goal}
Background: {self.backstory}
{f"Context: {context}" if context else ""}
Task: {task}
Response:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.7
)
return response.choices[0].message.content
class Crew:
"""Crew of collaborating agents"""
def __init__(self):
self.agents = []
self.tasks = []
def add_agent(self, agent: Agent):
"""Add agent to crew"""
self.agents.append(agent)
print(f"👤 Added agent: {agent.role}")
def add_task(self, description: str, agent_role: str, dependencies: List[str] = None):
"""Add task to crew"""
self.tasks.append({
"description": description,
"agent_role": agent_role,
"dependencies": dependencies or [],
"status": "pending",
"result": None
})
def run(self) -> Dict:
"""Execute all tasks with crew"""
print("\n🚀 Starting crew execution\n")
completed = set()
while len(completed) < len(self.tasks):
# Find ready tasks
ready_tasks = [
task for task in self.tasks
if task["status"] == "pending" and
all(dep in completed for dep in task["dependencies"])
]
if not ready_tasks:
break
# Execute ready tasks
for task in ready_tasks:
# Find agent
agent = next((a for a in self.agents if a.role == task["agent_role"]), None)
if not agent:
print(f"⚠️ No agent found for role: {task['agent_role']}")
task["status"] = "failed"
continue
# Build context from dependencies
context = self.build_context(task["dependencies"])
# Execute
print(f"▶️ {agent.role}: {task['description']}")
result = agent.execute_task(task["description"], context)
task["result"] = result
task["status"] = "completed"
completed.add(task["description"])
print(f"✓ Completed\n")
return self.generate_report()
def build_context(self, dependencies: List[str]) -> str:
"""Build context from completed dependencies"""
context_parts = []
for dep in dependencies:
dep_task = next((t for t in self.tasks if t["description"] == dep), None)
if dep_task and dep_task["result"]:
context_parts.append(f"{dep}: {dep_task['result'][:200]}...")
return "\n\n".join(context_parts)
def generate_report(self) -> Dict:
"""Generate execution report"""
completed = sum(1 for t in self.tasks if t["status"] == "completed")
return {
"total_tasks": len(self.tasks),
"completed": completed,
"failed": len(self.tasks) - completed,
"tasks": self.tasks
}
# Usage
crew = Crew()
# Add agents
researcher = Agent(
role="Researcher",
goal="Find and analyze information",
backstory="Expert researcher with deep analytical skills"
)
writer = Agent(
role="Writer",
goal="Create clear, engaging content",
backstory="Professional writer skilled at explaining complex topics"
)
reviewer = Agent(
role="Reviewer",
goal="Ensure quality and accuracy",
backstory="Detail-oriented reviewer with high standards"
)
crew.add_agent(researcher)
crew.add_agent(writer)
crew.add_agent(reviewer)
# Add tasks
crew.add_task(
"Research the top 3 AI agent frameworks",
"Researcher"
)
crew.add_task(
"Write a comparison article based on the research",
"Writer",
dependencies=["Research the top 3 AI agent frameworks"]
)
crew.add_task(
"Review the article for accuracy and clarity",
"Reviewer",
dependencies=["Write a comparison article based on the research"]
)
# Execute
report = crew.run()
print(f"\n📊 Report: {report['completed']}/{report['total_tasks']} tasks completed")
Custom Framework Design
Building Your Own Framework
class CustomAgentFramework:
"""Custom agent framework"""
def __init__(self):
self.agents = {}
self.tools = {}
self.memory = {}
self.middleware = []
def register_agent(self, name: str, agent_class):
"""Register agent type"""
self.agents[name] = agent_class
print(f"✅ Registered agent: {name}")
def register_tool(self, name: str, tool_func):
"""Register tool"""
self.tools[name] = tool_func
print(f"🔧 Registered tool: {name}")
def add_middleware(self, middleware_func):
"""Add middleware for request processing"""
self.middleware.append(middleware_func)
def create_agent(self, agent_type: str, **kwargs):
"""Create agent instance"""
if agent_type not in self.agents:
raise ValueError(f"Unknown agent type: {agent_type}")
agent_class = self.agents[agent_type]
agent = agent_class(framework=self, **kwargs)
return agent
def execute_tool(self, tool_name: str, **params):
"""Execute tool"""
if tool_name not in self.tools:
raise ValueError(f"Unknown tool: {tool_name}")
return self.tools[tool_name](**params)
def process_request(self, agent, request: str) -> str:
"""Process request through middleware"""
# Apply middleware
for middleware in self.middleware:
request = middleware(request)
# Execute agent
response = agent.process(request)
return response
# Usage
framework = CustomAgentFramework()
# Register components
framework.register_tool("search", lambda query: f"Results for: {query}")
framework.register_tool("calculate", lambda expr: str(eval(expr)))  # eval is unsafe on untrusted input
# Add middleware
def logging_middleware(request):
print(f"📝 Request: {request}")
return request
framework.add_middleware(logging_middleware)
# Create and use agent
# agent = framework.create_agent("research_agent")
# response = framework.process_request(agent, "Find information about AI")
Best Practices
- Choose the right framework: Match it to your needs
- Start simple: Don’t over-engineer
- Understand abstractions: Know what the framework does under the hood
- Customize carefully: Extend the framework, don’t fight it
- Keep updated: Follow framework releases
- Test thoroughly: Framework bugs become your bugs
- Monitor performance: Track framework overhead
- Document usage: Help your team understand the setup
- Plan migration: Have an exit strategy
- Contribute back: Share improvements upstream
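One way to make the migration-planning advice concrete is to hide the framework behind an interface your own code defines. A minimal sketch, assuming a hypothetical `LLMBackend` protocol; in real code the backend would wrap LangChain, the OpenAI SDK, or another framework:

```python
from typing import Protocol

class LLMBackend(Protocol):
    """The only surface the rest of your code is allowed to touch."""
    def complete(self, prompt: str) -> str: ...

class FakeBackend:
    # In real code this would delegate to a framework or SDK call
    def complete(self, prompt: str) -> str:
        return f"echo: {prompt}"

class Summarizer:
    """Application code depends on the protocol, not on any framework."""
    def __init__(self, backend: LLMBackend):
        self.backend = backend

    def summarize(self, text: str) -> str:
        return self.backend.complete(f"Summarize: {text}")
```

Swapping frameworks then means writing one new adapter class rather than touching every call site, and the fake backend doubles as a test stub.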
Next Steps
Chapter 7 (Advanced Topics) is complete! You now have deep knowledge of agent learning, multimodal capabilities, and frameworks. This prepares you for enterprise-scale deployments in Module 8.
Architecture Patterns
Module 8: Learning Objectives
By the end of this module, you will:
- ✓ Design microservices and event-driven architectures
- ✓ Implement enterprise security and compliance
- ✓ Optimize costs through caching and model selection
- ✓ Scale agents to handle production workloads
- ✓ Deploy on Kubernetes and serverless platforms
Introduction to Enterprise Architecture
Enterprise-scale agent systems require robust, scalable, and maintainable architectures. This section covers proven patterns for production deployments.
Key Requirements
Scalability:
- Handle increasing load
- Horizontal scaling
- Resource efficiency
- Performance optimization
Reliability:
- High availability (99.9%+)
- Fault tolerance
- Graceful degradation
- Disaster recovery
Maintainability:
- Clear separation of concerns
- Easy updates and rollbacks
- Monitoring and debugging
- Documentation
Security:
- Authentication and authorization
- Data encryption
- Audit logging
- Compliance
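Audit logging in particular benefits from tamper evidence. A minimal sketch using the standard library's `hmac`; the `SECRET` value and record fields are illustrative assumptions:

```python
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"rotate-me"  # assumption: in production, load from a secrets manager

def audit_entry(user: str, action: str, ts: Optional[float] = None) -> dict:
    """Build an audit record with an HMAC tag so later tampering is detectable."""
    record = {"user": user, "action": action, "ts": ts if ts is not None else time.time()}
    payload = json.dumps(record, sort_keys=True).encode()
    record["sig"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return record

def verify_entry(record: dict) -> bool:
    """Recompute the tag over everything except the signature itself."""
    body = {k: v for k, v in record.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["sig"])
```

Any edit to a signed record invalidates its tag, which gives auditors a cheap integrity check on top of an append-only store.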
Microservices for Agents
Agent Microservices Architecture
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel
from typing import Optional, Dict, Any
import uvicorn
# Agent Service
class AgentService:
"""Core agent microservice"""
def __init__(self):
self.app = FastAPI(title="Agent Service")
self.setup_routes()
def setup_routes(self):
"""Setup API routes"""
@self.app.post("/agent/process")
async def process_request(request: AgentRequest):
"""Process agent request"""
try:
result = await self.process(request)
return {"success": True, "result": result}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@self.app.get("/agent/health")
async def health_check():
"""Health check endpoint"""
return {"status": "healthy", "service": "agent"}
async def process(self, request: AgentRequest) -> Dict:
"""Process agent request"""
# Agent logic here
return {"response": "Processed"}
def run(self, host: str = "0.0.0.0", port: int = 8000):
"""Run service"""
uvicorn.run(self.app, host=host, port=port)
class AgentRequest(BaseModel):
"""Agent request model"""
user_id: str
input: str
context: Optional[Dict[str, Any]] = None
# Tool Service
class ToolService:
"""Tool execution microservice"""
def __init__(self):
self.app = FastAPI(title="Tool Service")
self.tools = {}
self.setup_routes()
def setup_routes(self):
"""Setup API routes"""
@self.app.post("/tools/execute")
async def execute_tool(request: ToolRequest):
"""Execute tool"""
try:
result = await self.execute(request)
return {"success": True, "result": result}
except Exception as e:
raise HTTPException(status_code=500, detail=str(e))
@self.app.get("/tools/list")
async def list_tools():
"""List available tools"""
return {"tools": list(self.tools.keys())}
async def execute(self, request: ToolRequest) -> Any:
"""Execute tool"""
if request.tool_name not in self.tools:
raise ValueError(f"Unknown tool: {request.tool_name}")
tool = self.tools[request.tool_name]
return tool(**request.parameters)
def register_tool(self, name: str, func):
"""Register tool"""
self.tools[name] = func
class ToolRequest(BaseModel):
"""Tool request model"""
tool_name: str
parameters: Dict[str, Any]
# Memory Service
class MemoryService:
"""Memory management microservice"""
def __init__(self):
self.app = FastAPI(title="Memory Service")
self.storage = {}
self.setup_routes()
def setup_routes(self):
"""Setup API routes"""
@self.app.post("/memory/store")
async def store_memory(request: MemoryRequest):
"""Store memory"""
self.storage[request.key] = request.value
return {"success": True}
@self.app.get("/memory/retrieve/{key}")
async def retrieve_memory(key: str):
"""Retrieve memory"""
value = self.storage.get(key)
if value is None:
raise HTTPException(status_code=404, detail="Memory not found")
return {"key": key, "value": value}
@self.app.delete("/memory/delete/{key}")
async def delete_memory(key: str):
"""Delete memory"""
if key in self.storage:
del self.storage[key]
return {"success": True}
class MemoryRequest(BaseModel):
"""Memory request model"""
key: str
value: Any
# API Gateway
class APIGateway:
"""API Gateway for routing requests"""
def __init__(self):
self.app = FastAPI(title="API Gateway")
self.services = {
"agent": "http://localhost:8000",
"tools": "http://localhost:8001",
"memory": "http://localhost:8002"
}
self.setup_routes()
def setup_routes(self):
"""Setup gateway routes"""
@self.app.post("/api/chat")
async def chat(request: ChatRequest):
"""Chat endpoint"""
import httpx
# Route to agent service
async with httpx.AsyncClient() as client:
response = await client.post(
f"{self.services['agent']}/agent/process",
json=request.dict()
)
return response.json()
@self.app.get("/api/health")
async def health():
"""Check health of all services"""
import httpx
health_status = {}
async with httpx.AsyncClient() as client:
for service, url in self.services.items():
try:
response = await client.get(f"{url}/health", timeout=5)
health_status[service] = "healthy" if response.status_code == 200 else "unhealthy"
except httpx.HTTPError:
health_status[service] = "unhealthy"
return {"services": health_status}
class ChatRequest(BaseModel):
"""Chat request model"""
user_id: str
message: str
# Usage
if __name__ == "__main__":
# Start services on different ports
agent_service = AgentService()
# agent_service.run(port=8000)
tool_service = ToolService()
# tool_service.run(port=8001)
memory_service = MemoryService()
# memory_service.run(port=8002)
gateway = APIGateway()
# uvicorn.run(gateway.app, port=8080)
Service Communication
import httpx
from typing import Dict, Optional
import asyncio
import time
class ServiceClient:
"""Client for inter-service communication"""
def __init__(self, base_url: str, timeout: int = 30):
self.base_url = base_url
self.timeout = timeout
self.client = httpx.AsyncClient(timeout=timeout)
async def call_service(self,
endpoint: str,
method: str = "POST",
data: Optional[Dict] = None) -> Dict:
"""Call another service"""
url = f"{self.base_url}{endpoint}"
try:
if method == "POST":
response = await self.client.post(url, json=data)
elif method == "GET":
response = await self.client.get(url)
else:
raise ValueError(f"Unsupported method: {method}")
response.raise_for_status()
return response.json()
except httpx.HTTPError as e:
return {"error": str(e)}
async def close(self):
"""Close client"""
await self.client.aclose()
# Circuit Breaker for service calls
class CircuitBreaker:
"""Circuit breaker for service resilience"""
def __init__(self, failure_threshold: int = 5, timeout: int = 60):
self.failure_threshold = failure_threshold
self.timeout = timeout
self.failures = 0
self.last_failure_time = None
self.state = "closed" # closed, open, half-open
async def call(self, func, *args, **kwargs):
"""Call function with circuit breaker"""
if self.state == "open":
if time.time() - self.last_failure_time > self.timeout:
self.state = "half-open"
else:
raise Exception("Circuit breaker is OPEN")
try:
result = await func(*args, **kwargs)
if self.state == "half-open":
self.state = "closed"
self.failures = 0
return result
except Exception as e:
self.failures += 1
self.last_failure_time = time.time()
if self.failures >= self.failure_threshold:
self.state = "open"
raise e
# Service Registry
class ServiceRegistry:
"""Service discovery and registration"""
def __init__(self):
self.services = {}
def register(self, service_name: str, url: str, metadata: Dict = None):
"""Register service"""
self.services[service_name] = {
"url": url,
"metadata": metadata or {},
"registered_at": time.time()
}
print(f"✅ Registered service: {service_name} at {url}")
def discover(self, service_name: str) -> Optional[str]:
"""Discover service URL"""
service = self.services.get(service_name)
return service["url"] if service else None
def list_services(self) -> Dict:
"""List all services"""
return self.services
# Usage
registry = ServiceRegistry()
registry.register("agent-service", "http://localhost:8000")
registry.register("tool-service", "http://localhost:8001")
# Get service URL
agent_url = registry.discover("agent-service")
Event-Driven Architectures
Message Queue Integration
import json
from typing import Callable, Dict
import asyncio
from queue import Queue
import threading
class MessageBroker:
"""Simple message broker"""
def __init__(self):
self.queues = {}
self.subscribers = {}
def create_queue(self, queue_name: str):
"""Create message queue"""
if queue_name not in self.queues:
self.queues[queue_name] = Queue()
self.subscribers[queue_name] = []
def publish(self, queue_name: str, message: Dict):
"""Publish message to queue"""
if queue_name not in self.queues:
self.create_queue(queue_name)
self.queues[queue_name].put(message)
print(f"📤 Published to {queue_name}: {message}")
def subscribe(self, queue_name: str, handler: Callable):
"""Subscribe to queue"""
if queue_name not in self.queues:
self.create_queue(queue_name)
self.subscribers[queue_name].append(handler)
print(f"📥 Subscribed to {queue_name}")
def start_consumer(self, queue_name: str):
"""Start consuming messages"""
def consume():
while True:
try:
message = self.queues[queue_name].get(timeout=1)
# Call all subscribers
for handler in self.subscribers[queue_name]:
try:
handler(message)
except Exception as e:
print(f"❌ Handler error: {e}")
                except Exception:
continue
thread = threading.Thread(target=consume, daemon=True)
thread.start()
# Event-Driven Agent
class EventDrivenAgent:
"""Agent using event-driven architecture"""
def __init__(self, broker: MessageBroker):
self.broker = broker
self.setup_subscriptions()
def setup_subscriptions(self):
"""Setup event subscriptions"""
self.broker.subscribe("user_request", self.handle_user_request)
self.broker.subscribe("tool_result", self.handle_tool_result)
def handle_user_request(self, message: Dict):
"""Handle user request event"""
print(f"🤖 Processing request: {message}")
# Process and publish result
result = {"response": f"Processed: {message.get('input')}"}
self.broker.publish("agent_response", result)
def handle_tool_result(self, message: Dict):
"""Handle tool result event"""
print(f"🔧 Tool result: {message}")
# Usage
broker = MessageBroker()
agent = EventDrivenAgent(broker)
# Start consumers
broker.start_consumer("user_request")
broker.start_consumer("tool_result")
# Publish event
broker.publish("user_request", {"user_id": "123", "input": "Hello"})
Kafka Integration
from kafka import KafkaProducer, KafkaConsumer
from typing import Callable, Dict
import json
class KafkaAgentSystem:
"""Agent system using Kafka"""
def __init__(self, bootstrap_servers: str = "localhost:9092"):
self.bootstrap_servers = bootstrap_servers
self.producer = KafkaProducer(
bootstrap_servers=bootstrap_servers,
value_serializer=lambda v: json.dumps(v).encode('utf-8')
)
def publish_event(self, topic: str, event: Dict):
"""Publish event to Kafka"""
self.producer.send(topic, event)
self.producer.flush()
print(f"📤 Published to {topic}")
def create_consumer(self, topic: str, group_id: str):
"""Create Kafka consumer"""
consumer = KafkaConsumer(
topic,
bootstrap_servers=self.bootstrap_servers,
group_id=group_id,
value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)
return consumer
def consume_events(self, topic: str, group_id: str, handler: Callable):
"""Consume events from Kafka"""
consumer = self.create_consumer(topic, group_id)
for message in consumer:
try:
handler(message.value)
except Exception as e:
print(f"❌ Error processing message: {e}")
# Usage
# kafka_system = KafkaAgentSystem()
# kafka_system.publish_event("agent-requests", {"user_id": "123", "input": "Hello"})
Serverless Deployments
AWS Lambda Agent
import json
import time
import boto3
import openai
from typing import Dict, Any
class LambdaAgent:
"""Agent deployed as AWS Lambda"""
def __init__(self):
self.client = openai.OpenAI()
self.dynamodb = boto3.resource('dynamodb')
self.table = self.dynamodb.Table('agent-memory')
def handler(self, event: Dict, context: Any) -> Dict:
"""Lambda handler function"""
try:
# Parse request
body = json.loads(event.get('body', '{}'))
user_id = body.get('user_id')
input_text = body.get('input')
# Get user memory
memory = self.get_memory(user_id)
# Process request
response = self.process(input_text, memory)
# Update memory
self.update_memory(user_id, response)
return {
'statusCode': 200,
'body': json.dumps({
'response': response
})
}
except Exception as e:
return {
'statusCode': 500,
'body': json.dumps({
'error': str(e)
})
}
def process(self, input_text: str, memory: Dict) -> str:
"""Process request"""
# Build context from memory
context = memory.get('context', '')
messages = [
{"role": "system", "content": f"Context: {context}"},
{"role": "user", "content": input_text}
]
response = self.client.chat.completions.create(
model="gpt-4",
messages=messages
)
return response.choices[0].message.content
def get_memory(self, user_id: str) -> Dict:
"""Get user memory from DynamoDB"""
try:
response = self.table.get_item(Key={'user_id': user_id})
return response.get('Item', {})
        except Exception:
return {}
def update_memory(self, user_id: str, response: str):
"""Update user memory"""
try:
self.table.put_item(
Item={
'user_id': user_id,
'context': response,
'updated_at': int(time.time())
}
)
except Exception as e:
print(f"Error updating memory: {e}")
# Lambda function
def lambda_handler(event, context):
"""AWS Lambda entry point"""
agent = LambdaAgent()
return agent.handler(event, context)
Serverless Framework Configuration
# serverless.yml
service: agent-service
provider:
name: aws
runtime: python3.11
region: us-east-1
environment:
OPENAI_API_KEY: ${env:OPENAI_API_KEY}
iamRoleStatements:
- Effect: Allow
Action:
- dynamodb:GetItem
- dynamodb:PutItem
Resource: "arn:aws:dynamodb:*:*:table/agent-memory"
functions:
agent:
handler: handler.lambda_handler
events:
- http:
path: agent/process
method: post
cors: true
timeout: 30
memorySize: 512
resources:
Resources:
AgentMemoryTable:
Type: AWS::DynamoDB::Table
Properties:
TableName: agent-memory
AttributeDefinitions:
- AttributeName: user_id
AttributeType: S
KeySchema:
- AttributeName: user_id
KeyType: HASH
BillingMode: PAY_PER_REQUEST
Scaling Strategies
Horizontal Scaling
from multiprocessing import Pool, cpu_count
from typing import Dict, List
import concurrent.futures
import asyncio
class ScalableAgentPool:
"""Pool of agent workers for horizontal scaling"""
def __init__(self, num_workers: int = None):
self.num_workers = num_workers or cpu_count()
self.pool = Pool(processes=self.num_workers)
print(f"🔧 Created pool with {self.num_workers} workers")
def process_batch(self, requests: List[Dict]) -> List[Dict]:
"""Process batch of requests in parallel"""
results = self.pool.map(self.process_single, requests)
return results
def process_single(self, request: Dict) -> Dict:
"""Process single request"""
# Agent processing logic
return {"response": f"Processed: {request.get('input')}"}
def close(self):
"""Close pool"""
self.pool.close()
self.pool.join()
# Async scaling
class AsyncAgentPool:
"""Async agent pool"""
def __init__(self, max_workers: int = 10):
self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
async def process_batch(self, requests: List[Dict]) -> List[Dict]:
"""Process batch asynchronously"""
loop = asyncio.get_event_loop()
tasks = [
loop.run_in_executor(self.executor, self.process_single, req)
for req in requests
]
results = await asyncio.gather(*tasks)
return results
def process_single(self, request: Dict) -> Dict:
"""Process single request"""
return {"response": f"Processed: {request.get('input')}"}
# Usage
pool = ScalableAgentPool(num_workers=4)
requests = [
{"input": f"Request {i}"} for i in range(100)
]
results = pool.process_batch(requests)
print(f"Processed {len(results)} requests")
pool.close()
Load Balancing
from typing import List
import random
class LoadBalancer:
"""Load balancer for agent instances"""
def __init__(self, strategy: str = "round_robin"):
self.strategy = strategy
self.instances = []
self.current_index = 0
self.instance_loads = {}
def register_instance(self, instance_url: str):
"""Register agent instance"""
self.instances.append(instance_url)
self.instance_loads[instance_url] = 0
print(f"✅ Registered instance: {instance_url}")
def get_instance(self) -> str:
"""Get instance based on strategy"""
if self.strategy == "round_robin":
return self.round_robin()
elif self.strategy == "least_connections":
return self.least_connections()
elif self.strategy == "random":
return self.random_selection()
else:
return self.round_robin()
def round_robin(self) -> str:
"""Round-robin selection"""
if not self.instances:
raise Exception("No instances available")
instance = self.instances[self.current_index]
self.current_index = (self.current_index + 1) % len(self.instances)
return instance
def least_connections(self) -> str:
"""Select instance with least connections"""
if not self.instances:
raise Exception("No instances available")
return min(self.instance_loads, key=self.instance_loads.get)
def random_selection(self) -> str:
"""Random selection"""
if not self.instances:
raise Exception("No instances available")
return random.choice(self.instances)
def record_request(self, instance_url: str):
"""Record request to instance"""
self.instance_loads[instance_url] += 1
def record_completion(self, instance_url: str):
"""Record request completion"""
self.instance_loads[instance_url] -= 1
# Usage
lb = LoadBalancer(strategy="least_connections")
lb.register_instance("http://agent1:8000")
lb.register_instance("http://agent2:8000")
lb.register_instance("http://agent3:8000")
# Route request
instance = lb.get_instance()
print(f"Routing to: {instance}")
Container Orchestration
Docker Compose Setup
# docker-compose.yml
version: '3.8'
services:
agent-service:
build: ./agent-service
ports:
- "8000:8000"
environment:
- OPENAI_API_KEY=${OPENAI_API_KEY}
- REDIS_URL=redis://redis:6379
depends_on:
- redis
- postgres
deploy:
replicas: 3
resources:
limits:
cpus: '1'
memory: 1G
tool-service:
build: ./tool-service
ports:
- "8001:8001"
environment:
- REDIS_URL=redis://redis:6379
depends_on:
- redis
memory-service:
build: ./memory-service
ports:
- "8002:8002"
environment:
- POSTGRES_URL=postgresql://user:pass@postgres:5432/agentdb
depends_on:
- postgres
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis-data:/data
postgres:
image: postgres:15-alpine
environment:
- POSTGRES_USER=user
- POSTGRES_PASSWORD=pass
- POSTGRES_DB=agentdb
ports:
- "5432:5432"
volumes:
- postgres-data:/var/lib/postgresql/data
nginx:
image: nginx:alpine
ports:
- "80:80"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf
depends_on:
- agent-service
volumes:
redis-data:
postgres-data:
Kubernetes Deployment
# kubernetes/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: agent-service
spec:
replicas: 3
selector:
matchLabels:
app: agent-service
template:
metadata:
labels:
app: agent-service
spec:
containers:
- name: agent
image: agent-service:latest
ports:
- containerPort: 8000
env:
- name: OPENAI_API_KEY
valueFrom:
secretKeyRef:
name: agent-secrets
key: openai-api-key
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "1Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8000
initialDelaySeconds: 5
periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
name: agent-service
spec:
selector:
app: agent-service
ports:
- protocol: TCP
port: 80
targetPort: 8000
type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: agent-service-hpa
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: agent-service
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
Best Practices
- Decouple services: Loose coupling, high cohesion
- Stateless design: Store state externally
- Idempotent operations: Safe to retry
- Circuit breakers: Prevent cascading failures
- Health checks: Monitor service health
- Graceful shutdown: Clean resource cleanup
- Configuration management: Externalize config
- Service discovery: Dynamic service location
- API versioning: Backward compatibility
- Documentation: Clear API contracts
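Several of these practices are straightforward to sketch in code. As one illustration, graceful shutdown reduces to "stop accepting new work, drain what is in flight, then exit" — the class below is a minimal, framework-free sketch (the `GracefulShutdown` name and its methods are illustrative, not from any specific library):

```python
import signal
import sys

class GracefulShutdown:
    """Minimal graceful-shutdown helper: flush in-flight work, then exit."""

    def __init__(self):
        self.in_flight = []        # pending work items
        self.shutting_down = False

    def install(self):
        """Route SIGTERM/SIGINT to the drain routine."""
        signal.signal(signal.SIGTERM, self._handle)
        signal.signal(signal.SIGINT, self._handle)

    def _handle(self, signum, frame):
        self.drain()
        sys.exit(0)

    def submit(self, item):
        if self.shutting_down:
            raise RuntimeError("shutting down; not accepting new work")
        self.in_flight.append(item)

    def drain(self):
        """Stop accepting new work and finish what is already queued."""
        self.shutting_down = True
        processed = []
        while self.in_flight:
            processed.append(self.in_flight.pop(0))
        return processed
```

In a real service the drain step would finish outstanding requests and close database connections before the process exits.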
Next Steps
You now understand enterprise architecture patterns! Next, we’ll explore security and compliance for production agent systems.
Security & Compliance
Introduction to Agent Security
Security is critical for production agent systems. This section covers authentication, authorization, data protection, and compliance requirements.
Security Principles
- Defense in Depth: Multiple layers of security
- Least Privilege: Minimum necessary access
- Zero Trust: Verify everything
- Encryption: Protect data at rest and in transit
- Audit Everything: Complete logging
Threat Model
Threats:
- Unauthorized access
- Data breaches
- Prompt injection
- Model manipulation
- Resource exhaustion
- Privacy violations
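Prompt injection deserves a concrete example. The `screen_input` helper below is a hypothetical, heuristic first line of defense — a deny-list of suspicious phrases; real deployments layer this with input isolation, output filtering, and allow-listed tools:

```python
import re

# Hypothetical deny-list patterns; incomplete by design — treat this as
# one layer among several, never the only defense.
SUSPICIOUS_PATTERNS = [
    r"ignore (all |previous |prior )?instructions",
    r"system prompt",
    r"you are now",
]

def screen_input(text: str) -> bool:
    """Return True if the input looks safe, False if it matches a known
    prompt-injection pattern."""
    lowered = text.lower()
    return not any(re.search(p, lowered) for p in SUSPICIOUS_PATTERNS)
```

A request that fails screening would typically be rejected or routed to a human review queue rather than passed to the model.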
Authentication and Authorization
JWT-Based Authentication
import jwt
from datetime import datetime, timedelta
from fastapi import HTTPException, Security, Depends
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials
from typing import Optional, Dict, List
class AuthManager:
"""JWT-based authentication"""
def __init__(self, secret_key: str, algorithm: str = "HS256"):
self.secret_key = secret_key
self.algorithm = algorithm
self.security = HTTPBearer()
def create_token(self,
user_id: str,
roles: List[str],
expires_in: int = 3600) -> str:
"""Create JWT token"""
payload = {
"user_id": user_id,
"roles": roles,
"exp": datetime.utcnow() + timedelta(seconds=expires_in),
"iat": datetime.utcnow()
}
token = jwt.encode(payload, self.secret_key, algorithm=self.algorithm)
return token
def verify_token(self, token: str) -> Dict:
"""Verify and decode JWT token"""
try:
payload = jwt.decode(
token,
self.secret_key,
algorithms=[self.algorithm]
)
return payload
except jwt.ExpiredSignatureError:
raise HTTPException(status_code=401, detail="Token expired")
except jwt.InvalidTokenError:
raise HTTPException(status_code=401, detail="Invalid token")
async def get_current_user(self,
credentials: HTTPAuthorizationCredentials = Security(HTTPBearer())):
"""Get current user from token"""
token = credentials.credentials
payload = self.verify_token(token)
return {
"user_id": payload["user_id"],
"roles": payload["roles"]
}
# Role-Based Access Control
class RBACManager:
"""Role-based access control"""
def __init__(self):
self.permissions = {
"admin": ["read", "write", "delete", "admin"],
"user": ["read", "write"],
"viewer": ["read"]
}
def has_permission(self, roles: List[str], required_permission: str) -> bool:
"""Check if roles have required permission"""
for role in roles:
if role in self.permissions:
if required_permission in self.permissions[role]:
return True
return False
def require_permission(self, permission: str):
"""Decorator to require permission"""
def decorator(func):
async def wrapper(*args, **kwargs):
# Get user from context
user = kwargs.get('current_user')
if not user:
raise HTTPException(status_code=401, detail="Not authenticated")
if not self.has_permission(user['roles'], permission):
raise HTTPException(status_code=403, detail="Insufficient permissions")
return await func(*args, **kwargs)
return wrapper
return decorator
# Secure Agent API
class SecureAgentAPI:
"""Agent API with authentication"""
def __init__(self):
self.app = FastAPI()
        self.auth = AuthManager(secret_key="your-secret-key")  # In production, load from env/secrets manager
self.rbac = RBACManager()
self.setup_routes()
def setup_routes(self):
"""Setup secure routes"""
@self.app.post("/auth/login")
async def login(credentials: LoginRequest):
"""Login and get token"""
# Verify credentials (simplified)
if self.verify_credentials(credentials.username, credentials.password):
token = self.auth.create_token(
user_id=credentials.username,
roles=["user"]
)
return {"token": token}
else:
raise HTTPException(status_code=401, detail="Invalid credentials")
@self.app.post("/agent/process")
async def process(
request: AgentRequest,
current_user: Dict = Depends(self.auth.get_current_user)
):
"""Process request (requires authentication)"""
# Check permission
if not self.rbac.has_permission(current_user['roles'], 'write'):
raise HTTPException(status_code=403, detail="Insufficient permissions")
# Process request
result = await self.process_request(request, current_user)
return {"result": result}
def verify_credentials(self, username: str, password: str) -> bool:
"""Verify user credentials"""
# In production, check against database with hashed passwords
return True
class LoginRequest(BaseModel):
username: str
password: str
# Usage
api = SecureAgentAPI()
API Key Management
import secrets
import hashlib
from datetime import datetime
from typing import Dict, Optional
class APIKeyManager:
"""Manage API keys"""
def __init__(self):
self.keys = {} # In production, use database
def generate_key(self, user_id: str, name: str) -> str:
"""Generate new API key"""
# Generate secure random key
key = f"sk_{secrets.token_urlsafe(32)}"
# Hash for storage
key_hash = hashlib.sha256(key.encode()).hexdigest()
# Store
self.keys[key_hash] = {
"user_id": user_id,
"name": name,
"created_at": datetime.utcnow(),
"last_used": None,
"usage_count": 0
}
return key
def verify_key(self, key: str) -> Optional[Dict]:
"""Verify API key"""
key_hash = hashlib.sha256(key.encode()).hexdigest()
if key_hash in self.keys:
# Update usage
self.keys[key_hash]["last_used"] = datetime.utcnow()
self.keys[key_hash]["usage_count"] += 1
return self.keys[key_hash]
return None
def revoke_key(self, key: str):
"""Revoke API key"""
key_hash = hashlib.sha256(key.encode()).hexdigest()
if key_hash in self.keys:
del self.keys[key_hash]
return True
return False
# API Key Authentication
from fastapi.security import APIKeyHeader
class APIKeyAuth:
"""API Key authentication"""
def __init__(self, key_manager: APIKeyManager):
self.key_manager = key_manager
self.api_key_header = APIKeyHeader(name="X-API-Key")
async def verify(self, api_key: str = Security(APIKeyHeader(name="X-API-Key"))):
"""Verify API key"""
key_data = self.key_manager.verify_key(api_key)
if not key_data:
raise HTTPException(status_code=401, detail="Invalid API key")
return key_data
# Usage
key_manager = APIKeyManager()
api_key = key_manager.generate_key("user123", "Production Key")
print(f"API Key: {api_key}")
Data Encryption
Encryption at Rest
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC
from typing import Optional
import base64
class DataEncryption:
    """Encrypt sensitive data"""
    def __init__(self, password: str):
        self.key = self.derive_key(password)
        self.cipher = Fernet(self.key)
    def derive_key(self, password: str) -> bytes:
        """Derive encryption key from password"""
        kdf = PBKDF2HMAC(
            algorithm=hashes.SHA256(),
            length=32,
            salt=b'static_salt',  # In production, use a random per-record salt
            iterations=100000,
        )
key = base64.urlsafe_b64encode(kdf.derive(password.encode()))
return key
def encrypt(self, data: str) -> str:
"""Encrypt data"""
encrypted = self.cipher.encrypt(data.encode())
return base64.urlsafe_b64encode(encrypted).decode()
def decrypt(self, encrypted_data: str) -> str:
"""Decrypt data"""
encrypted = base64.urlsafe_b64decode(encrypted_data.encode())
decrypted = self.cipher.decrypt(encrypted)
return decrypted.decode()
# Encrypted Storage
class EncryptedStorage:
"""Store data with encryption"""
def __init__(self, encryption_key: str):
self.encryption = DataEncryption(encryption_key)
self.storage = {}
def store(self, key: str, value: str):
"""Store encrypted data"""
encrypted_value = self.encryption.encrypt(value)
self.storage[key] = encrypted_value
def retrieve(self, key: str) -> Optional[str]:
"""Retrieve and decrypt data"""
encrypted_value = self.storage.get(key)
if encrypted_value:
return self.encryption.decrypt(encrypted_value)
return None
# Usage
storage = EncryptedStorage("my-secret-password")
storage.store("api_key", "sk_1234567890")
retrieved = storage.retrieve("api_key")
print(f"Retrieved: {retrieved}")
Encryption in Transit (TLS/SSL)
import ssl
from fastapi import FastAPI
import uvicorn
class SecureServer:
"""HTTPS server with TLS"""
def __init__(self):
self.app = FastAPI()
self.setup_routes()
def setup_routes(self):
"""Setup routes"""
@self.app.get("/")
async def root():
return {"message": "Secure server"}
def run(self,
host: str = "0.0.0.0",
port: int = 443,
cert_file: str = "cert.pem",
key_file: str = "key.pem"):
"""Run with TLS"""
uvicorn.run(
self.app,
host=host,
port=port,
            ssl_keyfile=key_file,
            ssl_certfile=cert_file,
            ssl_version=ssl.PROTOCOL_TLS_SERVER
            # Note: ssl.CERT_REQUIRED would enforce client certificates
            # (mutual TLS); omit it for a standard HTTPS server
        )
# Generate self-signed certificate (for development only)
def generate_self_signed_cert():
"""Generate self-signed certificate"""
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa
    from datetime import datetime, timedelta
# Generate private key
private_key = rsa.generate_private_key(
public_exponent=65537,
key_size=2048
)
# Generate certificate
subject = issuer = x509.Name([
x509.NameAttribute(NameOID.COUNTRY_NAME, "US"),
x509.NameAttribute(NameOID.ORGANIZATION_NAME, "Agent System"),
])
cert = x509.CertificateBuilder().subject_name(
subject
).issuer_name(
issuer
).public_key(
private_key.public_key()
).serial_number(
x509.random_serial_number()
).not_valid_before(
datetime.utcnow()
).not_valid_after(
datetime.utcnow() + timedelta(days=365)
).sign(private_key, hashes.SHA256())
return private_key, cert
Audit Logging
Comprehensive Audit System
import logging
from datetime import datetime
from typing import Dict, Optional
import json
class AuditLogger:
"""Audit logging system"""
def __init__(self, log_file: str = "audit.log"):
self.logger = logging.getLogger("audit")
self.logger.setLevel(logging.INFO)
# File handler
handler = logging.FileHandler(log_file)
handler.setFormatter(logging.Formatter('%(message)s'))
self.logger.addHandler(handler)
def log_event(self,
event_type: str,
user_id: str,
action: str,
resource: str,
result: str,
metadata: Optional[Dict] = None):
"""Log audit event"""
event = {
"timestamp": datetime.utcnow().isoformat(),
"event_type": event_type,
"user_id": user_id,
"action": action,
"resource": resource,
"result": result,
"metadata": metadata or {},
"ip_address": self.get_client_ip()
}
self.logger.info(json.dumps(event))
def log_access(self, user_id: str, resource: str, granted: bool):
"""Log access attempt"""
self.log_event(
event_type="access",
user_id=user_id,
action="access",
resource=resource,
result="granted" if granted else "denied"
)
def log_data_access(self, user_id: str, data_type: str, operation: str):
"""Log data access"""
self.log_event(
event_type="data_access",
user_id=user_id,
action=operation,
resource=data_type,
result="success"
)
def log_security_event(self, user_id: str, event: str, severity: str):
"""Log security event"""
self.log_event(
event_type="security",
user_id=user_id,
action=event,
resource="system",
result=severity,
metadata={"severity": severity}
)
def get_client_ip(self) -> str:
"""Get client IP address"""
# In production, extract from request
return "0.0.0.0"
# Audit Middleware
class AuditMiddleware:
"""Middleware for automatic audit logging"""
def __init__(self, audit_logger: AuditLogger):
self.audit_logger = audit_logger
async def __call__(self, request, call_next):
"""Process request with audit logging"""
# Log request
user_id = request.state.user_id if hasattr(request.state, 'user_id') else "anonymous"
self.audit_logger.log_event(
event_type="api_request",
user_id=user_id,
action=request.method,
resource=request.url.path,
result="started"
)
# Process request
try:
response = await call_next(request)
# Log success
self.audit_logger.log_event(
event_type="api_request",
user_id=user_id,
action=request.method,
resource=request.url.path,
result="success",
metadata={"status_code": response.status_code}
)
return response
except Exception as e:
# Log failure
self.audit_logger.log_event(
event_type="api_request",
user_id=user_id,
action=request.method,
resource=request.url.path,
result="error",
metadata={"error": str(e)}
)
raise
# Usage
audit_logger = AuditLogger()
audit_logger.log_access("user123", "/agent/process", granted=True)
audit_logger.log_security_event("user456", "failed_login", "warning")
Regulatory Considerations
GDPR Compliance
class GDPRCompliance:
"""GDPR compliance features"""
def __init__(self):
self.data_store = {}
self.consent_records = {}
self.audit_logger = AuditLogger()
def collect_consent(self, user_id: str, purposes: List[str]) -> bool:
"""Collect user consent"""
self.consent_records[user_id] = {
"purposes": purposes,
"timestamp": datetime.utcnow(),
"version": "1.0"
}
self.audit_logger.log_event(
event_type="consent",
user_id=user_id,
action="collect",
resource="consent",
result="success",
metadata={"purposes": purposes}
)
return True
def check_consent(self, user_id: str, purpose: str) -> bool:
"""Check if user has consented"""
consent = self.consent_records.get(user_id)
if not consent:
return False
return purpose in consent["purposes"]
def export_user_data(self, user_id: str) -> Dict:
"""Export all user data (right to data portability)"""
self.audit_logger.log_event(
event_type="data_export",
user_id=user_id,
action="export",
resource="user_data",
result="success"
)
# Collect all user data
user_data = {
"user_id": user_id,
"data": self.data_store.get(user_id, {}),
"consent": self.consent_records.get(user_id, {}),
"exported_at": datetime.utcnow().isoformat()
}
return user_data
def delete_user_data(self, user_id: str) -> bool:
"""Delete all user data (right to be forgotten)"""
self.audit_logger.log_event(
event_type="data_deletion",
user_id=user_id,
action="delete",
resource="user_data",
result="success"
)
# Delete all user data
if user_id in self.data_store:
del self.data_store[user_id]
if user_id in self.consent_records:
del self.consent_records[user_id]
return True
def anonymize_data(self, user_id: str) -> bool:
"""Anonymize user data"""
if user_id in self.data_store:
            # Replace with an anonymized key; use a stable hash, since the
            # built-in hash() is randomized per process
            anon_id = hashlib.sha256(user_id.encode()).hexdigest()[:16]
            self.data_store[f"anon_{anon_id}"] = self.data_store[user_id]
del self.data_store[user_id]
return True
# Usage
gdpr = GDPRCompliance()
# Collect consent
gdpr.collect_consent("user123", ["analytics", "personalization"])
# Check consent
has_consent = gdpr.check_consent("user123", "analytics")
# Export data
user_data = gdpr.export_user_data("user123")
# Delete data
gdpr.delete_user_data("user123")
SOC 2 Compliance
class SOC2Compliance:
"""SOC 2 compliance controls"""
def __init__(self):
self.audit_logger = AuditLogger()
self.access_controls = RBACManager()
def implement_access_controls(self):
"""Implement access controls (Security)"""
# Already implemented via RBAC
pass
def monitor_availability(self) -> Dict:
"""Monitor system availability (Availability)"""
# Check service health
health_status = {
"agent_service": self.check_service_health("agent"),
"tool_service": self.check_service_health("tools"),
"memory_service": self.check_service_health("memory")
}
uptime = sum(1 for status in health_status.values() if status) / len(health_status)
return {
"uptime_percentage": uptime * 100,
"services": health_status
}
def ensure_processing_integrity(self, data: Dict) -> bool:
"""Ensure processing integrity (Processing Integrity)"""
# Validate data
if not self.validate_data(data):
return False
# Log processing
self.audit_logger.log_event(
event_type="data_processing",
user_id=data.get("user_id", "system"),
action="process",
resource="data",
result="success"
)
return True
def protect_confidentiality(self, data: str) -> str:
"""Protect data confidentiality (Confidentiality)"""
encryption = DataEncryption("secret-key")
return encryption.encrypt(data)
def maintain_privacy(self, user_id: str) -> bool:
"""Maintain privacy (Privacy)"""
# Implement privacy controls
gdpr = GDPRCompliance()
# Check consent
has_consent = gdpr.check_consent(user_id, "data_processing")
if not has_consent:
return False
return True
def check_service_health(self, service: str) -> bool:
"""Check service health"""
# In production, actually check service
return True
def validate_data(self, data: Dict) -> bool:
"""Validate data integrity"""
# Implement validation logic
return True
Best Practices
- Authentication: Always authenticate users
- Authorization: Implement least privilege
- Encryption: Encrypt sensitive data
- Audit logging: Log all security events
- Input validation: Validate all inputs
- Rate limiting: Prevent abuse
- Security headers: Use proper HTTP headers
- Regular updates: Keep dependencies updated
- Security testing: Regular penetration testing
- Incident response: Have a plan
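To make rate limiting concrete, here is a minimal token-bucket sketch. It is single-process and illustrative only — production systems typically enforce limits in Redis or at the API gateway so that all instances share one budget:

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate               # tokens added per second
        self.capacity = capacity       # maximum burst size
        self.tokens = float(capacity)  # start with a full bucket
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if a request may proceed, consuming one token."""
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A denied request would normally get an HTTP 429 response with a `Retry-After` header.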
Next Steps
You now understand security and compliance! Next, we’ll explore cost optimization strategies for production agent systems.
Cost Optimization
Introduction to Cost Management
Managing costs is critical for sustainable agent systems. This section covers strategies to optimize spending while maintaining performance.
Cost Drivers
API Costs:
- LLM API calls (tokens)
- Embedding generation
- Image generation
- Audio processing
Infrastructure:
- Compute resources
- Storage
- Network bandwidth
- Database operations
Third-Party Services:
- Search APIs
- Data providers
- Monitoring tools
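Before optimizing, it helps to estimate where the money goes. The helper below is a back-of-envelope sketch that rolls the per-token API cost drivers into a monthly figure; the prices passed in are assumptions — always check your provider's current rates:

```python
def monthly_cost_estimate(requests_per_day: int,
                          avg_input_tokens: int,
                          avg_output_tokens: int,
                          input_price_per_1k: float,
                          output_price_per_1k: float) -> float:
    """Rough monthly LLM API spend, assuming a 30-day month.

    Prices are per 1K tokens and must come from your provider's
    current pricing page.
    """
    per_request = ((avg_input_tokens / 1000) * input_price_per_1k
                   + (avg_output_tokens / 1000) * output_price_per_1k)
    return per_request * requests_per_day * 30
```

For example, 1,000 requests/day at 500 input and 200 output tokens, priced at $0.01/$0.03 per 1K tokens, comes to roughly $330/month — often enough to justify the caching and prompt-compression techniques below.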
Token Usage Optimization
Token Counting and Budgeting
import tiktoken
from typing import Dict, List
class TokenOptimizer:
"""Optimize token usage"""
def __init__(self, model: str = "gpt-4"):
self.encoding = tiktoken.encoding_for_model(model)
self.model = model
self.token_costs = {
"gpt-4": {"input": 0.03, "output": 0.06}, # per 1K tokens
"gpt-4-turbo": {"input": 0.01, "output": 0.03},
"gpt-3.5-turbo": {"input": 0.0005, "output": 0.0015}
}
def count_tokens(self, text: str) -> int:
"""Count tokens in text"""
return len(self.encoding.encode(text))
def estimate_cost(self, input_text: str, output_tokens: int) -> float:
"""Estimate API call cost"""
input_tokens = self.count_tokens(input_text)
costs = self.token_costs.get(self.model, self.token_costs["gpt-4"])
input_cost = (input_tokens / 1000) * costs["input"]
output_cost = (output_tokens / 1000) * costs["output"]
return input_cost + output_cost
def optimize_prompt(self, prompt: str, max_tokens: int) -> str:
"""Optimize prompt to fit token budget"""
tokens = self.count_tokens(prompt)
if tokens <= max_tokens:
return prompt
# Truncate to fit budget
words = prompt.split()
while tokens > max_tokens and words:
words.pop()
prompt = " ".join(words)
tokens = self.count_tokens(prompt)
return prompt
    def compress_context(self, messages: List[Dict], max_tokens: int) -> List[Dict]:
        """Compress conversation context to fit a token budget"""
        total_tokens = sum(self.count_tokens(m["content"]) for m in messages)
        if total_tokens <= max_tokens:
            return messages
        # Keep the system message, then add the most recent messages
        # until the remaining budget is exhausted
        compressed = [messages[0]]  # System message
        budget = max_tokens - self.count_tokens(messages[0]["content"])
        for msg in reversed(messages[1:]):
            msg_tokens = self.count_tokens(msg["content"])
            if msg_tokens <= budget:
                compressed.insert(1, msg)
                budget -= msg_tokens
            else:
                break
        return compressed
# Usage
optimizer = TokenOptimizer("gpt-4")
prompt = "This is a long prompt..." * 100
tokens = optimizer.count_tokens(prompt)
cost = optimizer.estimate_cost(prompt, 500)
print(f"Tokens: {tokens}, Estimated cost: ${cost:.4f}")
# Optimize
optimized = optimizer.optimize_prompt(prompt, max_tokens=1000)
Caching Strategies
import hashlib
import openai
from typing import Dict, Optional
class ResponseCache:
"""Cache LLM responses"""
def __init__(self, max_size: int = 1000):
self.cache = {}
self.max_size = max_size
self.hits = 0
self.misses = 0
def get_cache_key(self, prompt: str, model: str, temperature: float) -> str:
"""Generate cache key"""
key_data = f"{prompt}:{model}:{temperature}"
return hashlib.md5(key_data.encode()).hexdigest()
def get(self, prompt: str, model: str, temperature: float) -> Optional[str]:
"""Get cached response"""
key = self.get_cache_key(prompt, model, temperature)
if key in self.cache:
self.hits += 1
return self.cache[key]
self.misses += 1
return None
def set(self, prompt: str, model: str, temperature: float, response: str):
"""Cache response"""
key = self.get_cache_key(prompt, model, temperature)
# Evict oldest if full
if len(self.cache) >= self.max_size:
oldest_key = next(iter(self.cache))
del self.cache[oldest_key]
self.cache[key] = response
def get_stats(self) -> Dict:
"""Get cache statistics"""
total = self.hits + self.misses
hit_rate = self.hits / total if total > 0 else 0
return {
"hits": self.hits,
"misses": self.misses,
"hit_rate": hit_rate,
"size": len(self.cache)
}
# Cached Agent
class CachedAgent:
"""Agent with response caching"""
def __init__(self):
self.client = openai.OpenAI()
self.cache = ResponseCache()
def generate(self, prompt: str, model: str = "gpt-4", temperature: float = 0.7) -> str:
"""Generate with caching"""
# Check cache
cached = self.cache.get(prompt, model, temperature)
if cached:
print("✓ Cache hit")
return cached
# Generate
print("✗ Cache miss - calling API")
response = self.client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": prompt}],
temperature=temperature
)
result = response.choices[0].message.content
# Cache result
self.cache.set(prompt, model, temperature, result)
return result
# Usage
agent = CachedAgent()
# First call - cache miss
response1 = agent.generate("What is AI?")
# Second call - cache hit
response2 = agent.generate("What is AI?")
# Stats
stats = agent.cache.get_stats()
print(f"Cache hit rate: {stats['hit_rate']:.1%}")
Model Selection
Cost-Performance Trade-offs
class ModelSelector:
"""Select optimal model based on requirements"""
def __init__(self):
self.models = {
"gpt-4": {
"cost_per_1k": 0.03,
"quality": 10,
"speed": 5
},
"gpt-4-turbo": {
"cost_per_1k": 0.01,
"quality": 9,
"speed": 8
},
"gpt-3.5-turbo": {
"cost_per_1k": 0.0005,
"quality": 7,
"speed": 10
}
}
def select_model(self,
priority: str = "balanced",
complexity: str = "medium") -> str:
"""Select best model"""
if priority == "cost":
return "gpt-3.5-turbo"
elif priority == "quality":
return "gpt-4"
elif priority == "speed":
return "gpt-3.5-turbo"
else: # balanced
if complexity == "high":
return "gpt-4-turbo"
else:
return "gpt-3.5-turbo"
def estimate_monthly_cost(self,
requests_per_day: int,
avg_tokens: int,
model: str) -> float:
"""Estimate monthly cost"""
cost_per_1k = self.models[model]["cost_per_1k"]
daily_cost = (requests_per_day * avg_tokens / 1000) * cost_per_1k
monthly_cost = daily_cost * 30
return monthly_cost
# Usage
selector = ModelSelector()
# Select for simple task
model = selector.select_model(priority="cost", complexity="low")
print(f"Selected: {model}")
# Estimate costs
monthly = selector.estimate_monthly_cost(
requests_per_day=10000,
avg_tokens=500,
model="gpt-3.5-turbo"
)
print(f"Estimated monthly cost: ${monthly:.2f}")
Batch Processing
Batch API Usage
class BatchProcessor:
"""Process requests in batches"""
def __init__(self, batch_size: int = 10):
self.batch_size = batch_size
self.client = openai.OpenAI()
def process_batch(self, requests: List[str]) -> List[str]:
"""Process multiple requests efficiently"""
results = []
# Process in batches
for i in range(0, len(requests), self.batch_size):
batch = requests[i:i + self.batch_size]
# Process batch
batch_results = self.process_single_batch(batch)
results.extend(batch_results)
return results
def process_single_batch(self, batch: List[str]) -> List[str]:
"""Process single batch"""
# Combine into single prompt for efficiency
combined_prompt = "Process these requests:\n\n"
for i, req in enumerate(batch, 1):
combined_prompt += f"{i}. {req}\n"
response = self.client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": combined_prompt}]
)
# Parse results (naive: assumes the model returns one answer per line)
result_text = response.choices[0].message.content
results = [line.strip() for line in result_text.split('\n') if line.strip()]
return results[:len(batch)]
# Usage
processor = BatchProcessor(batch_size=5)
requests = [f"Summarize topic {i}" for i in range(20)]
results = processor.process_batch(requests)
Resource Optimization
Compute Optimization
class ResourceOptimizer:
"""Optimize compute resources"""
def __init__(self):
self.metrics = {
"cpu_usage": [],
"memory_usage": [],
"response_times": []
}
def monitor_resources(self):
"""Monitor resource usage"""
import psutil
cpu = psutil.cpu_percent(interval=1)
memory = psutil.virtual_memory().percent
self.metrics["cpu_usage"].append(cpu)
self.metrics["memory_usage"].append(memory)
return {"cpu": cpu, "memory": memory}
def should_scale(self) -> Dict:
"""Determine if scaling is needed"""
if not self.metrics["cpu_usage"]:
return {"scale": False}
avg_cpu = sum(self.metrics["cpu_usage"][-10:]) / min(10, len(self.metrics["cpu_usage"]))
avg_memory = sum(self.metrics["memory_usage"][-10:]) / min(10, len(self.metrics["memory_usage"]))
scale_up = avg_cpu > 80 or avg_memory > 80
scale_down = avg_cpu < 20 and avg_memory < 20
return {
"scale": scale_up or scale_down,
"direction": "up" if scale_up else "down",
"cpu": avg_cpu,
"memory": avg_memory
}
# Usage
optimizer = ResourceOptimizer()
resources = optimizer.monitor_resources()
scaling = optimizer.should_scale()
if scaling["scale"]:
print(f"Scale {scaling['direction']}: CPU={scaling['cpu']:.1f}%, Memory={scaling['memory']:.1f}%")
Cost Monitoring
Real-Time Cost Tracking
import time
from typing import Dict
class CostMonitor:
"""Monitor and track costs"""
def __init__(self, budget: float = 1000.0):
self.budget = budget
self.costs = []
self.alerts = []
def record_cost(self, amount: float, service: str, metadata: Dict = None):
"""Record cost"""
cost_entry = {
"amount": amount,
"service": service,
"timestamp": time.time(),
"metadata": metadata or {}
}
self.costs.append(cost_entry)
# Check budget
total = self.get_total_cost()
if total > self.budget * 0.8:
self.add_alert("warning", f"80% of budget used: ${total:.2f}")
if total > self.budget:
self.add_alert("critical", f"Budget exceeded: ${total:.2f}")
def get_total_cost(self) -> float:
"""Get total cost"""
return sum(c["amount"] for c in self.costs)
def get_cost_by_service(self) -> Dict:
"""Get costs grouped by service"""
by_service = {}
for cost in self.costs:
service = cost["service"]
by_service[service] = by_service.get(service, 0) + cost["amount"]
return by_service
def add_alert(self, level: str, message: str):
"""Add cost alert"""
alert = {
"level": level,
"message": message,
"timestamp": time.time()
}
self.alerts.append(alert)
print(f"🚨 {level.upper()}: {message}")
def get_report(self) -> Dict:
"""Generate cost report"""
total = self.get_total_cost()
by_service = self.get_cost_by_service()
return {
"total_cost": total,
"budget": self.budget,
"remaining": self.budget - total,
"utilization": (total / self.budget) * 100,
"by_service": by_service,
"alerts": self.alerts
}
# Usage
monitor = CostMonitor(budget=100.0)
# Record costs
monitor.record_cost(15.50, "openai", {"model": "gpt-4"})
monitor.record_cost(2.30, "pinecone", {"operation": "query"})
# Get report
report = monitor.get_report()
print(f"Total: ${report['total_cost']:.2f}")
print(f"Budget utilization: {report['utilization']:.1f}%")
Best Practices
- Monitor costs: Track spending in real-time
- Set budgets: Implement spending limits
- Cache responses: Avoid redundant API calls
- Optimize prompts: Minimize token usage
- Choose right model: Balance cost and quality
- Batch requests: Process multiple items together
- Use cheaper models: For simple tasks
- Implement rate limiting: Prevent runaway costs
- Regular audits: Review and optimize
- Alert on anomalies: Detect unusual spending
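The rate-limiting practice above can be sketched as a small token-bucket guard placed in front of every API call. This is an illustrative helper, not from any particular library; the `RateLimiter` name and its parameters are our own:

```python
import time

class RateLimiter:
    """Token-bucket rate limiter to cap outbound API call rate."""

    def __init__(self, max_calls: int, per_seconds: float):
        self.max_calls = max_calls
        self.per_seconds = per_seconds
        self.tokens = float(max_calls)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        """Return True if a call may proceed, consuming one token."""
        now = time.monotonic()
        # Refill tokens proportionally to elapsed time, capped at the bucket size
        elapsed = now - self.last_refill
        self.tokens = min(self.max_calls,
                          self.tokens + elapsed * self.max_calls / self.per_seconds)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In an agent, check `limiter.allow()` before each LLM call and either queue or reject the request when it returns `False`; this bounds worst-case spend even if a loop runs away.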
Next Steps
Chapter 8 (Enterprise & Scale) is complete! You now understand architecture patterns, security & compliance, and cost optimization for production agent systems.
We’ve completed eight of the ten chapters; only Chapters 9 and 10 remain.
Frontier Capabilities
Module 9: Learning Objectives
By the end of this module, you will:
- ✓ Understand self-improving and meta-learning agents
- ✓ Explore constitutional AI and debate systems
- ✓ Recognize open problems in alignment and interpretability
- ✓ Identify frontier research directions
- ✓ Contribute to cutting-edge agent research
Introduction to Frontier Research
Frontier capabilities represent the cutting edge of agent research—capabilities that are emerging but not yet fully realized. This section explores what’s possible and what’s coming next.
What Makes Capabilities “Frontier”?
Characteristics:
- Recently demonstrated in research
- Not yet widely deployed
- Significant technical challenges
- High potential impact
- Active research area
Categories:
- Self-improvement and meta-learning
- Tool creation and modification
- Abstract reasoning
- Long-horizon planning
- Multi-agent emergence
Self-Improvement and Meta-Learning
Self-Modifying Agents
import ast
import time
import openai
from typing import Dict, List
class SelfImprovingAgent:
"""Agent that can modify its own code"""
def __init__(self):
self.client = openai.OpenAI()
self.code_history = []
self.performance_history = []
def analyze_performance(self, task_results: List[Dict]) -> Dict:
"""Analyze agent's performance"""
if not task_results:
return {"success_rate": 0.0, "avg_time": 0.0, "total_tasks": 0}
success_rate = sum(1 for r in task_results if r["success"]) / len(task_results)
avg_time = sum(r["time"] for r in task_results) / len(task_results)
return {
"success_rate": success_rate,
"avg_time": avg_time,
"total_tasks": len(task_results)
}
def identify_weaknesses(self, performance: Dict) -> List[str]:
"""Identify areas for improvement"""
weaknesses = []
if performance["success_rate"] < 0.8:
weaknesses.append("low_success_rate")
if performance["avg_time"] > 10:
weaknesses.append("slow_execution")
return weaknesses
def generate_improvement(self, current_code: str, weaknesses: List[str]) -> str:
"""Generate improved version of code"""
prompt = f"""Improve this agent code to address these weaknesses: {weaknesses}

Current code:
```python
{current_code}
```

Provide improved code that:
- Maintains all functionality
- Addresses identified weaknesses
- Includes comments explaining changes

Improved code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.extract_code(response.choices[0].message.content)
def validate_improvement(self, new_code: str) -> bool:
"""Validate improved code"""
try:
# Parse to check syntax
ast.parse(new_code)
# Run safety checks
if self.contains_unsafe_operations(new_code):
return False
return True
except SyntaxError:
return False
def contains_unsafe_operations(self, code: str) -> bool:
"""Check for unsafe operations"""
unsafe_patterns = [
"exec(", "eval(", "__import__",
"os.system", "subprocess"
]
return any(pattern in code for pattern in unsafe_patterns)
def self_improve(self, task_results: List[Dict]) -> Dict:
"""Self-improvement cycle"""
# Analyze performance
performance = self.analyze_performance(task_results)
self.performance_history.append(performance)
# Identify weaknesses
weaknesses = self.identify_weaknesses(performance)
if not weaknesses:
return {"improved": False, "reason": "No weaknesses found"}
# Get current code
current_code = self.get_current_code()
# Generate improvement
improved_code = self.generate_improvement(current_code, weaknesses)
# Validate
if not self.validate_improvement(improved_code):
return {"improved": False, "reason": "Validation failed"}
# Store
self.code_history.append({
"code": improved_code,
"weaknesses_addressed": weaknesses,
"timestamp": time.time()
})
return {
"improved": True,
"weaknesses_addressed": weaknesses,
"version": len(self.code_history)
}
def get_current_code(self) -> str:
"""Get current agent code"""
# In practice, would read actual code
return "def process(input): return input"
def extract_code(self, text: str) -> str:
"""Extract code from response"""
import re
pattern = r'```python\n(.*?)```'
matches = re.findall(pattern, text, re.DOTALL)
return matches[0] if matches else text
# Usage
agent = SelfImprovingAgent()
# Simulate task results
results = [
{"success": True, "time": 5.2},
{"success": False, "time": 12.1},
{"success": True, "time": 6.8}
]
# Self-improve
improvement = agent.self_improve(results)
print(f"Improved: {improvement}")
Recursive Self-Improvement
class RecursiveSelfImprovement:
"""Agent that recursively improves itself"""
def __init__(self, max_iterations: int = 5):
self.max_iterations = max_iterations
self.client = openai.OpenAI()
self.versions = []
def improve_recursively(self, initial_code: str, test_suite: List[Dict]) -> Dict:
"""Recursively improve code"""
current_code = initial_code
current_score = self.evaluate_code(current_code, test_suite)
print(f"Initial score: {current_score:.2f}")
for iteration in range(self.max_iterations):
print(f"\nIteration {iteration + 1}:")
# Generate improvement
improved_code = self.generate_improvement(current_code, current_score)
# Evaluate
new_score = self.evaluate_code(improved_code, test_suite)
print(f"New score: {new_score:.2f}")
# Check if improved
if new_score > current_score:
print("✓ Improvement accepted")
current_code = improved_code
current_score = new_score
self.versions.append({
"iteration": iteration + 1,
"code": current_code,
"score": current_score
})
else:
print("✗ No improvement, stopping")
break
return {
"final_code": current_code,
"final_score": current_score,
"iterations": len(self.versions),
"improvement": current_score - self.evaluate_code(initial_code, test_suite)
}
def evaluate_code(self, code: str, test_suite: List[Dict]) -> float:
"""Evaluate code quality"""
# Run tests
passed = 0
for test in test_suite:
try:
# Execute code with test input
result = self.execute_code(code, test["input"])
if result == test["expected"]:
passed += 1
except Exception:
pass  # a failing or crashing test counts as not passed
return passed / len(test_suite) if test_suite else 0
def generate_improvement(self, code: str, current_score: float) -> str:
"""Generate improved version"""
prompt = f"Improve this code (current score: {current_score:.2f}):\n\n{code}\n\nMake it more efficient, readable, and robust.\n\nImproved code:"
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return self.extract_code(response.choices[0].message.content)
def execute_code(self, code: str, input_data):
"""Execute code safely (simplified placeholder)"""
return input_data
def extract_code(self, text: str) -> str:
"""Extract code from a fenced response, falling back to raw text"""
import re
matches = re.findall(r'```python\n(.*?)```', text, re.DOTALL)
return matches[0] if matches else text
# Usage
rsi = RecursiveSelfImprovement(max_iterations=3)
initial_code = """
def process(data):
result = []
for item in data:
result.append(item * 2)
return result
"""
test_suite = [
{"input": [1, 2, 3], "expected": [2, 4, 6]},
{"input": [0], "expected": [0]},
]
result = rsi.improve_recursively(initial_code, test_suite)
print(f"\nFinal improvement: {result['improvement']:.2f}")
Tool Creation and Modification
Dynamic Tool Generation
import ast
import openai
from typing import Dict, List
class ToolCreator:
"""Agent that creates new tools"""
def __init__(self):
self.client = openai.OpenAI()
self.created_tools = {}
def create_tool(self, description: str, examples: List[Dict]) -> Dict:
"""Create new tool from description"""
# Generate tool code
code = self.generate_tool_code(description, examples)
# Generate tool schema
schema = self.generate_tool_schema(description, code)
# Validate
if not self.validate_tool(code):
return {"success": False, "error": "Validation failed"}
# Register tool
tool_name = self.extract_tool_name(code)
self.created_tools[tool_name] = {
"code": code,
"schema": schema,
"description": description
}
return {
"success": True,
"tool_name": tool_name,
"schema": schema
}
def generate_tool_code(self, description: str, examples: List[Dict]) -> str:
"""Generate tool implementation"""
examples_str = "\n".join([
f"Input: {ex['input']}\nOutput: {ex['output']}"
for ex in examples
])
prompt = f"""Create a Python function for this tool:
Description: {description}
Examples:
{examples_str}
Requirements:
1. Function should be self-contained
2. Include type hints
3. Add docstring
4. Handle errors gracefully
5. Return results in consistent format
Code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def generate_tool_schema(self, description: str, code: str) -> Dict:
"""Generate tool schema"""
prompt = f"""Generate a JSON schema for this tool:

Description: {description}

Code:
```python
{code}
```

Provide the schema in OpenAI function calling format:
{{
"name": "tool_name",
"description": "...",
"parameters": {{
"type": "object",
"properties": {{...}},
"required": [...]
}}
}}

Schema:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.1
)
import json
return json.loads(response.choices[0].message.content)
def validate_tool(self, code: str) -> bool:
"""Validate tool code"""
try:
ast.parse(code)
return True
except:
return False
def extract_tool_name(self, code: str) -> str:
"""Extract function name from code"""
tree = ast.parse(code)
for node in ast.walk(tree):
if isinstance(node, ast.FunctionDef):
return node.name
return "unknown_tool"
def extract_code(self, text: str) -> str:
"""Extract code from a fenced response, falling back to raw text"""
import re
matches = re.findall(r'```python\n(.*?)```', text, re.DOTALL)
return matches[0] if matches else text
def modify_tool(self, tool_name: str, modification: str) -> Dict:
"""Modify existing tool"""
if tool_name not in self.created_tools:
return {"success": False, "error": "Tool not found"}
current_code = self.created_tools[tool_name]["code"]
prompt = f"""Modify this tool:

Current code:
{current_code}

Modification: {modification}

Provide modified code:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
modified_code = self.extract_code(response.choices[0].message.content)
# Update tool
self.created_tools[tool_name]["code"] = modified_code
return {"success": True, "modified_code": modified_code}
# Usage
creator = ToolCreator()
# Create new tool
result = creator.create_tool(
"Calculate compound interest",
examples=[
{"input": {"principal": 1000, "rate": 0.05, "years": 3}, "output": 1157.63},
{"input": {"principal": 5000, "rate": 0.03, "years": 5}, "output": 5796.37}
]
)
print(f"Created tool: {result['tool_name']}")
Abstract Reasoning
Analogical Reasoning
class AnalogicalReasoner:
"""Agent that reasons by analogy"""
def __init__(self):
self.client = openai.OpenAI()
self.knowledge_base = []
def find_analogies(self, problem: str, domain: str = None) -> List[Dict]:
"""Find analogous problems"""
prompt = f"""Find analogies for this problem:
Problem: {problem}
{f"Domain: {domain}" if domain else ""}
Provide 3 analogous situations from different domains that share similar structure.
For each analogy:
1. Describe the analogous situation
2. Explain the structural similarity
3. Suggest how insights transfer
Analogies:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.7
)
return self.parse_analogies(response.choices[0].message.content)
def solve_by_analogy(self, problem: str) -> Dict:
"""Solve problem using analogical reasoning"""
# Find analogies
analogies = self.find_analogies(problem)
# Extract solutions from analogies
solutions = []
for analogy in analogies:
solution = self.extract_solution(problem, analogy)
solutions.append(solution)
# Synthesize final solution
final_solution = self.synthesize_solutions(problem, solutions)
return {
"problem": problem,
"analogies": analogies,
"solutions": solutions,
"final_solution": final_solution
}
def extract_solution(self, problem: str, analogy: Dict) -> str:
"""Extract solution approach from analogy"""
prompt = f"""Given this analogy, how would you solve the original problem?
Original problem: {problem}
Analogy: {analogy}
Solution approach:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content
def synthesize_solutions(self, problem: str, solutions: List[str]) -> str:
"""Synthesize multiple solution approaches"""
solutions_text = "\n\n".join([f"Approach {i+1}:\n{s}" for i, s in enumerate(solutions)])
prompt = f"""Synthesize these solution approaches into one optimal solution:
Problem: {problem}
{solutions_text}
Optimal solution:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return response.choices[0].message.content
def parse_analogies(self, text: str) -> List[Dict]:
"""Parse analogies from text"""
# Simplified parsing
return [{"analogy": text}]
# Usage
reasoner = AnalogicalReasoner()
problem = "How to scale a software system to handle 10x more users?"
result = reasoner.solve_by_analogy(problem)
print(f"Solution: {result['final_solution']}")
Causal Reasoning
class CausalReasoner:
"""Agent that performs causal reasoning"""
def __init__(self):
self.client = openai.OpenAI()
def identify_causal_relationships(self, observations: List[str]) -> Dict:
"""Identify causal relationships"""
obs_text = "\n".join([f"- {obs}" for obs in observations])
prompt = f"""Identify causal relationships in these observations:
{obs_text}
For each relationship:
1. Cause
2. Effect
3. Confidence (low/medium/high)
4. Explanation
Causal relationships:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_causal_relationships(response.choices[0].message.content)
def predict_intervention_effect(self,
current_state: str,
intervention: str) -> str:
"""Predict effect of intervention"""
prompt = f"""Predict the causal effect of this intervention:
Current state: {current_state}
Intervention: {intervention}
Analyze:
1. Direct effects
2. Indirect effects
3. Potential unintended consequences
4. Confidence in prediction
Prediction:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return response.choices[0].message.content
def explain_outcome(self, outcome: str, context: str) -> str:
"""Explain why outcome occurred"""
prompt = f"""Explain the causal chain that led to this outcome:
Context: {context}
Outcome: {outcome}
Provide:
1. Root causes
2. Contributing factors
3. Causal chain
4. Alternative explanations
Explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return response.choices[0].message.content
def parse_causal_relationships(self, text: str) -> Dict:
"""Parse causal relationships"""
return {"relationships": text}
# Usage
causal = CausalReasoner()
observations = [
"Website traffic increased by 50%",
"New marketing campaign launched last week",
"Server response time increased",
"User complaints about slow loading"
]
relationships = causal.identify_causal_relationships(observations)
print(f"Causal relationships: {relationships}")
Long-Horizon Planning
Hierarchical Planning
class LongHorizonPlanner:
"""Agent for long-horizon planning"""
def __init__(self):
self.client = openai.OpenAI()
def create_long_term_plan(self,
goal: str,
horizon: str = "1 year",
constraints: List[str] = None) -> Dict:
"""Create long-term hierarchical plan"""
constraints_text = "\n".join(constraints) if constraints else "None"
prompt = f"""Create a detailed long-term plan:
Goal: {goal}
Time horizon: {horizon}
Constraints: {constraints_text}
Create a hierarchical plan with:
1. High-level milestones (quarterly)
2. Medium-level objectives (monthly)
3. Low-level tasks (weekly)
For each level:
- Clear deliverables
- Success criteria
- Dependencies
- Risk factors
Plan:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return self.parse_plan(response.choices[0].message.content)
def adapt_plan(self,
current_plan: Dict,
new_information: str) -> Dict:
"""Adapt plan based on new information"""
prompt = f"""Adapt this plan based on new information:
Current plan: {current_plan}
New information: {new_information}
Provide:
1. What needs to change
2. Updated plan
3. Rationale for changes
4. New risks
Adapted plan:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return self.parse_plan(response.choices[0].message.content)
def evaluate_progress(self,
plan: Dict,
completed_tasks: List[str]) -> Dict:
"""Evaluate progress toward goal"""
prompt = f"""Evaluate progress on this plan:
Plan: {plan}
Completed tasks: {completed_tasks}
Provide:
1. Completion percentage
2. On track / behind / ahead
3. Blockers
4. Recommendations
Evaluation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_evaluation(response.choices[0].message.content)
def parse_plan(self, text: str) -> Dict:
"""Parse plan from text"""
return {"plan": text}
def parse_evaluation(self, text: str) -> Dict:
"""Parse evaluation from text"""
return {"evaluation": text}
# Usage
planner = LongHorizonPlanner()
plan = planner.create_long_term_plan(
goal="Build and launch a successful AI product",
horizon="1 year",
constraints=["Budget: $500K", "Team size: 5 people"]
)
print(f"Plan created: {plan}")
Best Practices
- Safety first: Validate self-modifications
- Incremental improvement: Small, tested changes
- Human oversight: Critical decisions need review
- Rollback capability: Ability to revert changes
- Performance tracking: Monitor improvements
- Ethical boundaries: Respect limitations
- Transparency: Explain reasoning
- Testing: Thorough validation
- Documentation: Track changes
- Research awareness: Stay current
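The rollback practice deserves a concrete shape. Below is a minimal sketch assuming a linear version history, similar in spirit to the `code_history` list kept by `SelfImprovingAgent` above; the `VersionedCodeStore` class and its method names are hypothetical, not part of any framework:

```python
from typing import List, Optional

class VersionedCodeStore:
    """Keep every accepted code version so a bad change can be reverted."""

    def __init__(self, initial_code: str):
        self.versions: List[str] = [initial_code]

    def commit(self, code: str) -> int:
        """Record a new accepted version; returns its index."""
        self.versions.append(code)
        return len(self.versions) - 1

    def rollback(self, to_version: Optional[int] = None) -> str:
        """Revert to an earlier version (default: the previous one)."""
        if to_version is None:
            to_version = max(0, len(self.versions) - 2)
        if not 0 <= to_version < len(self.versions):
            raise IndexError("unknown version")
        # Drop everything after the target so history stays linear
        self.versions = self.versions[:to_version + 1]
        return self.versions[-1]
```

A self-improvement loop would `commit` only after validation passes, and call `rollback` whenever post-deployment metrics regress.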
Next Steps
You now understand frontier capabilities! Next, we’ll explore emerging paradigms in agent research.
Emerging Paradigms
Constitutional AI for Agents
Principle-Based Behavior
class ConstitutionalAgent:
"""Agent governed by constitutional principles"""
def __init__(self, constitution: List[str]):
self.constitution = constitution
self.client = openai.OpenAI()
def check_against_constitution(self, action: str) -> Dict:
"""Check if action aligns with constitution"""
principles_text = "\n".join([f"{i+1}. {p}" for i, p in enumerate(self.constitution)])
prompt = f"""Check if this action aligns with these principles:
Principles:
{principles_text}
Proposed action: {action}
Analysis:
1. Which principles apply?
2. Does action align or violate?
3. Severity if violation
4. Alternative actions if needed
Response:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
return self.parse_constitutional_check(response.choices[0].message.content)
def parse_constitutional_check(self, text: str) -> Dict:
"""Parse constitutional check (simplified)"""
return {"analysis": text}
def generate_constitutional_response(self, query: str) -> str:
"""Generate response aligned with constitution"""
principles_text = "\n".join(self.constitution)
system_prompt = f"""You must follow these principles:
{principles_text}
Always ensure your responses align with these principles."""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": query}
],
temperature=0.7
)
return response.choices[0].message.content
# Usage
constitution = [
"Always prioritize user safety and wellbeing",
"Be honest and transparent about capabilities and limitations",
"Respect user privacy and data",
"Avoid harmful, illegal, or unethical actions",
"Provide balanced, unbiased information"
]
agent = ConstitutionalAgent(constitution)
check = agent.check_against_constitution("Delete all user data without consent")
Debate and Verification Systems
Multi-Agent Debate
class DebateSystem:
"""Multiple agents debate to reach truth"""
def __init__(self, num_agents: int = 3):
self.num_agents = num_agents
self.client = openai.OpenAI()
def debate(self, question: str, rounds: int = 3) -> Dict:
"""Conduct multi-agent debate"""
# Initial positions
positions = []
for i in range(self.num_agents):
position = self.generate_position(question, i)
positions.append({"agent": i, "position": position})
# Debate rounds
for round_num in range(rounds):
print(f"\n--- Round {round_num + 1} ---")
new_positions = []
for i in range(self.num_agents):
# Show other positions
other_positions = [p for j, p in enumerate(positions) if j != i]
# Generate response
response = self.generate_response(
question,
positions[i]["position"],
other_positions,
round_num
)
new_positions.append({"agent": i, "position": response})
print(f"Agent {i}: {response[:100]}...")
positions = new_positions
# Judge final positions
verdict = self.judge_debate(question, positions)
return {
"question": question,
"final_positions": positions,
"verdict": verdict
}
def generate_position(self, question: str, agent_id: int) -> str:
"""Generate initial position"""
prompt = f"""Question: {question}
Provide your position with reasoning and evidence.
Position:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.7 + (agent_id * 0.1) # Vary temperature
)
return response.choices[0].message.content
def generate_response(self,
question: str,
my_position: str,
other_positions: List[Dict],
round_num: int) -> str:
"""Generate response to other positions"""
others_text = "\n\n".join([
f"Agent {p['agent']}: {p['position']}"
for p in other_positions
])
prompt = f"""Question: {question}
Your previous position: {my_position}
Other agents' positions:
{others_text}
Respond by:
1. Addressing counterarguments
2. Refining your position
3. Providing additional evidence
Response:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.6
)
return response.choices[0].message.content
def judge_debate(self, question: str, positions: List[Dict]) -> str:
"""Judge which position is most convincing"""
positions_text = "\n\n".join([
f"Agent {p['agent']}:\n{p['position']}"
for p in positions
])
prompt = f"""Question: {question}
Final positions:
{positions_text}
Which position is most convincing and why?
Judgment:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return response.choices[0].message.content
# Usage
debate = DebateSystem(num_agents=3)
result = debate.debate("Should AI agents have the ability to modify their own code?")
print(f"\nVerdict: {result['verdict']}")
Hybrid Symbolic-Neural Approaches
Neuro-Symbolic Agent
class NeuroSymbolicAgent:
"""Combines neural and symbolic reasoning"""
def __init__(self):
self.client = openai.OpenAI()
self.knowledge_base = {} # Symbolic knowledge
def add_rule(self, rule_name: str, condition: str, action: str):
"""Add symbolic rule"""
self.knowledge_base[rule_name] = {
"condition": condition,
"action": action
}
def reason(self, query: str) -> Dict:
"""Hybrid reasoning"""
# Try symbolic reasoning first
symbolic_result = self.symbolic_reasoning(query)
if symbolic_result["applicable"]:
return {
"method": "symbolic",
"result": symbolic_result["result"],
"confidence": "high"
}
# Fall back to neural reasoning
neural_result = self.neural_reasoning(query)
return {
"method": "neural",
"result": neural_result,
"confidence": "medium"
}
def symbolic_reasoning(self, query: str) -> Dict:
"""Apply symbolic rules"""
for rule_name, rule in self.knowledge_base.items():
if self.matches_condition(query, rule["condition"]):
return {
"applicable": True,
"rule": rule_name,
"result": rule["action"]
}
return {"applicable": False}
def neural_reasoning(self, query: str) -> str:
"""Neural network reasoning"""
# Include symbolic knowledge as context
kb_text = "\n".join([
f"{name}: IF {rule['condition']} THEN {rule['action']}"
for name, rule in self.knowledge_base.items()
])
prompt = f"""Use this knowledge base and reasoning:
Knowledge Base:
{kb_text}
Query: {query}
Reasoning:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content
def matches_condition(self, query: str, condition: str) -> bool:
"""Check if query matches condition"""
# Simplified matching
return condition.lower() in query.lower()
# Usage
agent = NeuroSymbolicAgent()
# Add symbolic rules
agent.add_rule("safety_check", "delete user data", "DENY: Requires explicit consent")
agent.add_rule("privacy_rule", "share personal info", "DENY: Privacy violation")
# Reason
result = agent.reason("Can I delete user data?")
print(f"Method: {result['method']}, Result: {result['result']}")
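The substring check in matches_condition is fragile: it misses reordered or slightly reworded queries. A slightly more robust sketch, written here as a standalone function, fires a rule when every keyword of the condition appears in the query. The keyword-set approach is an assumption for illustration; a production system might use embeddings or a proper rule engine.

```python
def matches_condition(query: str, condition: str) -> bool:
    """Match a rule when all condition keywords appear in the query.

    More tolerant than plain substring matching, but still a sketch:
    no stemming, synonyms, or negation handling.
    """
    query_words = set(query.lower().split())
    condition_words = set(condition.lower().split())
    return condition_words.issubset(query_words)
```

With this matcher, "Can I delete all the user data?" still triggers the "delete user data" rule even though the words are interleaved.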
Best Practices
- Ethical guidelines: Establish clear principles
- Verification: Multiple perspectives
- Transparency: Explain reasoning
- Human oversight: Critical decisions
- Continuous learning: Adapt approaches
- Safety measures: Prevent harm
- Diverse perspectives: Multiple viewpoints
- Rigorous testing: Validate thoroughly
- Documentation: Track decisions
- Research collaboration: Share findings
Next Steps
You now understand emerging paradigms! Next, we’ll explore open problems in agent research.
Open Problems
Alignment and Control
The Alignment Problem
Challenge: Ensuring agents do what we intend, not just what we specify.
Key Issues:
- Specification gaming (exploiting loopholes)
- Reward hacking
- Goal misalignment
- Value learning
- Corrigibility (accepting corrections)
Current Approaches
class AlignmentMonitor:
"""Monitor agent alignment"""
def __init__(self):
self.client = openai.OpenAI()
self.alignment_violations = []
def check_alignment(self, intended_goal: str, actual_behavior: str) -> Dict:
"""Check if behavior aligns with intent"""
prompt = f"""Analyze alignment between intent and behavior:
Intended goal: {intended_goal}
Actual behavior: {actual_behavior}
Assess:
1. Does behavior achieve the intended goal?
2. Are there unintended side effects?
3. Is the agent gaming the specification?
4. Alignment score (0-10)
Analysis:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_alignment_check(response.choices[0].message.content)
def detect_specification_gaming(self,
objective: str,
actions: List[str]) -> List[str]:
"""Detect if agent is gaming the specification"""
gaming_indicators = []
for action in actions:
prompt = f"""Is this action gaming the specification?
Objective: {objective}
Action: {action}
Is this:
1. Achieving the objective as intended?
2. Exploiting a loophole?
3. Technically correct but misaligned?
Answer:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.2
)
if "loophole" in response.choices[0].message.content.lower():
gaming_indicators.append(action)
return gaming_indicators
# Usage
monitor = AlignmentMonitor()
check = monitor.check_alignment(
"Maximize user satisfaction",
"Showing users only positive feedback, hiding negative reviews"
)
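check_alignment delegates to self.parse_alignment_check, which is never defined above. A minimal sketch of that parser, written as a standalone function; the regex and the score-below-5 flag threshold are assumptions, not part of a fixed API.

```python
import re
from typing import Dict, Optional

def parse_alignment_check(text: str) -> Dict:
    """Extract a structured result from the model's free-form analysis.

    Looks for a numeric alignment score (0-10) near the words
    'score' or 'alignment'; returns None when no score is found.
    """
    match = re.search(r"(?:score|alignment)[^\d]*(\d+(?:\.\d+)?)", text, re.IGNORECASE)
    score: Optional[float] = float(match.group(1)) if match else None
    return {
        "raw_analysis": text,
        "alignment_score": score,
        "flagged": score is not None and score < 5,  # assumed threshold
    }
```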
Interpretability
Understanding Agent Decisions
Challenge: Making agent reasoning transparent and understandable.
Key Issues:
- Black box decision-making
- Complex reasoning chains
- Emergent behaviors
- Debugging difficulties
class InterpretabilityTool:
"""Tools for understanding agent decisions"""
def __init__(self):
self.client = openai.OpenAI()
def explain_decision(self,
decision: str,
context: str,
reasoning_trace: List[str]) -> str:
"""Explain why agent made a decision"""
trace_text = "\n".join([f"{i+1}. {step}" for i, step in enumerate(reasoning_trace)])
prompt = f"""Explain this decision in simple terms:
Context: {context}
Reasoning trace:
{trace_text}
Decision: {decision}
Provide:
1. Why this decision was made
2. Key factors considered
3. Alternative options considered
4. Confidence level
Explanation:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.4
)
return response.choices[0].message.content
def identify_decision_factors(self, decision: str, context: str) -> List[Dict]:
"""Identify factors that influenced decision"""
prompt = f"""Identify factors that influenced this decision:
Context: {context}
Decision: {decision}
List factors with:
- Factor name
- Influence (positive/negative)
- Weight (low/medium/high)
Factors:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
return self.parse_factors(response.choices[0].message.content)
def generate_counterfactuals(self,
decision: str,
context: str) -> List[str]:
"""Generate counterfactual explanations"""
prompt = f"""Generate counterfactual explanations:
Context: {context}
Decision: {decision}
Provide 3 scenarios where the decision would be different:
"If X were different, then the decision would be Y because Z"
Counterfactuals:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.5
)
return response.choices[0].message.content.split('\n')
# Usage
interp = InterpretabilityTool()
explanation = interp.explain_decision(
"Recommend Product A",
"User looking for laptop under $1000",
["Filtered by price", "Compared specs", "Checked reviews"]
)
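identify_decision_factors calls self.parse_factors, which is left undefined. One possible sketch, assuming the prompt is adjusted to request pipe-delimited lines like `Factor | influence | weight`; lines that do not match are skipped.

```python
from typing import Dict, List

def parse_factors(text: str) -> List[Dict]:
    """Parse lines like '- Price | negative | high' into factor dicts.

    Assumes the pipe-delimited format requested in the prompt;
    anything else is silently ignored.
    """
    factors = []
    for line in text.splitlines():
        parts = [p.strip(" -*\t") for p in line.split("|")]
        if (len(parts) == 3
                and parts[1].lower() in ("positive", "negative")
                and parts[2].lower() in ("low", "medium", "high")):
            factors.append({
                "name": parts[0],
                "influence": parts[1].lower(),
                "weight": parts[2].lower(),
            })
    return factors
```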
Generalization
Out-of-Distribution Performance
Challenge: Agents performing well in novel situations.
Key Issues:
- Distribution shift
- Novel scenarios
- Transfer learning
- Robustness
class GeneralizationTester:
"""Test agent generalization"""
def __init__(self):
self.client = openai.OpenAI()
def test_generalization(self,
agent,
training_domain: str,
test_domains: List[str]) -> Dict:
"""Test how well agent generalizes"""
results = {}
for domain in test_domains:
# Generate test cases for domain
test_cases = self.generate_test_cases(domain)
# Test agent
performance = self.evaluate_on_domain(agent, test_cases)
results[domain] = performance
return {
"training_domain": training_domain,
"test_results": results,
"generalization_score": self.calculate_generalization_score(results)
}
def generate_test_cases(self, domain: str) -> List[Dict]:
"""Generate test cases for domain"""
prompt = f"""Generate 5 test cases for this domain:
Domain: {domain}
For each test case provide:
- Input
- Expected behavior
- Difficulty (easy/medium/hard)
Test cases:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.6
)
return self.parse_test_cases(response.choices[0].message.content)
def evaluate_on_domain(self, agent, test_cases: List[Dict]) -> float:
"""Evaluate agent on test cases"""
passed = 0
for test in test_cases:
try:
result = agent.process(test["input"])
if self.check_correctness(result, test["expected"]):
passed += 1
except Exception:
pass  # treat errors during processing as failures
return passed / len(test_cases) if test_cases else 0
def calculate_generalization_score(self, results: Dict) -> float:
"""Calculate overall generalization score"""
scores = list(results.values())
return sum(scores) / len(scores) if scores else 0
# Usage
tester = GeneralizationTester()
# results = tester.test_generalization(
# agent,
# training_domain="customer support",
# test_domains=["technical support", "sales", "complaints"]
# )
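parse_test_cases and check_correctness are referenced above but never implemented. A hedged sketch of both, assuming the model follows the labeled `Input / Expected behavior / Difficulty` format the prompt requests; the loose containment check in check_correctness is a stand-in for a real grader.

```python
from typing import Dict, List

def parse_test_cases(text: str) -> List[Dict]:
    """Group labeled lines from the model's reply into test-case dicts."""
    cases: List[Dict] = []
    current: Dict = {}
    for line in text.splitlines():
        line = line.strip("-* \t")
        for label, field in (("input", "input"),
                             ("expected", "expected"),
                             ("difficulty", "difficulty")):
            if line.lower().startswith(label) and ":" in line:
                current[field] = line.split(":", 1)[1].strip()
        if len(current) == 3:  # all three fields seen: close out this case
            cases.append(current)
            current = {}
    return cases

def check_correctness(result, expected: str) -> bool:
    """Loose containment check; a production harness might use an
    LLM judge or task-specific metrics instead."""
    return expected.strip().lower() in str(result).strip().lower()
```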
Sample Efficiency
Learning from Limited Data
Challenge: Agents learning effectively from few examples.
Key Issues:
- Data scarcity
- Cold start problem
- Few-shot learning
- Active learning
class SampleEfficientLearner:
"""Learn efficiently from limited samples"""
def __init__(self):
self.client = openai.OpenAI()
self.examples = []
def active_learning(self,
unlabeled_data: List[str],
budget: int) -> List[str]:
"""Select most informative examples to label"""
# Score each example by informativeness
scored = []
for data in unlabeled_data:
score = self.calculate_informativeness(data)
scored.append((data, score))
# Select top examples
scored.sort(key=lambda x: x[1], reverse=True)
selected = [data for data, score in scored[:budget]]
return selected
def calculate_informativeness(self, example: str) -> float:
"""Calculate how informative an example would be"""
prompt = f"""Rate how informative this example would be for learning (0-10):
Example: {example}
Current examples: {len(self.examples)}
Consider:
- Novelty
- Representativeness
- Difficulty
- Coverage of edge cases
Score:"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0.3
)
try:
return float(response.choices[0].message.content.strip())
except ValueError:
return 5.0  # fall back to a neutral score when parsing fails
def meta_learn(self, tasks: List[Dict]) -> Dict:
"""Learn how to learn from multiple tasks"""
# Extract learning patterns across tasks
patterns = []
for task in tasks:
pattern = self.extract_learning_pattern(task)
patterns.append(pattern)
# Synthesize meta-learning strategy
strategy = self.synthesize_strategy(patterns)
return {
"patterns": patterns,
"strategy": strategy
}
def extract_learning_pattern(self, task: Dict) -> Dict:
"""Extract how learning occurred for task"""
return {"task": task, "pattern": "extracted"}
def synthesize_strategy(self, patterns: List[Dict]) -> str:
"""Synthesize meta-learning strategy"""
return "Meta-learning strategy"
# Usage
learner = SampleEfficientLearner()
selected = learner.active_learning(
unlabeled_data=["example1", "example2", "example3"],
budget=2
)
Research Directions
Key Open Questions
- Alignment: How to ensure agents pursue intended goals?
- Interpretability: How to understand agent reasoning?
- Generalization: How to handle novel situations?
- Sample Efficiency: How to learn from less data?
- Robustness: How to handle adversarial inputs?
- Scalability: How to scale to complex tasks?
- Multi-agent Coordination: How can agents collaborate effectively?
- Long-term Planning: How to plan over extended horizons?
- Common Sense: How to encode common sense?
- Ethical Reasoning: How to make ethical decisions?
Future Research Areas
Near-term (1-2 years):
- Better tool use and creation
- Improved multi-agent systems
- Enhanced memory systems
- More efficient learning
Medium-term (3-5 years):
- Self-improving agents
- Abstract reasoning
- Long-horizon planning
- Robust generalization
Long-term (5+ years):
- General intelligence
- Human-level reasoning
- Autonomous research
- Societal integration
Contributing to Research
How to Get Involved
- Read papers: Stay current with research
- Replicate results: Verify findings
- Open source: Share implementations
- Collaborate: Work with researchers
- Publish: Share your findings
- Attend conferences: NeurIPS, ICML, ICLR
- Join communities: Discord, forums
- Experiment: Try new ideas
- Document: Write about learnings
- Teach: Share knowledge
Conclusion
Chapter 9 (Cutting-Edge Research) is complete! You now understand:
- Frontier capabilities (self-improvement, tool creation, abstract reasoning)
- Emerging paradigms (constitutional AI, debate systems, neuro-symbolic)
- Open problems (alignment, interpretability, generalization, sample efficiency)
These are active research areas where significant breakthroughs are still needed. The field is rapidly evolving, and there are many opportunities to contribute.
Next: Module 10 - Capstone Project, where you’ll apply everything you’ve learned!
Design Your Agent
Module 10: Learning Objectives
By the end of this module, you will:
- ✓ Design a complete autonomous software engineering agent
- ✓ Implement multi-agent orchestration with specialized roles
- ✓ Integrate all concepts from previous chapters
- ✓ Deploy a production-ready agent system
- ✓ Evaluate and iterate based on real-world testing
Capstone Project: Autonomous Software Engineering Agent
Welcome to the capstone project! You’ll build a sophisticated agent that can analyze codebases, identify issues, propose fixes, write tests, and refactor code autonomously.
Project Overview
What We’re Building
An Autonomous Software Engineering Agent that can:
- Analyze code quality and identify bugs
- Generate fixes with explanations
- Write comprehensive tests
- Refactor code for better maintainability
- Review pull requests
- Learn from feedback
Why This Project?
This capstone integrates nearly everything from the course:
- ReAct pattern (Module 2): Reasoning and acting on code
- Planning (Module 3): Breaking down complex refactoring tasks
- Memory (Module 3): Remembering codebase patterns and past fixes
- Code execution (Module 4): Running and validating code
- Production patterns (Module 5): Safety, testing, monitoring
- Specialized agents (Module 6): Coding agent capabilities
- Learning (Module 7): Adapting from feedback
- Enterprise scale (Module 8): Handling large codebases
- Frontier capabilities (Module 9): Self-improvement, tool creation
Requirements Gathering
Functional Requirements
Core Capabilities:
- Code Analysis: Parse and understand code structure
- Bug Detection: Identify potential issues
- Fix Generation: Propose and implement fixes
- Test Generation: Create comprehensive tests
- Refactoring: Improve code quality
- PR Review: Analyze changes and provide feedback
User Interactions:
- Natural language commands (“Fix the bug in auth.py”)
- File/directory targeting
- Interactive clarifications
- Progress reporting
- Explanation of changes
Non-Functional Requirements
Performance:
- Analyze a file in under 5 seconds
- Generate a fix in under 30 seconds
- Handle codebases up to 100K lines
Reliability:
- Never break working code
- Validate all changes
- Rollback capability
- 95%+ test coverage for generated code
Safety:
- Sandbox code execution
- No destructive operations without confirmation
- Backup before modifications
- Security vulnerability checks
Usability:
- Clear explanations
- Confidence scores
- Alternative solutions
- Learning from user feedback
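One way to make the non-functional requirements enforceable rather than aspirational is to encode them as a configuration object the agents can check against at runtime. The class and field names below are illustrative, not part of a fixed API; the limits mirror the lists above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class AgentConstraints:
    """Operational limits derived from the non-functional requirements."""
    max_analysis_seconds: float = 5.0
    max_fix_seconds: float = 30.0
    max_codebase_lines: int = 100_000
    min_test_coverage: float = 0.95
    # Operations that always need explicit user confirmation
    require_confirmation_for: List[str] = field(
        default_factory=lambda: ["delete", "drop", "deploy"]
    )

    def within_budget(self, elapsed_seconds: float, task: str) -> bool:
        """Check elapsed time against the per-task time budget."""
        budget = self.max_fix_seconds if task == "fix" else self.max_analysis_seconds
        return elapsed_seconds <= budget
```

Agents can then fail fast (or escalate to the user) when a limit is exceeded, instead of each agent hard-coding its own thresholds.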
Architecture Design
High-Level Architecture
graph TB
UI[User Interface Layer]
UI --> ORC[Orchestration Layer]
subgraph Orchestration
ORC --> PLAN[Planner]
ORC --> ROUTE[Router]
ORC --> MON[Monitor]
end
subgraph Agents
ROUTE --> ANA[Analyzer Agent]
ROUTE --> FIX[Fixer Agent]
ROUTE --> TEST[Tester Agent]
ROUTE --> REF[Refactorer Agent]
ROUTE --> REV[Reviewer Agent]
end
subgraph Tools
ANA --> AST[AST Parser]
FIX --> EXEC[Code Executor]
TEST --> RUNNER[Test Runner]
AST --> LINT[Linter]
EXEC --> GIT[Git Ops]
end
subgraph Storage
MON --> VDB[(Vector DB)]
MON --> CACHE[(Code Cache)]
MON --> FB[(Feedback DB)]
end
style UI fill:#dbeafe
style ORC fill:#fef3c7
style ANA fill:#d1fae5
style FIX fill:#d1fae5
style TEST fill:#d1fae5
Component Design
1. Orchestration Layer
from typing import Dict, List, Optional
from dataclasses import dataclass
from enum import Enum
class TaskType(Enum):
ANALYZE = "analyze"
FIX_BUG = "fix_bug"
WRITE_TEST = "write_test"
REFACTOR = "refactor"
REVIEW_PR = "review_pr"
@dataclass
class Task:
type: TaskType
target: str # File or directory
description: str
priority: int
dependencies: List[str]
class Orchestrator:
"""Coordinates multiple specialized agents"""
def __init__(self):
self.planner = TaskPlanner()
self.router = AgentRouter()
self.monitor = ProgressMonitor()
def execute_request(self, request: str, context: Dict) -> Dict:
"""Main entry point"""
# Plan tasks
tasks = self.planner.create_plan(request, context)
# Execute tasks
results = []
for task in tasks:
# Route to appropriate agent
agent = self.router.get_agent(task.type)
# Execute
result = agent.execute(task)
results.append(result)
# Monitor progress
self.monitor.update(task, result)
# Synthesize results
return self.synthesize_results(results)
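The Orchestrator above depends on TaskPlanner, AgentRouter, and ProgressMonitor, which are not defined yet. A minimal sketch of each; the trivial planner and the duplicated TaskType values are placeholders (a real planner would decompose requests with an LLM, as later sections show).

```python
from enum import Enum
from typing import Dict, List

class TaskType(Enum):  # mirrors the enum defined earlier
    ANALYZE = "analyze"
    FIX_BUG = "fix_bug"

class TaskPlanner:
    """Trivial planner: treat the whole request as one ANALYZE task."""
    def create_plan(self, request: str, context: Dict) -> List[Dict]:
        return [{"type": TaskType.ANALYZE, "description": request}]

class AgentRouter:
    """Maps task types to agent instances registered at startup."""
    def __init__(self):
        self._agents: Dict[TaskType, object] = {}

    def register(self, task_type: TaskType, agent) -> None:
        self._agents[task_type] = agent

    def get_agent(self, task_type: TaskType):
        if task_type not in self._agents:
            raise KeyError(f"No agent registered for {task_type}")
        return self._agents[task_type]

class ProgressMonitor:
    """Records task outcomes so the orchestrator can report progress."""
    def __init__(self):
        self.history: List[Dict] = []

    def update(self, task, result) -> None:
        self.history.append({"task": task, "result": result})
```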
2. Agent Layer
class AnalyzerAgent:
"""Analyzes code quality and identifies issues"""
def execute(self, task: Task) -> Dict:
# Parse code
# Run static analysis
# Identify issues
# Prioritize findings
pass
class FixerAgent:
"""Generates and applies fixes"""
def execute(self, task: Task) -> Dict:
# Understand issue
# Generate fix
# Validate fix
# Apply changes
pass
class TesterAgent:
"""Writes tests for code"""
def execute(self, task: Task) -> Dict:
# Analyze code
# Identify test cases
# Generate tests
# Validate coverage
pass
class RefactorerAgent:
"""Refactors code for quality"""
def execute(self, task: Task) -> Dict:
# Identify code smells
# Plan refactoring
# Apply transformations
# Verify behavior preserved
pass
class ReviewerAgent:
"""Reviews code changes"""
def execute(self, task: Task) -> Dict:
# Analyze diff
# Check for issues
# Suggest improvements
# Approve or request changes
pass
3. Tool Layer
from typing import Any, Dict, List
class CodeTools:
"""Low-level code manipulation tools"""
def parse_ast(self, code: str, language: str) -> Dict:
"""Parse code into AST"""
pass
def execute_code(self, code: str, test_input: Any) -> Any:
"""Execute code safely"""
pass
def run_linter(self, file_path: str) -> List[Dict]:
"""Run linter on code"""
pass
def format_code(self, code: str, language: str) -> str:
"""Format code"""
pass
def run_tests(self, test_file: str) -> Dict:
"""Run test suite"""
pass
def git_diff(self, file_path: str) -> str:
"""Get git diff"""
pass
Tool Selection
Required Tools
| Tool | Purpose | Integration |
|---|---|---|
| AST Parser | Code structure analysis | ast (Python), tree-sitter (multi-lang) |
| Static Analyzer | Bug detection | pylint, mypy, ruff |
| Code Executor | Validation | Docker sandbox |
| Test Framework | Test generation/running | pytest, unittest |
| Git Integration | Version control | GitPython |
| Vector DB | Code search | chromadb, pinecone |
| LLM API | Reasoning | OpenAI, Anthropic |
Tool Integration Strategy
class ToolRegistry:
"""Registry of available tools"""
def __init__(self):
self.tools = {
"parse_code": {
"function": self.parse_code,
"description": "Parse code into AST",
"parameters": {"code": "str", "language": "str"}
},
"run_linter": {
"function": self.run_linter,
"description": "Run static analysis",
"parameters": {"file_path": "str"}
},
"execute_code": {
"function": self.execute_code,
"description": "Execute code safely",
"parameters": {"code": "str", "timeout": "int"}
},
"run_tests": {
"function": self.run_tests,
"description": "Run test suite",
"parameters": {"test_path": "str"}
},
"search_similar_code": {
"function": self.search_similar_code,
"description": "Find similar code patterns",
"parameters": {"query": "str", "limit": "int"}
}
}
def get_tool_schemas(self) -> List[Dict]:
"""Get OpenAI function schemas"""
return [
{
"name": name,
"description": tool["description"],
"parameters": {
"type": "object",
"properties": {
param: {"type": ptype}
for param, ptype in tool["parameters"].items()
},
"required": list(tool["parameters"].keys())
}
}
for name, tool in self.tools.items()
]
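get_tool_schemas returns bare function schemas. The current OpenAI chat completions API expects these wrapped in a `tools` list of `{"type": "function", "function": ...}` objects; a small helper for that conversion (the helper name is ours, not part of any SDK):

```python
from typing import Dict, List

def to_tools_param(schemas: List[Dict]) -> List[Dict]:
    """Wrap function schemas for the chat.completions `tools` parameter."""
    return [{"type": "function", "function": schema} for schema in schemas]
```

Usage would then look like `client.chat.completions.create(model=..., messages=..., tools=to_tools_param(registry.get_tool_schemas()))`.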
Safety Considerations
Critical Safety Measures
1. Code Execution Sandbox
import docker
class SafeExecutor:
"""Execute code in isolated container"""
def __init__(self):
self.client = docker.from_env()
def execute(self, code: str, timeout: int = 30) -> Dict:
"""Execute with resource limits"""
# Pass the code as an argument list to avoid shell-quoting issues
container = self.client.containers.run(
"python:3.11-slim",
command=["python", "-c", code],
detach=True,
mem_limit="256m",
cpu_quota=50000,
network_disabled=True
)
try:
result = container.wait(timeout=timeout)
logs = container.logs().decode()
return {"success": result.get("StatusCode", 1) == 0, "output": logs}
except Exception:
container.kill()
return {"success": False, "error": "Timeout or error"}
finally:
# Remove manually; auto-remove (remove=True) would race the logs() call
container.remove(force=True)
2. Change Validation
class ChangeValidator:
"""Validate code changes before applying"""
def validate(self, original: str, modified: str) -> Dict:
"""Multi-level validation"""
checks = {
"syntax": self.check_syntax(modified),
"tests_pass": self.run_tests(modified),
"no_security_issues": self.check_security(modified),
"behavior_preserved": self.verify_behavior(original, modified)
}
return {
"valid": all(checks.values()),
"checks": checks
}
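ChangeValidator calls check_syntax (among other helpers) without defining it. One possible implementation for Python code, using only the standard library; a change that no longer parses is rejected outright.

```python
import ast

def check_syntax(code: str) -> bool:
    """Return True when the code parses as valid Python.

    A sketch of the validator's syntax check; other helpers
    (tests, security, behavior) need richer tooling.
    """
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False
```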
3. Human-in-the-Loop
class ApprovalGate:
"""Require human approval for critical changes"""
def requires_approval(self, change: Dict) -> bool:
"""Determine if change needs approval"""
critical_patterns = [
"delete", "drop", "remove",
"auth", "security", "password",
"production", "deploy"
]
return any(pattern in change["description"].lower()
for pattern in critical_patterns)
Success Metrics
Key Performance Indicators
Accuracy Metrics:
- Bug detection rate (precision/recall)
- Fix success rate (% that work)
- Test coverage achieved
- False positive rate
Efficiency Metrics:
- Time to analyze file
- Time to generate fix
- Lines of code processed per minute
- Token usage per task
Quality Metrics:
- Code quality improvement (linter score)
- Test pass rate
- User acceptance rate
- Regression rate (fixes that break things)
Measurement Strategy
class MetricsCollector:
"""Collect and track metrics"""
def __init__(self):
self.metrics = {
"bugs_detected": 0,
"fixes_applied": 0,
"fixes_successful": 0,
"tests_generated": 0,
"avg_analysis_time": [],
"user_approvals": 0,
"user_rejections": 0
}
def record_analysis(self, duration: float, bugs_found: int):
"""Record analysis metrics"""
self.metrics["avg_analysis_time"].append(duration)
self.metrics["bugs_detected"] += bugs_found
def record_fix(self, success: bool):
"""Record fix attempt"""
self.metrics["fixes_applied"] += 1
if success:
self.metrics["fixes_successful"] += 1
def get_success_rate(self) -> float:
"""Calculate fix success rate"""
if self.metrics["fixes_applied"] == 0:
return 0.0
return self.metrics["fixes_successful"] / self.metrics["fixes_applied"]
Data Flow Design
Request Processing Flow
User Request
↓
Parse Intent
↓
Create Plan (Task Decomposition)
↓
For each task:
↓
Route to Specialized Agent
↓
Execute with Tools
↓
Validate Results
↓
Store in Memory
↓
Synthesize Results
↓
Present to User
↓
Collect Feedback
↓
Update Models
State Management
from dataclasses import dataclass
from typing import Dict, List, Optional
import json
@dataclass
class AgentState:
"""Current state of the agent"""
current_task: Optional[Task]
task_history: List[Dict]
codebase_context: Dict
user_preferences: Dict
performance_metrics: Dict
class StateManager:
"""Manage agent state"""
def __init__(self, state_file: str = "agent_state.json"):
self.state_file = state_file
self.state = self.load_state()
def load_state(self) -> AgentState:
"""Load state from disk"""
try:
with open(self.state_file, 'r') as f:
data = json.load(f)
return AgentState(**data)
except (FileNotFoundError, json.JSONDecodeError, TypeError):
return AgentState(
current_task=None,
task_history=[],
codebase_context={},
user_preferences={},
performance_metrics={}
)
def save_state(self):
"""Persist state to disk"""
with open(self.state_file, 'w') as f:
json.dump(self.state.__dict__, f, indent=2, default=str)
def update_context(self, file_path: str, analysis: Dict):
"""Update codebase context"""
self.state.codebase_context[file_path] = analysis
self.save_state()
Memory Architecture
Multi-Level Memory System
1. Working Memory: Current task context
class WorkingMemory:
"""Short-term task context"""
def __init__(self, max_size: int = 10):
self.max_size = max_size
self.items = []
def add(self, item: Dict):
"""Add to working memory"""
self.items.append(item)
if len(self.items) > self.max_size:
self.items.pop(0)
def get_context(self) -> str:
"""Get context for LLM"""
return "\n".join([
f"- {item['type']}: {item['content']}"
for item in self.items
])
2. Episodic Memory: Past tasks and solutions
import time
class EpisodicMemory:
"""Remember past tasks"""
def __init__(self):
self.episodes = []
def store_episode(self, task: Task, solution: Dict, outcome: Dict):
"""Store completed task"""
self.episodes.append({
"task": task,
"solution": solution,
"outcome": outcome,
"timestamp": time.time()
})
def recall_similar(self, current_task: Task, limit: int = 5) -> List[Dict]:
"""Recall similar past tasks"""
# Use embedding similarity
return self.episodes[-limit:]
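recall_similar above simply returns the most recent episodes. A sketch of similarity-based recall, written as a standalone function; difflib is a cheap stand-in for the embedding similarity the comment mentions.

```python
import difflib
from typing import Dict, List

def recall_similar(episodes: List[Dict], description: str, limit: int = 5) -> List[Dict]:
    """Rank stored episodes by rough text similarity to the current task.

    SequenceMatcher ratios are crude; a real system would embed the
    task descriptions and rank by cosine similarity instead.
    """
    scored = [
        (difflib.SequenceMatcher(None, description, str(ep.get("task", ""))).ratio(), ep)
        for ep in episodes
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ep for _, ep in scored[:limit]]
```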
3. Semantic Memory: Codebase knowledge
import chromadb
class SemanticMemory:
"""Long-term codebase knowledge"""
def __init__(self):
self.client = chromadb.Client()
# get_or_create avoids an error when the collection already exists
self.collection = self.client.get_or_create_collection("codebase")
def index_codebase(self, files: List[str]):
"""Index codebase for semantic search"""
for file_path in files:
with open(file_path, 'r') as f:
code = f.read()
self.collection.add(
documents=[code],
metadatas=[{"file_path": file_path}],
ids=[file_path]
)
def search(self, query: str, n_results: int = 5) -> List[Dict]:
"""Search for relevant code"""
results = self.collection.query(
query_texts=[query],
n_results=n_results
)
return results
Error Handling Strategy
Graceful Degradation
class RobustAgent:
"""Agent with comprehensive error handling"""
def execute_with_fallbacks(self, task: Task) -> Dict:
"""Execute with multiple fallback strategies"""
strategies = [
self.primary_strategy,
self.simplified_strategy,
self.conservative_strategy
]
for strategy in strategies:
try:
result = strategy(task)
if self.validate_result(result):
return result
except Exception as e:
self.log_error(strategy.__name__, e)
continue
return {
"success": False,
"error": "All strategies failed",
"recommendation": "Manual intervention required"
}
Design Decisions
Key Choices
1. Multi-Agent vs Single Agent
- Choice: Multi-agent with specialized roles
- Rationale: Better separation of concerns, easier to test, more maintainable
2. Synchronous vs Asynchronous
- Choice: Asynchronous for I/O operations
- Rationale: Better performance, can analyze multiple files in parallel
3. Local vs Cloud Execution
- Choice: Hybrid (local analysis, cloud LLM)
- Rationale: Security for code, power for reasoning
4. Automatic vs Interactive
- Choice: Interactive with automatic mode option
- Rationale: Safety for critical changes, speed for routine tasks
5. Learning Strategy
- Choice: Few-shot + feedback learning
- Rationale: Fast adaptation without full retraining
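Decision 2 (asynchronous I/O) can be illustrated with a short sketch: analyzing several files concurrently with asyncio.gather. The analyze_file body is a placeholder for a real LLM or linter call, and the file names are hypothetical.

```python
import asyncio
from typing import Dict, List

async def analyze_file(path: str) -> Dict:
    """Placeholder analysis; a real version would await an LLM call
    or run static analysis in a worker thread."""
    await asyncio.sleep(0)  # yield control, standing in for real I/O
    return {"file": path, "issues": []}

async def analyze_many(paths: List[str]) -> List[Dict]:
    # gather() runs all analyses concurrently and preserves input order
    return await asyncio.gather(*(analyze_file(p) for p in paths))

results = asyncio.run(analyze_many(["auth.py", "db.py", "api.py"]))
```

With real I/O-bound calls, total latency approaches that of the slowest file rather than the sum of all files.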
✅ Key Takeaways
- Design requires balancing functional and non-functional requirements
- Multi-agent architecture provides separation of concerns
- Safety mechanisms are critical for code-modifying agents
- Memory systems enable learning from past experiences
- Tool selection impacts capabilities and complexity
- Architecture decisions should align with use case constraints
Next Steps
Now that we have the design, let’s implement the Autonomous Software Engineering Agent!
In the next section, you’ll build:
- Complete working implementation
- All specialized agents
- Tool integrations
- Safety mechanisms
- Real-world examples
Implementation
Building the Autonomous Software Engineering Agent
Let’s build the complete system step by step.
Project Setup
# Create project structure
mkdir autonomous-se-agent
cd autonomous-se-agent
# Create directories
mkdir -p src/{agents,tools,memory,orchestration}
mkdir -p tests
mkdir -p data/{cache,feedback}
# Install dependencies
pip install openai chromadb gitpython docker pytest pylint black ast-grep-py
Core Implementation
1. Main Orchestrator
# src/orchestration/orchestrator.py
from typing import Dict, List
from dataclasses import dataclass
from enum import Enum
import openai
class TaskType(Enum):
ANALYZE = "analyze"
FIX = "fix"
TEST = "test"
REFACTOR = "refactor"
REVIEW = "review"
@dataclass
class Task:
type: TaskType
target: str
description: str
context: Dict
class SoftwareEngineeringAgent:
"""Main orchestrator for autonomous SE agent"""
def __init__(self):
self.client = openai.OpenAI()
self.analyzer = AnalyzerAgent()
self.fixer = FixerAgent()
self.tester = TesterAgent()
self.memory = AgentMemory()
def process_request(self, request: str, target_path: str) -> Dict:
"""Process user request"""
# Parse intent
intent = self.parse_intent(request)
# Create plan
plan = self.create_plan(intent, target_path)
# Execute plan
results = self.execute_plan(plan)
# Store in memory
self.memory.store_episode(request, plan, results)
return results
def parse_intent(self, request: str) -> Dict:
"""Parse user intent"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "system",
"content": "Parse user intent. Return JSON with: task_type, target, requirements"
}, {
"role": "user",
"content": request
}],
temperature=0.2
)
import json
return json.loads(response.choices[0].message.content)
def create_plan(self, intent: Dict, target_path: str) -> List[Task]:
"""Create execution plan"""
tasks = []
task_type = TaskType(intent["task_type"])
if task_type == TaskType.FIX:
# Fix requires: analyze -> fix -> test
tasks.append(Task(TaskType.ANALYZE, target_path, "Analyze code", {}))
tasks.append(Task(TaskType.FIX, target_path, intent["requirements"], {}))
tasks.append(Task(TaskType.TEST, target_path, "Validate fix", {}))
elif task_type == TaskType.REFACTOR:
# Refactor requires: analyze -> refactor -> test
tasks.append(Task(TaskType.ANALYZE, target_path, "Analyze code", {}))
tasks.append(Task(TaskType.REFACTOR, target_path, intent["requirements"], {}))
tasks.append(Task(TaskType.TEST, target_path, "Validate refactor", {}))
else:
tasks.append(Task(task_type, target_path, intent["requirements"], {}))
return tasks
def execute_plan(self, plan: List[Task]) -> Dict:
"""Execute task plan"""
results = []
context = {}
for task in plan:
task.context = context
if task.type == TaskType.ANALYZE:
result = self.analyzer.execute(task)
elif task.type == TaskType.FIX:
result = self.fixer.execute(task)
elif task.type == TaskType.TEST:
result = self.tester.execute(task)
else:
result = {"error": "Unknown task type"}
results.append(result)
context.update(result)
return {"tasks": len(plan), "results": results}
2. Analyzer Agent
# src/agents/analyzer.py
import ast
import openai
from typing import Dict, List
class AnalyzerAgent:
"""Analyzes code for issues"""
def __init__(self):
self.client = openai.OpenAI()
def execute(self, task: Task) -> Dict:
"""Analyze code file"""
# Read code
with open(task.target, 'r') as f:
code = f.read()
# Parse AST
ast_analysis = self.analyze_ast(code)
# Run static analysis
static_issues = self.run_static_analysis(task.target)
# LLM-based analysis
llm_analysis = self.llm_analyze(code)
return {
"file": task.target,
"ast_analysis": ast_analysis,
"static_issues": static_issues,
"llm_analysis": llm_analysis,
"issues": self.consolidate_issues(static_issues, llm_analysis)
}
def analyze_ast(self, code: str) -> Dict:
"""Analyze code structure"""
try:
tree = ast.parse(code)
functions = [node.name for node in ast.walk(tree)
if isinstance(node, ast.FunctionDef)]
classes = [node.name for node in ast.walk(tree)
if isinstance(node, ast.ClassDef)]
return {
"functions": functions,
"classes": classes,
"lines": len(code.split('\n'))
}
except SyntaxError as e:
return {"error": str(e)}
def run_static_analysis(self, file_path: str) -> List[Dict]:
"""Run pylint"""
import subprocess
result = subprocess.run(
['pylint', file_path, '--output-format=json'],
capture_output=True,
text=True
)
import json
try:
return json.loads(result.stdout)
except json.JSONDecodeError:
return []
def llm_analyze(self, code: str) -> Dict:
"""LLM-based code analysis"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "system",
"content": "You are an expert code reviewer. Analyze code for bugs, security issues, and improvements."
}, {
"role": "user",
"content": f"Analyze this code:\n\n{code}"
}],
temperature=0.3
)
return {"analysis": response.choices[0].message.content}
def consolidate_issues(self, static: List[Dict], llm: Dict) -> List[Dict]:
"""Consolidate all issues"""
issues = []
# Add static analysis issues
for issue in static:
issues.append({
"type": issue.get("type", "unknown"),
"message": issue.get("message", ""),
"line": issue.get("line", 0),
"severity": issue.get("severity", "info"),
"source": "static"
})
# Include the LLM review as a single informational finding
if llm.get("analysis"):
issues.append({
"type": "review",
"message": llm["analysis"],
"line": 0,
"severity": "info",
"source": "llm"
})
return issues
3. Fixer Agent
# src/agents/fixer.py
from typing import Dict, List
import difflib
import openai
class FixerAgent:
"""Generates and applies fixes"""
def __init__(self):
self.client = openai.OpenAI()
self.validator = FixValidator()
def execute(self, task: Task) -> Dict:
"""Generate and apply fix"""
# Read current code
with open(task.target, 'r') as f:
original_code = f.read()
# Get issues from context
issues = task.context.get("issues", [])
# Generate fix
fixed_code = self.generate_fix(original_code, issues, task.description)
# Validate fix
validation = self.validator.validate(original_code, fixed_code)
if not validation["valid"]:
return {
"success": False,
"error": "Validation failed",
"details": validation
}
# Show diff
diff = self.generate_diff(original_code, fixed_code)
return {
"success": True,
"original_code": original_code,
"fixed_code": fixed_code,
"diff": diff,
"validation": validation
}
def generate_fix(self, code: str, issues: List[Dict], description: str) -> str:
"""Generate fixed code"""
issues_text = "\n".join([
f"- Line {i['line']}: {i['message']}"
for i in issues[:5] # Top 5 issues
])
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "system",
"content": "You are an expert programmer. Fix code issues while preserving functionality."
}, {
"role": "user",
"content": f"Fix these issues:\n{issues_text}\n\nRequirement: {description}\n\nOriginal code:\n{code}\n\nFixed code:"
}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def generate_diff(self, original: str, fixed: str) -> str:
"""Generate unified diff"""
diff = difflib.unified_diff(
original.splitlines(keepends=True),
fixed.splitlines(keepends=True),
fromfile='original',
tofile='fixed'
)
return ''.join(diff)
def extract_code(self, text: str) -> str:
"""Extract code from markdown"""
import re
pattern = r'```python\n(.*?)```'
matches = re.findall(pattern, text, re.DOTALL)
return matches[0] if matches else text
class FixValidator:
"""Validate fixes"""
def validate(self, original: str, fixed: str) -> Dict:
"""Multi-level validation"""
return {
"valid": self.check_syntax(fixed) and self.check_safety(fixed),
"syntax_valid": self.check_syntax(fixed),
"safety_passed": self.check_safety(fixed)
}
def check_syntax(self, code: str) -> bool:
"""Check syntax"""
try:
ast.parse(code)
return True
except SyntaxError:
return False
def check_safety(self, code: str) -> bool:
"""Check for unsafe patterns"""
unsafe = ["eval(", "exec(", "__import__", "os.system"]
return not any(pattern in code for pattern in unsafe)
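FixerAgent's `generate_diff` is a thin wrapper over the standard library; a standalone illustration with a hypothetical before/after pair:

```python
import difflib

original = "def greet():\n    print('hi')\n"
fixed = "def greet(name):\n    print(f'hi {name}')\n"

# unified_diff yields header lines, an @@ hunk marker, then -/+ lines
diff = ''.join(difflib.unified_diff(
    original.splitlines(keepends=True),
    fixed.splitlines(keepends=True),
    fromfile='original', tofile='fixed',
))
print(diff)
```

Passing `keepends=True` matters: `unified_diff` expects each line to carry its own newline, which is why `generate_diff` splits the code the same way.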
4. Tester Agent
# src/agents/tester.py
from typing import Dict, List
import openai
class TesterAgent:
"""Generates and runs tests"""
def __init__(self):
self.client = openai.OpenAI()
def execute(self, task: Task) -> Dict:
"""Generate tests for code"""
# Read code
with open(task.target, 'r') as f:
code = f.read()
# Generate tests
tests = self.generate_tests(code)
# Run tests
results = self.run_tests(tests)
return {
"tests_generated": len(tests),
"tests_passed": sum(1 for r in results if r["passed"]),
"coverage": self.calculate_coverage(code, tests),
"test_code": tests
}
def generate_tests(self, code: str) -> str:
"""Generate test code"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "system",
"content": "Generate comprehensive pytest tests. Include edge cases, error cases, and normal cases."
}, {
"role": "user",
"content": f"Generate tests for:\n\n{code}"
}],
temperature=0.3
)
return response.choices[0].message.content
def run_tests(self, test_code: str) -> List[Dict]:
"""Run generated tests"""
# Write to a temp file, run pytest, then clean up
import os
import subprocess
import tempfile
with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
f.write(test_code)
test_file = f.name
try:
result = subprocess.run(
['pytest', test_file, '-v'],
capture_output=True
)
# Return code 0 means every test passed
return [{"passed": result.returncode == 0}]
finally:
os.unlink(test_file)
def calculate_coverage(self, code: str, tests: str) -> float:
"""Estimate test coverage"""
# Simplified coverage estimation
return 0.85
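`calculate_coverage` above is stubbed to return a constant. A self-contained heuristic sketch, assuming name-matching is a good-enough rough signal: it counts which functions in the source are mentioned anywhere in the generated test code.

```python
import ast

def estimate_coverage(code: str, tests: str) -> float:
    """Rough estimate: fraction of functions in `code` whose
    names appear somewhere in the generated test code."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0
    functions = [n.name for n in ast.walk(tree)
                 if isinstance(n, ast.FunctionDef)]
    if not functions:
        return 0.0
    covered = sum(1 for name in functions if name in tests)
    return covered / len(functions)

code = "def add(a, b):\n    return a + b\n\ndef sub(a, b):\n    return a - b\n"
tests = "def test_add():\n    assert add(1, 2) == 3\n"
print(estimate_coverage(code, tests))  # 0.5
```

For real numbers you would run the tests under `coverage.py` instead; substring matching over-counts short names, so treat this only as a cheap signal.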
5. Memory System
# src/memory/agent_memory.py
import chromadb
from typing import Dict, List
import json
class AgentMemory:
"""Unified memory system"""
def __init__(self):
self.working_memory = []
self.client = chromadb.Client()
# get_or_create_collection is idempotent, so re-initializing the agent won't raise
self.episodes = self.client.get_or_create_collection("episodes")
self.codebase = self.client.get_or_create_collection("codebase")
def store_episode(self, request: str, plan: List[Task], results: Dict):
"""Store completed episode"""
episode = {
"request": request,
"plan": [{"type": t.type.value, "target": t.target} for t in plan],
"results": results,
"success": results.get("success", False)
}
self.episodes.add(
documents=[json.dumps(episode)],
metadatas=[{"request": request}],
ids=[f"episode_{len(self.episodes.get()['ids'])}"]
)
def recall_similar_episodes(self, request: str, limit: int = 3) -> List[Dict]:
"""Recall similar past episodes"""
results = self.episodes.query(
query_texts=[request],
n_results=limit
)
return [json.loads(doc) for doc in results['documents'][0]]
def index_file(self, file_path: str, code: str, analysis: Dict):
"""Index file in semantic memory"""
self.codebase.add(
documents=[code],
metadatas=[{
"file_path": file_path,
"functions": json.dumps(analysis.get("functions", [])),
"classes": json.dumps(analysis.get("classes", []))
}],
ids=[file_path]
)
def search_codebase(self, query: str, limit: int = 5) -> List[Dict]:
"""Search codebase semantically"""
results = self.codebase.query(
query_texts=[query],
n_results=limit
)
return results
6. Tool Layer
# src/tools/code_tools.py
import ast
import subprocess
from typing import Dict, List
class CodeTools:
"""Low-level code manipulation tools"""
@staticmethod
def parse_python(code: str) -> Dict:
"""Parse Python code"""
try:
tree = ast.parse(code)
return {
"valid": True,
"functions": [n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)],
"classes": [n.name for n in ast.walk(tree) if isinstance(n, ast.ClassDef)],
"imports": [n.names[0].name for n in ast.walk(tree) if isinstance(n, ast.Import)]
}
except SyntaxError as e:
return {"valid": False, "error": str(e)}
@staticmethod
def run_linter(file_path: str) -> List[Dict]:
"""Run pylint"""
result = subprocess.run(
['pylint', file_path, '--output-format=json'],
capture_output=True,
text=True
)
import json
try:
return json.loads(result.stdout)
except json.JSONDecodeError:
return []
@staticmethod
def format_code(code: str) -> str:
"""Format with black"""
result = subprocess.run(
['black', '-'],
input=code,
capture_output=True,
text=True
)
return result.stdout if result.returncode == 0 else code
@staticmethod
def run_tests(test_path: str) -> Dict:
"""Run pytest"""
result = subprocess.run(
['pytest', test_path, '-v'],
capture_output=True,
text=True
)
return {
"passed": result.returncode == 0,
"output": result.stdout
}
class SafeExecutor:
"""Execute code safely in Docker"""
def __init__(self):
import docker
self.client = docker.from_env()
def execute(self, code: str, timeout: int = 30) -> Dict:
"""Execute in isolated container"""
try:
container = self.client.containers.run(
"python:3.11-slim",
command=['python', '-c', code],
detach=True,
mem_limit="256m",
network_disabled=True
)
# Don't combine remove=True with detach=True: the container can be
# auto-deleted before its logs are read. Remove it manually instead.
result = container.wait(timeout=timeout)
logs = container.logs().decode()
container.remove()
return {"success": True, "output": logs, "exit_code": result['StatusCode']}
except Exception as e:
return {"success": False, "error": str(e)}
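`CodeTools.parse_python` can be exercised in isolation; this standalone mirror of it shows the shape of the dict it returns for valid and invalid input:

```python
import ast

def parse_python(code: str) -> dict:
    """Standalone mirror of CodeTools.parse_python for experimentation."""
    try:
        tree = ast.parse(code)
        return {
            "valid": True,
            "functions": [n.name for n in ast.walk(tree)
                          if isinstance(n, ast.FunctionDef)],
            "classes": [n.name for n in ast.walk(tree)
                        if isinstance(n, ast.ClassDef)],
        }
    except SyntaxError as e:
        # Invalid code still yields a structured result, never an exception
        return {"valid": False, "error": str(e)}

print(parse_python("def f():\n    pass"))
print(parse_python("def broken("))
```

Returning a `{"valid": False, ...}` dict instead of raising is what lets the agents above treat broken files as just another analysis result.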
7. Complete Agent Implementation
# src/agents/fixer.py (complete version)
from typing import Dict, List
import openai
class FixerAgent:
"""Generates and applies fixes"""
def __init__(self):
self.client = openai.OpenAI()
self.tools = CodeTools()
def execute(self, task: Task) -> Dict:
"""Generate fix for issues"""
# Read code
with open(task.target, 'r') as f:
original_code = f.read()
# Get issues from context
issues = task.context.get("issues", [])
# Retrieve similar fixes from memory
similar_fixes = self.recall_similar_fixes(issues)
# Generate fix with context
fixed_code = self.generate_fix(
original_code,
issues,
task.description,
similar_fixes
)
# Validate
if not self.validate_fix(original_code, fixed_code):
return {"success": False, "error": "Validation failed"}
# Generate explanation
explanation = self.explain_fix(original_code, fixed_code, issues)
return {
"success": True,
"original_code": original_code,
"fixed_code": fixed_code,
"explanation": explanation,
"issues_addressed": len(issues)
}
def generate_fix(self,
code: str,
issues: List[Dict],
description: str,
similar_fixes: List[Dict]) -> str:
"""Generate fixed code"""
issues_text = "\n".join([
f"- Line {i['line']}: {i['message']} (severity: {i['severity']})"
for i in issues[:10]
])
context_text = ""
if similar_fixes:
context_text = "\n\nSimilar fixes from history:\n" + "\n".join([
f"- {fix['description']}: {fix['approach']}"
for fix in similar_fixes[:3]
])
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "system",
"content": "Fix code issues while preserving functionality. Return only the fixed code."
}, {
"role": "user",
"content": f"Issues:\n{issues_text}\n\nRequirement: {description}{context_text}\n\nCode:\n{code}\n\nFixed code:"
}],
temperature=0.2
)
return self.extract_code(response.choices[0].message.content)
def validate_fix(self, original: str, fixed: str) -> bool:
"""Validate fix"""
# Check syntax
parsed = self.tools.parse_python(fixed)
if not parsed["valid"]:
return False
# Check no unsafe operations
unsafe = ["eval(", "exec(", "os.system"]
if any(op in fixed for op in unsafe):
return False
return True
def explain_fix(self, original: str, fixed: str, issues: List[Dict]) -> str:
"""Explain what was fixed"""
response = self.client.chat.completions.create(
model="gpt-4",
messages=[{
"role": "user",
"content": f"Explain changes:\n\nOriginal:\n{original[:500]}\n\nFixed:\n{fixed[:500]}\n\nIssues addressed: {len(issues)}"
}],
temperature=0.3
)
return response.choices[0].message.content
def recall_similar_fixes(self, issues: List[Dict]) -> List[Dict]:
"""Recall similar fixes from memory"""
# Simplified - would use vector search
return []
def extract_code(self, text: str) -> str:
"""Extract code from response"""
import re
pattern = r'```python\n(.*?)```'
matches = re.findall(pattern, text, re.DOTALL)
return matches[0] if matches else text
8. CLI Interface
# src/cli.py
import json
import click
from orchestration.orchestrator import SoftwareEngineeringAgent
@click.group()
def cli():
"""Autonomous Software Engineering Agent"""
pass
@cli.command()
@click.argument('file_path')
def analyze(file_path):
"""Analyze code file"""
agent = SoftwareEngineeringAgent()
result = agent.process_request(f"Analyze {file_path}", file_path)
click.echo(json.dumps(result, indent=2))
@cli.command()
@click.argument('file_path')
@click.option('--description', '-d', help='Fix description')
def fix(file_path, description):
"""Fix issues in code"""
agent = SoftwareEngineeringAgent()
result = agent.process_request(
f"Fix issues: {description}" if description else "Fix all issues",
file_path
)
if result['results'][-1]['success']:
click.echo("✓ Fix generated successfully")
click.echo("\nDiff:")
click.echo(result['results'][-1]['diff'])
else:
click.echo("✗ Fix failed")
@cli.command()
@click.argument('file_path')
def test(file_path):
"""Generate tests"""
agent = SoftwareEngineeringAgent()
result = agent.process_request(f"Generate tests for {file_path}", file_path)
click.echo(f"Generated {result['results'][0]['tests_generated']} tests")
if __name__ == '__main__':
cli()
Usage Examples
Example 1: Analyze and Fix
# Analyze code
python src/cli.py analyze src/example.py
# Fix issues
python src/cli.py fix src/example.py --description "Fix type errors and add error handling"
# Generate tests
python src/cli.py test src/example.py
Example 2: Programmatic Usage
from orchestration.orchestrator import SoftwareEngineeringAgent
# Initialize agent
agent = SoftwareEngineeringAgent()
# Analyze code
result = agent.process_request(
"Analyze this file for bugs and security issues",
"src/auth.py"
)
print(f"Found {len(result['results'][0]['issues'])} issues")
# Fix critical issues
fix_result = agent.process_request(
"Fix all critical and high severity issues",
"src/auth.py"
)
if fix_result['results'][-1]['success']:
print("Fix applied successfully")
print(fix_result['results'][-1]['explanation'])
Advanced Features
Learning from Feedback
import time
class FeedbackLearner:
"""Learn from user feedback"""
def __init__(self):
self.feedback_db = []
def collect_feedback(self, task: Task, result: Dict, user_rating: int):
"""Collect user feedback"""
self.feedback_db.append({
"task": task,
"result": result,
"rating": user_rating,
"timestamp": time.time()
})
def improve_from_feedback(self):
"""Analyze feedback and improve"""
# Identify patterns in low-rated results
low_rated = [f for f in self.feedback_db if f["rating"] < 3]
# Extract common issues
# Adjust prompts or strategies
# Update tool selection logic
pass
Parallel Processing
import asyncio
from typing import List
class ParallelAnalyzer:
"""Analyze multiple files in parallel"""
async def analyze_files(self, file_paths: List[str]) -> List[Dict]:
"""Analyze files concurrently"""
tasks = [self.analyze_file(path) for path in file_paths]
results = await asyncio.gather(*tasks)
return results
async def analyze_file(self, file_path: str) -> Dict:
"""Analyze single file"""
analyzer = AnalyzerAgent()
task = Task(TaskType.ANALYZE, file_path, "Analyze", {})
# AnalyzerAgent.execute is synchronous; run it in a worker thread
# so the gathered calls actually overlap
return await asyncio.to_thread(analyzer.execute, task)
# Usage
async def main():
analyzer = ParallelAnalyzer()
results = await analyzer.analyze_files(['file1.py', 'file2.py', 'file3.py'])
print(f"Analyzed {len(results)} files")
asyncio.run(main())
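A toy, self-contained timing check (names are illustrative) of why `gather` helps here: awaiting the coroutines together overlaps their waits instead of summing them.

```python
import asyncio
import time

async def analyze(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for an I/O-bound LLM/API call
    return name

async def main():
    start = time.perf_counter()
    # Three 0.1s waits run concurrently, so total time stays near 0.1s
    results = await asyncio.gather(*(analyze(f"file{i}.py") for i in range(3)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)
print(elapsed < 0.3)
```

The speedup only applies while the coroutines are genuinely waiting (network, disk); CPU-bound work would need threads or processes instead.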
Testing the Agent
Unit Tests
# tests/test_analyzer.py
import pytest
from agents.analyzer import AnalyzerAgent
from orchestration.orchestrator import Task, TaskType
def test_analyzer_detects_issues():
"""Test analyzer finds issues"""
agent = AnalyzerAgent()
# Create test task
task = Task(
type=TaskType.ANALYZE,
target="tests/fixtures/buggy_code.py",
description="Analyze",
context={}
)
result = agent.execute(task)
assert "issues" in result
assert len(result["issues"]) > 0
def test_analyzer_handles_syntax_errors():
"""Test analyzer handles invalid syntax"""
agent = AnalyzerAgent()
# Write invalid code
with open("tests/fixtures/invalid.py", "w") as f:
f.write("def broken(\n")
task = Task(TaskType.ANALYZE, "tests/fixtures/invalid.py", "Analyze", {})
result = agent.execute(task)
assert "error" in result["ast_analysis"]
Integration Tests
# tests/test_integration.py
import pytest
from orchestration.orchestrator import SoftwareEngineeringAgent
def test_end_to_end_fix():
"""Test complete fix workflow"""
agent = SoftwareEngineeringAgent()
# Create buggy code
buggy_code = '''
def divide(a, b):
return a / b
'''
with open("tests/fixtures/buggy.py", "w") as f:
f.write(buggy_code)
# Request fix
result = agent.process_request(
"Fix the division by zero bug",
"tests/fixtures/buggy.py"
)
# Verify fix was generated
assert result["results"][-1]["success"]
assert "if b == 0" in result["results"][-1]["fixed_code"]
Deployment
Docker Container
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy source
COPY src/ ./src/
# Expose API
EXPOSE 8000
CMD ["python", "-m", "uvicorn", "src.api:app", "--host", "0.0.0.0"]
API Service
# src/api.py
from typing import Dict
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from orchestration.orchestrator import SoftwareEngineeringAgent
app = FastAPI(title="Autonomous SE Agent API")
class AnalyzeRequest(BaseModel):
file_path: str
options: Dict = {}
class FixRequest(BaseModel):
file_path: str
description: str
@app.post("/analyze")
async def analyze_code(request: AnalyzeRequest):
"""Analyze code endpoint"""
agent = SoftwareEngineeringAgent()
result = agent.process_request(
f"Analyze {request.file_path}",
request.file_path
)
return result
@app.post("/fix")
async def fix_code(request: FixRequest):
"""Fix code endpoint"""
agent = SoftwareEngineeringAgent()
result = agent.process_request(
f"Fix: {request.description}",
request.file_path
)
return result
@app.get("/health")
async def health():
"""Health check"""
return {"status": "healthy"}
Next Steps
You now have a complete implementation! In the next section, we’ll evaluate and iterate on the agent to make it production-ready.
Evaluation & Iteration
Evaluating Your Agent
Now that you’ve built the Autonomous Software Engineering Agent, let’s evaluate its performance and iterate to improve it.
Evaluation Framework
Test Suite Design
# tests/evaluation/test_suite.py
from typing import Dict, List
from dataclasses import dataclass
@dataclass
class TestCase:
name: str
input_code: str
expected_issues: List[str]
expected_fix_pattern: str
difficulty: str # easy, medium, hard
class EvaluationSuite:
"""Comprehensive evaluation suite"""
def __init__(self):
self.test_cases = self.create_test_cases()
self.results = []
def create_test_cases(self) -> List[TestCase]:
"""Create diverse test cases"""
return [
TestCase(
name="Division by zero",
input_code="def divide(a, b): return a / b",
expected_issues=["ZeroDivisionError"],
expected_fix_pattern="if b == 0",
difficulty="easy"
),
TestCase(
name="SQL injection",
input_code='query = f"SELECT * FROM users WHERE id = {user_id}"',
expected_issues=["SQL injection"],
expected_fix_pattern="parameterized",
difficulty="medium"
),
TestCase(
name="Race condition",
input_code="""
counter = 0
def increment():
global counter
temp = counter
counter = temp + 1
""",
expected_issues=["race condition"],
expected_fix_pattern="lock",
difficulty="hard"
)
]
def run_evaluation(self, agent) -> Dict:
"""Run full evaluation"""
results = {
"total": len(self.test_cases),
"passed": 0,
"by_difficulty": {"easy": 0, "medium": 0, "hard": 0}
}
for test_case in self.test_cases:
result = self.evaluate_test_case(agent, test_case)
self.results.append(result)
if result["passed"]:
results["passed"] += 1
results["by_difficulty"][test_case.difficulty] += 1
results["accuracy"] = results["passed"] / results["total"]
return results
def evaluate_test_case(self, agent, test_case: TestCase) -> Dict:
"""Evaluate single test case"""
# Write test code to file
test_file = f"tests/fixtures/{test_case.name.replace(' ', '_')}.py"
with open(test_file, 'w') as f:
f.write(test_case.input_code)
# Run agent
result = agent.process_request(
f"Analyze and fix issues in {test_file}",
test_file
)
# Check if issues detected
issues_found = result["results"][0].get("issues", [])
detected_expected = any(
expected in str(issues_found).lower()
for expected in test_case.expected_issues
)
# Check if fix applied correctly
fixed_code = result["results"][1].get("fixed_code", "")
fix_correct = test_case.expected_fix_pattern.lower() in fixed_code.lower()
return {
"test_case": test_case.name,
"passed": detected_expected and fix_correct,
"issues_detected": detected_expected,
"fix_correct": fix_correct,
"difficulty": test_case.difficulty
}
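For reference, the kind of change the hard "Race condition" case accepts (its `expected_fix_pattern` is simply the substring "lock"): serialize the read-modify-write with a `threading.Lock`.

```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    with lock:  # guard the read-modify-write so updates can't interleave
        counter += 1

threads = [threading.Thread(target=increment) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 100
```

Without the lock, two threads can both read the same value of `counter` before either writes, silently dropping increments.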
Performance Benchmarks
# tests/evaluation/benchmarks.py
import time
from typing import Dict, List
from tests.evaluation.test_suite import TestCase
class PerformanceBenchmark:
"""Benchmark agent performance"""
def __init__(self):
self.metrics = {}
def benchmark_analysis_speed(self, agent, file_sizes: List[int]) -> Dict:
"""Benchmark analysis speed"""
results = {}
for size in file_sizes:
# Generate code of specific size
code = self.generate_code(size)
test_file = f"tests/fixtures/size_{size}.py"
with open(test_file, 'w') as f:
f.write(code)
# Time analysis
start = time.time()
agent.process_request(f"Analyze {test_file}", test_file)
duration = time.time() - start
results[size] = {
"duration": duration,
"lines_per_second": size / duration
}
return results
def benchmark_fix_quality(self, agent, test_cases: List[TestCase]) -> Dict:
"""Benchmark fix quality"""
metrics = {
"fixes_attempted": 0,
"fixes_successful": 0,
"fixes_optimal": 0,
"avg_fix_time": []
}
for test_case in test_cases:
start = time.time()
# Generate fix
result = agent.process_request(
f"Fix issues in {test_case.name}",
test_case.name
)
duration = time.time() - start
metrics["avg_fix_time"].append(duration)
metrics["fixes_attempted"] += 1
if result["results"][-1]["success"]:
metrics["fixes_successful"] += 1
# Check if optimal
if self.is_optimal_fix(result["results"][-1]["fixed_code"]):
metrics["fixes_optimal"] += 1
return metrics
def generate_code(self, lines: int) -> str:
"""Generate code of specific size"""
return "\n".join([f"# Line {i}" for i in range(lines)])
def is_optimal_fix(self, code: str) -> bool:
"""Check if fix is optimal"""
# Simplified check
return "try" in code or "if" in code
Real-World Testing
Beta Testing Strategy
class BetaTester:
"""Coordinate beta testing"""
def __init__(self):
self.testers = []
self.feedback = []
def run_beta_test(self, agent, duration_days: int = 7) -> Dict:
"""Run beta test program"""
print(f"Starting {duration_days}-day beta test...")
# Collect usage data
usage_data = self.collect_usage_data(agent, duration_days)
# Collect feedback
feedback = self.collect_feedback()
# Analyze results
analysis = self.analyze_beta_results(usage_data, feedback)
return analysis
def collect_usage_data(self, agent, days: int) -> Dict:
"""Collect usage metrics"""
return {
"total_requests": 0,
"successful_requests": 0,
"avg_response_time": 0,
"most_common_tasks": [],
"error_rate": 0
}
def collect_feedback(self) -> List[Dict]:
"""Collect user feedback"""
return [
{
"user": "tester1",
"rating": 4,
"comments": "Works well for simple bugs",
"issues": ["Slow on large files"]
}
]
def analyze_beta_results(self, usage: Dict, feedback: List[Dict]) -> Dict:
"""Analyze beta test results"""
avg_rating = sum(f["rating"] for f in feedback) / len(feedback) if feedback else 0
return {
"usage_stats": usage,
"avg_rating": avg_rating,
"key_issues": self.extract_key_issues(feedback),
"recommendations": self.generate_recommendations(usage, feedback)
}
def extract_key_issues(self, feedback: List[Dict]) -> List[str]:
"""Extract common issues"""
all_issues = []
for f in feedback:
all_issues.extend(f.get("issues", []))
# Count frequency
from collections import Counter
return [issue for issue, count in Counter(all_issues).most_common(5)]
def generate_recommendations(self, usage: Dict, feedback: List[Dict]) -> List[str]:
"""Generate improvement recommendations"""
recommendations = []
if usage["error_rate"] > 0.1:
recommendations.append("Improve error handling")
if usage["avg_response_time"] > 10:
recommendations.append("Optimize performance")
return recommendations
Iteration Process
Continuous Improvement Loop
class ImprovementLoop:
"""Continuous improvement system"""
def __init__(self, agent):
self.agent = agent
self.version = 1
self.performance_history = []
def iterate(self, evaluation_results: Dict) -> Dict:
"""Improve based on evaluation"""
# Identify weaknesses
weaknesses = self.identify_weaknesses(evaluation_results)
# Generate improvements
improvements = self.generate_improvements(weaknesses)
# Apply improvements
self.apply_improvements(improvements)
# Re-evaluate
new_results = self.evaluate()
# Track progress
self.performance_history.append({
"version": self.version,
"results": new_results
})
self.version += 1
return {
"improvements_made": len(improvements),
"performance_change": self.calculate_improvement(evaluation_results, new_results)
}
def identify_weaknesses(self, results: Dict) -> List[str]:
"""Identify areas needing improvement"""
weaknesses = []
if results["accuracy"] < 0.8:
weaknesses.append("low_accuracy")
if results.get("avg_response_time", 0) > 10:
weaknesses.append("slow_performance")
if results.get("error_rate", 0) > 0.05:
weaknesses.append("high_error_rate")
return weaknesses
def generate_improvements(self, weaknesses: List[str]) -> List[Dict]:
"""Generate improvement strategies"""
improvements = []
for weakness in weaknesses:
if weakness == "low_accuracy":
improvements.append({
"area": "prompts",
"action": "Refine analysis prompts with more examples"
})
elif weakness == "slow_performance":
improvements.append({
"area": "caching",
"action": "Add caching for repeated analyses"
})
elif weakness == "high_error_rate":
improvements.append({
"area": "error_handling",
"action": "Add more robust error handling"
})
return improvements
def apply_improvements(self, improvements: List[Dict]):
"""Apply improvements to agent"""
for improvement in improvements:
print(f"Applying: {improvement['action']}")
# Apply improvement
# In practice, would modify agent configuration or code
def evaluate(self) -> Dict:
"""Run evaluation"""
suite = EvaluationSuite()
return suite.run_evaluation(self.agent)
def calculate_improvement(self, old: Dict, new: Dict) -> float:
"""Calculate improvement percentage"""
old_acc = old.get("accuracy", 0)
new_acc = new.get("accuracy", 0)
return ((new_acc - old_acc) / old_acc * 100) if old_acc > 0 else 0
Production Deployment
Deployment Checklist
- All tests passing
- Performance benchmarks met
- Security audit completed
- Documentation updated
- Monitoring configured
- Rollback plan ready
- User training completed
- Feedback system active
Monitoring Setup
# src/monitoring/metrics.py
from prometheus_client import Counter, Histogram, Gauge
import time
# Define metrics
requests_total = Counter('agent_requests_total', 'Total requests', ['task_type'])
request_duration = Histogram('agent_request_duration_seconds', 'Request duration')
active_tasks = Gauge('agent_active_tasks', 'Active tasks')
errors_total = Counter('agent_errors_total', 'Total errors', ['error_type'])
class MonitoredAgent:
"""Agent with monitoring"""
def __init__(self, agent):
self.agent = agent
def process_request(self, request: str, target: str) -> Dict:
"""Process with monitoring"""
active_tasks.inc()
start = time.time()
try:
result = self.agent.process_request(request, target)
# Record metrics
requests_total.labels(task_type=result.get("task_type", "unknown")).inc()
request_duration.observe(time.time() - start)
return result
except Exception as e:
errors_total.labels(error_type=type(e).__name__).inc()
raise
finally:
active_tasks.dec()
Logging Strategy
# src/monitoring/logging_config.py
import json
import logging
import time
class StructuredLogger:
"""Structured logging for agent"""
def __init__(self):
self.logger = logging.getLogger("se_agent")
self.logger.setLevel(logging.INFO)
handler = logging.FileHandler("agent.log")
handler.setFormatter(logging.Formatter('%(message)s'))
self.logger.addHandler(handler)
def log_request(self, request: str, target: str):
"""Log incoming request"""
self.logger.info(json.dumps({
"event": "request",
"request": request,
"target": target,
"timestamp": time.time()
}))
def log_result(self, result: Dict):
"""Log result"""
self.logger.info(json.dumps({
"event": "result",
"success": result.get("success"),
"timestamp": time.time()
}))
def log_error(self, error: Exception):
"""Log error"""
self.logger.error(json.dumps({
"event": "error",
"error_type": type(error).__name__,
"error_message": str(error),
"timestamp": time.time()
}))
User Feedback Collection
Feedback System
# src/feedback/collector.py
from typing import Dict, List
import sqlite3
import time
class FeedbackCollector:
"""Collect and analyze user feedback"""
def __init__(self, db_path: str = "data/feedback.db"):
self.db_path = db_path
self.init_db()
def init_db(self):
"""Initialize feedback database"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute('''
CREATE TABLE IF NOT EXISTS feedback (
id INTEGER PRIMARY KEY,
task_id TEXT,
rating INTEGER,
comments TEXT,
accepted BOOLEAN,
timestamp REAL
)
''')
conn.commit()
conn.close()
def collect(self, task_id: str, rating: int, comments: str, accepted: bool):
"""Store feedback"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
cursor.execute('''
INSERT INTO feedback (task_id, rating, comments, accepted, timestamp)
VALUES (?, ?, ?, ?, ?)
''', (task_id, rating, comments, accepted, time.time()))
conn.commit()
conn.close()
def analyze_feedback(self) -> Dict:
"""Analyze collected feedback"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
# Get statistics
cursor.execute('SELECT AVG(rating), COUNT(*) FROM feedback')
avg_rating, total = cursor.fetchone()
cursor.execute('SELECT COUNT(*) FROM feedback WHERE accepted = 1')
accepted = cursor.fetchone()[0]
conn.close()
return {
"avg_rating": avg_rating,
"total_feedback": total,
"acceptance_rate": accepted / total if total > 0 else 0
}
def get_improvement_suggestions(self) -> List[str]:
"""Extract improvement suggestions from feedback"""
conn = sqlite3.connect(self.db_path)
cursor = conn.cursor()
# Get low-rated feedback
cursor.execute('SELECT comments FROM feedback WHERE rating < 3')
low_rated = cursor.fetchall()
conn.close()
# Extract common themes
suggestions = []
for (comment,) in low_rated:
if comment:
suggestions.append(comment)
return suggestions
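To see the collector's statistics queries in action without touching production data, here is a hypothetical standalone walk-through of the same schema with throwaway rows:

```python
import os
import sqlite3
import tempfile
import time

# Temporary database with the same schema as FeedbackCollector
db_path = os.path.join(tempfile.mkdtemp(), "feedback.db")
conn = sqlite3.connect(db_path)
conn.execute('''
    CREATE TABLE feedback (
        id INTEGER PRIMARY KEY, task_id TEXT, rating INTEGER,
        comments TEXT, accepted BOOLEAN, timestamp REAL
    )
''')
rows = [("t1", 5, "great fix", 1, time.time()),
        ("t2", 2, "too slow", 0, time.time())]
conn.executemany(
    "INSERT INTO feedback (task_id, rating, comments, accepted, timestamp) "
    "VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

# Same aggregate queries analyze_feedback runs
avg_rating, total = conn.execute(
    "SELECT AVG(rating), COUNT(*) FROM feedback").fetchone()
accepted = conn.execute(
    "SELECT COUNT(*) FROM feedback WHERE accepted = 1").fetchone()[0]
conn.close()
print(avg_rating, total, accepted / total)  # 3.5 2 0.5
```

SQLite stores the BOOLEAN column as integers, which is why the acceptance query compares against `1`.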
A/B Testing
Comparing Agent Versions
class ABTester:
"""A/B test different agent versions"""
def __init__(self, agent_a, agent_b):
self.agent_a = agent_a
self.agent_b = agent_b
self.results_a = []
self.results_b = []
def run_ab_test(self, test_cases: List[TestCase]) -> Dict:
"""Run A/B test"""
import random
for test_case in test_cases:
# Randomly assign to A or B
if random.random() < 0.5:
result = self.test_agent(self.agent_a, test_case)
self.results_a.append(result)
else:
result = self.test_agent(self.agent_b, test_case)
self.results_b.append(result)
# Compare results
return self.compare_results()
def test_agent(self, agent, test_case: TestCase) -> Dict:
"""Test single agent"""
start = time.time()
result = agent.process_request(test_case.name, test_case.name)
duration = time.time() - start
return {
"success": result.get("success", False),
"duration": duration
}
def compare_results(self) -> Dict:
"""Compare A vs B"""
# max(..., 1) guards against an arm that received no test cases
a_success = sum(1 for r in self.results_a if r["success"]) / max(len(self.results_a), 1)
b_success = sum(1 for r in self.results_b if r["success"]) / max(len(self.results_b), 1)
a_speed = sum(r["duration"] for r in self.results_a) / max(len(self.results_a), 1)
b_speed = sum(r["duration"] for r in self.results_b) / max(len(self.results_b), 1)
return {
"agent_a": {"success_rate": a_success, "avg_duration": a_speed},
"agent_b": {"success_rate": b_success, "avg_duration": b_speed},
"winner": "A" if a_success > b_success else "B"
}
Iteration Examples
Iteration 1: Improve Accuracy
Problem: Agent missing 30% of bugs
Analysis:
# Analyze false negatives
false_negatives = [
"Off-by-one errors",
"Null pointer issues",
"Type mismatches"
]
Solution:
# Enhanced analysis prompt
enhanced_prompt = """Analyze code for:
1. Logic errors (off-by-one, boundary conditions)
2. Null/None handling
3. Type safety
4. Resource leaks
5. Concurrency issues
Be thorough and check edge cases."""
# Update analyzer
analyzer.system_prompt = enhanced_prompt
Result: Accuracy improved from 70% → 85%
Iteration 2: Optimize Performance
Problem: Analysis takes 15s per file (target: <5s)
Analysis:
# Profile performance
import cProfile
profiler = cProfile.Profile()
profiler.enable()
agent.process_request("Analyze file.py", "file.py")
profiler.disable()
profiler.print_stats(sort='cumtime')
Solution:
# Add caching (keyed on a content hash so edited files miss the cache)
import hashlib
class CachedAnalyzer:
def __init__(self):
self.cache = {}
def hash_file(self, file_path: str) -> str:
with open(file_path, 'rb') as f:
return hashlib.sha256(f.read()).hexdigest()
def analyze(self, file_path: str) -> Dict:
# Check cache
file_hash = self.hash_file(file_path)
if file_hash in self.cache:
return self.cache[file_hash]
# Analyze
result = self.do_analysis(file_path)
# Cache result
self.cache[file_hash] = result
return result
Result: Analysis time reduced to 3s per file
Iteration 3: Reduce False Positives
Problem: 40% of reported issues are false positives
Analysis:
# Analyze false positives
fp_analysis = {
"style_issues_as_bugs": 15,
"context_misunderstanding": 12,
"overly_strict_checks": 8
}
Solution:
# Add confidence scoring
class ConfidenceScorer:
def score_issue(self, issue: Dict) -> float:
"""Score issue confidence"""
score = 0.5 # Base
# Increase for multiple sources
if issue["source"] == "static" and issue.get("llm_confirmed"):
score += 0.3
# Increase for severity
if issue["severity"] == "critical":
score += 0.2
return min(score, 1.0)
# Filter low-confidence issues
scorer = ConfidenceScorer()
filtered_issues = [i for i in issues if scorer.score_issue(i) > 0.6]
Result: False positive rate reduced from 40% → 15%
Production Metrics
Key Metrics to Track
import time
from typing import Dict, List

class ProductionMetrics:
    """Track production metrics"""

    def __init__(self):
        self.metrics = {
            "requests_per_day": 0,
            "success_rate": 0,
            "avg_response_time": 0,
            "user_satisfaction": 0,
            "bugs_fixed": 0,
            "tests_generated": 0,
            "code_quality_improvement": 0,
        }

    def daily_report(self) -> Dict:
        """Generate daily metrics report"""
        return {
            "date": time.strftime("%Y-%m-%d"),
            "metrics": self.metrics,
            "alerts": self.check_alerts(),
        }

    def check_alerts(self) -> List[str]:
        """Check for metric alerts"""
        alerts = []
        if self.metrics["success_rate"] < 0.9:
            alerts.append("Success rate below threshold")
        if self.metrics["avg_response_time"] > 10:
            alerts.append("Response time above threshold")
        return alerts
Final Evaluation
Comprehensive Assessment
from typing import Dict

def final_evaluation(agent) -> Dict:
    """Comprehensive final evaluation"""
    # Run test suite
    suite = EvaluationSuite()
    test_results = suite.run_evaluation(agent)
    # Run benchmarks
    benchmark = PerformanceBenchmark()
    perf_results = benchmark.benchmark_analysis_speed(agent, [100, 500, 1000])
    # Analyze feedback
    feedback = FeedbackCollector()
    feedback_analysis = feedback.analyze_feedback()
    # Generate report
    return {
        "test_results": test_results,
        "performance": perf_results,
        "user_feedback": feedback_analysis,
        "overall_score": calculate_overall_score(
            test_results, perf_results, feedback_analysis
        ),
    }

def calculate_overall_score(tests: Dict, perf: Dict, feedback: Dict) -> float:
    """Weighted average of accuracy, speed, and user acceptance"""
    test_score = tests["accuracy"] * 0.4
    perf_score = (1.0 if perf[100]["duration"] < 5 else 0.5) * 0.3
    feedback_score = feedback["acceptance_rate"] * 0.3
    return test_score + perf_score + feedback_score
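As a standalone sanity check on the weighting, here is the scorer restated with made-up sample numbers (hypothetical values, not measured results):

```python
def calculate_overall_score(tests, perf, feedback):
    # Same weighting as above: 40% accuracy, 30% speed, 30% acceptance
    test_score = tests["accuracy"] * 0.4
    perf_score = (1.0 if perf[100]["duration"] < 5 else 0.5) * 0.3
    feedback_score = feedback["acceptance_rate"] * 0.3
    return test_score + perf_score + feedback_score

score = calculate_overall_score(
    {"accuracy": 0.85},        # hypothetical test accuracy
    {100: {"duration": 3.0}},  # hypothetical 100-line analysis time
    {"acceptance_rate": 0.8},  # hypothetical user acceptance
)
print(round(score, 2))  # 0.85*0.4 + 1.0*0.3 + 0.8*0.3 = 0.88
```

Because the analysis finishes under the 5-second target, the speed term contributes its full 0.3; a slower agent would drop to 0.15 there.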
Congratulations!
You’ve completed the capstone project! You’ve built a sophisticated Autonomous Software Engineering Agent that:
✅ Analyzes code for bugs and quality issues
✅ Generates fixes with explanations
✅ Writes comprehensive tests
✅ Operates safely with validation
✅ Learns from feedback
✅ Scales to production workloads
Practice Exercises
Exercise 1: Add Code Review Agent (Medium)
Task: Add a ReviewerAgent that analyzes pull requests.
Click to see solution
from openai import OpenAI

class ReviewerAgent:
    def __init__(self):
        self.client = OpenAI()

    def review_pr(self, diff: str) -> dict:
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{
                "role": "user",
                "content": f"Review this code change:\n{diff}\n\nProvide: issues, suggestions, approval"
            }]
        )
        return {"review": response.choices[0].message.content}
Exercise 2: Implement Learning System (Hard)
Task: Make the agent learn from user corrections.
Click to see solution
class LearningAgent:
    def __init__(self):
        self.corrections = []

    def learn_from_correction(self, original: str, corrected: str):
        self.corrections.append({"original": original, "corrected": corrected})
        # Once enough corrections accumulate, fold them into the prompts
        if len(self.corrections) > 5:
            self.update_prompts()

    def update_prompts(self):
        """Rebuild the system prompt using recent corrections as few-shot examples."""
        ...
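One way `update_prompts` could use those corrections, sketched as a pure helper (the function name and base prompt are illustrative, not part of the course code):

```python
def build_few_shot_prompt(corrections: list, base_prompt: str = "You fix code.") -> str:
    """Fold the most recent corrections into the system prompt as few-shot examples."""
    examples = [
        f"Original:\n{c['original']}\nCorrected:\n{c['corrected']}"
        for c in corrections[-5:]  # keep only the freshest examples
    ]
    return base_prompt + "\n\nExamples of past corrections:\n" + "\n\n".join(examples)

prompt = build_few_shot_prompt([{"original": "x=1", "corrected": "x = 1"}])
```

Capping at the last five corrections keeps the prompt short; a production version would also deduplicate and weight by recency.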
✅ Chapter 10 Summary
You’ve completed the capstone project:
- Designed a multi-agent software engineering system
- Implemented specialized agents (analyzer, fixer, tester)
- Integrated all concepts from previous chapters
- Evaluated with comprehensive test suites
- Deployed with monitoring and feedback loops
This capstone demonstrates how to combine planning, memory, tools, safety, and learning into a production-ready autonomous system.
What You’ve Learned
Throughout this course, you’ve mastered:
- Foundations: Agent architecture and LLM fundamentals
- Building: ReAct patterns and tool integration
- Advanced Patterns: Planning, memory, multi-agent systems
- Tools: Code execution, data access, web interaction
- Production: Reliability, testing, monitoring
- Specialization: Coding, research, automation agents
- Advanced Topics: Learning, multimodal, frameworks
- Enterprise: Architecture, security, cost optimization
- Research: Frontier capabilities, emerging paradigms
- Capstone: Complete production-ready agent
Next Steps
- Deploy your agent: Put it into production
- Contribute: Share your implementation
- Research: Explore open problems
- Build more: Create specialized agents
- Teach: Share your knowledge
Thank you for completing the Agentic Guide to AI Agents course!
Tools & Libraries
Core Libraries
LLM APIs
OpenAI
pip install openai
from openai import OpenAI
client = OpenAI(api_key="your-key")
- Models: GPT-4, GPT-3.5-turbo
- Function calling support
- Streaming responses
- Documentation
Anthropic Claude
pip install anthropic
import anthropic
client = anthropic.Anthropic(api_key="your-key")
- Models: Claude 3 (Opus, Sonnet, Haiku)
- Long context windows (200K tokens)
- Documentation
AWS Bedrock
pip install boto3
import boto3
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')
- Multiple model providers
- Enterprise features
- Documentation
Agent Frameworks
LangChain
pip install langchain langchain-openai
- Chains, agents, tools
- Memory management
- Documentation
LangGraph
pip install langgraph
- Graph-based workflows
- State management
- Documentation
AutoGPT
git clone https://github.com/Significant-Gravitas/AutoGPT
- Autonomous task execution
- Plugin system
CrewAI
pip install crewai
- Multi-agent orchestration
- Role-based agents
Vector Databases
ChromaDB
pip install chromadb
import chromadb
client = chromadb.Client()
collection = client.create_collection("docs")
- Embedded database
- Simple API
Pinecone
pip install pinecone-client
- Managed service
- High performance
- Scalable
Weaviate
pip install weaviate-client
- Open source
- Hybrid search
- GraphQL API
Code Analysis
AST Tools
pip install ast-grep-py
- Python: Built-in ast module
- Multi-language: tree-sitter
Linters
pip install pylint ruff mypy
- pylint: Comprehensive checking
- ruff: Fast linting
- mypy: Type checking
Formatters
pip install black isort
- black: Code formatting
- isort: Import sorting
Testing
pytest
pip install pytest pytest-asyncio pytest-cov
- Unit testing
- Async support
- Coverage reports
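As a reminder of the conventions: pytest discovers files named `test_*.py` and runs functions named `test_*`. A minimal test module might look like this (the file name and functions are illustrative):

```python
# test_math.py — plain asserts are all pytest needs
def add(a: int, b: int) -> int:
    return a + b

def test_add() -> None:
    assert add(2, 3) == 5

def test_add_negative() -> None:
    assert add(-1, 1) == 0

# Run with: pytest test_math.py  (add --cov for coverage via pytest-cov)
```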
unittest
- Built-in Python testing
- Standard library
Monitoring
Prometheus
pip install prometheus-client
- Metrics collection
- Time series data
OpenTelemetry
pip install opentelemetry-api opentelemetry-sdk
- Distributed tracing
- Metrics and logs
Utilities
Docker SDK
pip install docker
- Container management
- Safe code execution
GitPython
pip install gitpython
- Git operations
- Repository management
Requests
pip install requests httpx
- HTTP requests
- API integration
Development Tools
IDEs & Editors
- VS Code: Python, Jupyter extensions
- PyCharm: Professional Python IDE
- Cursor: AI-powered editor
- Jupyter: Interactive notebooks
Debugging
- pdb: Python debugger
- ipdb: Enhanced debugger
- pytest --pdb: Drop into the debugger on test failure
Documentation
- Sphinx: Python documentation
- MkDocs: Markdown documentation
- mdBook: Rust-based book tool
Deployment Tools
Containerization
- Docker: Container platform
- Docker Compose: Multi-container apps
Orchestration
- Kubernetes: Container orchestration
- AWS ECS: Managed containers
- AWS Lambda: Serverless functions
CI/CD
- GitHub Actions: Automated workflows
- GitLab CI: Integrated CI/CD
- AWS CodePipeline: AWS-native CI/CD
Quick Start Template
# requirements.txt
openai==1.12.0
langchain==0.1.0
chromadb==0.4.22
fastapi==0.109.0
uvicorn==0.27.0
pytest==8.0.0
# agent.py
from openai import OpenAI

class SimpleAgent:
    def __init__(self):
        self.client = OpenAI()

    def run(self, task: str) -> str:
        response = self.client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": task}]
        )
        return response.choices[0].message.content

agent = SimpleAgent()
result = agent.run("Hello!")
print(result)
Resources
Research Papers
Foundational Papers
ReAct: Synergizing Reasoning and Acting in Language Models
- Authors: Yao et al. (2022)
- Paper
- Key contribution: Reasoning + Acting pattern
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Authors: Wei et al. (2022)
- Paper
- Key contribution: Step-by-step reasoning
Toolformer: Language Models Can Teach Themselves to Use Tools
- Authors: Schick et al. (2023)
- Paper
- Key contribution: Self-taught tool use
Generative Agents: Interactive Simulacra of Human Behavior
- Authors: Park et al. (2023)
- Paper
- Key contribution: Memory and planning
Recent Advances
GPT-4 Technical Report
- OpenAI (2023)
- Paper
Constitutional AI: Harmlessness from AI Feedback
- Anthropic (2022)
- Paper
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Yao et al. (2023)
- Paper
Books
Artificial Intelligence: A Modern Approach
- Authors: Russell & Norvig
- Classic AI textbook
- Agent architectures
Deep Learning
- Authors: Goodfellow, Bengio, Courville
- Neural network foundations
- Free online
Reinforcement Learning: An Introduction
- Authors: Sutton & Barto
- RL fundamentals
- Free online
Online Courses
DeepLearning.AI
- LangChain courses
- AI agent specializations
- Website
Fast.ai
- Practical deep learning
- Free courses
- Website
Stanford CS224N
- NLP with Deep Learning
- Course page
Blogs & Tutorials
Lilian Weng’s Blog
- lilianweng.github.io
- Excellent agent overviews
- Research summaries
Anthropic Research
- anthropic.com/research
- Constitutional AI
- Safety research
OpenAI Blog
- openai.com/blog
- Model releases
- Research updates
Hugging Face Blog
- huggingface.co/blog
- Model tutorials
- Community projects
Communities
Discord Servers
- LangChain Discord
- OpenAI Developer Community
- AI Agent Builders
Reddit
- r/MachineLearning
- r/LanguageTechnology
- r/artificial
GitHub
- Awesome-LLM repositories
- Agent implementations
- Open source projects
Conferences
NeurIPS - Neural Information Processing Systems
- December annually
- Top ML conference
ICML - International Conference on Machine Learning
- July annually
- Core ML research
ICLR - International Conference on Learning Representations
- May annually
- Deep learning focus
ACL - Association for Computational Linguistics
- July annually
- NLP research
Datasets & Benchmarks
HumanEval
- Code generation benchmark
- GitHub
MMLU - Massive Multitask Language Understanding
- Knowledge benchmark
- 57 subjects
BIG-bench
- Diverse task benchmark
- GitHub
AgentBench
- Agent capability benchmark
- Multi-environment testing
Tools & Platforms
Weights & Biases
- Experiment tracking
- wandb.ai
LangSmith
- LangChain debugging
- Trace visualization
Helicone
- LLM observability
- Cost tracking
PromptLayer
- Prompt management
- Version control
Code Repositories
LangChain
AutoGPT
BabyAGI
AgentGPT
Stay Updated
Newsletters
- The Batch (DeepLearning.AI)
- Import AI
- TLDR AI
Twitter/X Accounts
- @AndrewYNg
- @karpathy
- @ylecun
- @goodfellow_ian
YouTube Channels
- Andrej Karpathy
- Two Minute Papers
- Yannic Kilcher
Practice Platforms
Kaggle
- Competitions
- Datasets
- Notebooks
HuggingFace Spaces
- Deploy demos
- Share models
Replicate
- Run models
- API access
Glossary
A
Agent - An autonomous system that perceives its environment and takes actions to achieve goals.
Agentic Framework - A software framework designed for building AI agents (e.g., LangChain, AutoGPT).
API (Application Programming Interface) - Interface for software components to communicate.
AST (Abstract Syntax Tree) - Tree representation of code structure.
B
Backoff - Strategy for retrying failed operations with increasing delays.
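A minimal sketch of the pattern: exponentially growing delays with jitter (a generic recipe assuming "full jitter"; names are illustrative):

```python
import random

def backoff_delays(base: float = 0.5, factor: float = 2.0,
                   retries: int = 5, max_delay: float = 30.0):
    """Yield jittered, exponentially growing delays (ceilings ~0.5s, 1s, 2s, ...)."""
    delay = base
    for _ in range(retries):
        # Full jitter: sleep a random amount up to the current ceiling
        yield random.uniform(0, min(delay, max_delay))
        delay *= factor

# Usage: for delay in backoff_delays(): time.sleep(delay); retry_operation()
```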
Benchmark - Standardized test for measuring performance.
Beam Search - Search algorithm that explores multiple paths simultaneously.
C
Chain-of-Thought (CoT) - Prompting technique that encourages step-by-step reasoning.
Checkpoint - Saved state of a model or agent for recovery.
Context Window - Maximum amount of text an LLM can process at once.
Constitutional AI - Approach to align AI behavior with principles.
D
Deterministic - Producing the same output given the same input.
Distributed Tracing - Tracking requests across multiple services.
Docker - Platform for containerizing applications.
E
Embedding - Vector representation of text or data.
Episodic Memory - Memory of specific past events or experiences.
Evaluation Metric - Quantitative measure of performance.
F
Few-Shot Learning - Learning from a small number of examples.
Fine-Tuning - Training a pre-trained model on specific data.
Function Calling - LLM capability to invoke external functions.
G
Generalization - Ability to perform well on unseen data.
Guardrails - Safety mechanisms to prevent harmful behavior.
GPU (Graphics Processing Unit) - Hardware for parallel computation.
H
Hallucination - When LLMs generate false or nonsensical information.
Human-in-the-Loop (HITL) - System requiring human approval for decisions.
Hyperparameter - Configuration parameter for model training.
I
Inference - Using a trained model to make predictions.
Interpretability - Ability to understand model decisions.
K
Kubernetes (K8s) - Container orchestration platform.
L
Latency - Time delay between request and response.
LLM (Large Language Model) - Neural network trained on vast text data.
Long-Horizon Planning - Planning over extended time periods.
M
Memory System - Component for storing and retrieving information.
Meta-Learning - Learning how to learn.
Microservices - Architecture pattern with independent services.
Multimodal - Processing multiple types of data (text, images, audio).
N
Neural Network - Computing system inspired by biological brains.
NLP (Natural Language Processing) - Processing and understanding human language.
O
Observability - Ability to understand system internal state from outputs.
Orchestration - Coordinating multiple components or agents.
P
Perception-Reasoning-Action Loop - Core agent cycle: observe, think, act.
Prompt Engineering - Crafting effective prompts for LLMs.
Production - Live environment serving real users.
R
RAG (Retrieval-Augmented Generation) - Combining retrieval with generation.
ReAct - Pattern combining reasoning and acting.
Reinforcement Learning (RL) - Learning through rewards and penalties.
RLHF (Reinforcement Learning from Human Feedback) - Training with human preferences.
S
Sandbox - Isolated environment for safe code execution.
Semantic Memory - Memory of facts and knowledge.
Semantic Search - Search based on meaning, not keywords.
Self-Improvement - Agent’s ability to improve its own capabilities.
Streaming - Sending responses incrementally as generated.
T
Temperature - Parameter controlling randomness in LLM outputs (0=deterministic, 1=creative).
Token - Unit of text processed by LLMs (roughly 0.75 words).
Tool - External function or API an agent can use.
Tree of Thoughts - Exploring multiple reasoning paths.
V
Vector Database - Database optimized for similarity search on embeddings.
Validation - Checking if outputs meet requirements.
W
Working Memory - Short-term memory for current task.
Z
Zero-Shot - Performing tasks without specific training examples.
Common Acronyms
- AI - Artificial Intelligence
- API - Application Programming Interface
- AST - Abstract Syntax Tree
- CI/CD - Continuous Integration/Continuous Deployment
- CoT - Chain-of-Thought
- GPU - Graphics Processing Unit
- HITL - Human-in-the-Loop
- LLM - Large Language Model
- ML - Machine Learning
- NLP - Natural Language Processing
- RAG - Retrieval-Augmented Generation
- RL - Reinforcement Learning
- RLHF - Reinforcement Learning from Human Feedback
- SLA - Service Level Agreement
- ToT - Tree of Thoughts
- UI/UX - User Interface/User Experience
Model Parameters
Temperature - Controls randomness (0.0-2.0)
- 0.0-0.3: Focused, deterministic
- 0.4-0.7: Balanced
- 0.8-1.0: Creative
- 1.0+: Very random
Top-p (Nucleus Sampling) - Alternative to temperature (0.0-1.0)
- 0.1: Very focused
- 0.5: Balanced
- 0.9: Diverse
Max Tokens - Maximum length of response
Frequency Penalty - Reduces repetition (-2.0 to 2.0)
Presence Penalty - Encourages new topics (-2.0 to 2.0)
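All of these travel as keyword arguments on a completion request. A hedged sketch of one parameter set (the model name and values are placeholders, not recommendations):

```python
# Placeholder parameter set; ranges follow the list above.
request_params = {
    "model": "gpt-4",
    "temperature": 0.2,        # 0.0-0.3: focused, deterministic
    "top_p": 0.9,              # usually tune temperature OR top_p, not both
    "max_tokens": 512,         # cap on response length
    "frequency_penalty": 0.5,  # discourage verbatim repetition
    "presence_penalty": 0.0,   # neutral on topic novelty
}
# client.chat.completions.create(messages=[...], **request_params)
```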
HTTP Status Codes
- 200 - Success
- 400 - Bad Request
- 401 - Unauthorized
- 429 - Rate Limited
- 500 - Server Error
- 503 - Service Unavailable
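When calling LLM APIs, 429 and 5xx responses are usually transient and worth retrying, while other 4xx client errors are not. A small sketch of that classification (a generic pattern, not any specific SDK's behavior):

```python
RETRYABLE = {429, 500, 503}  # rate limited, server error, service unavailable

def should_retry(status_code: int, attempt: int, max_attempts: int = 3) -> bool:
    """Retry transient failures; give up on client errors or after max_attempts."""
    return status_code in RETRYABLE and attempt < max_attempts

# A 429 on the first attempt is retried; a 401 (bad API key) never is.
```

Pair this with the backoff delays from the Glossary so retries don't hammer a rate-limited endpoint.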
Contributing
Thank you for your interest in improving this course! Contributions are welcome and appreciated.
How to Contribute
Reporting Issues
Found an error or have a suggestion?
- Check existing issues
- Create a new issue with:
- Clear description
- Module and section reference
- Expected vs actual behavior
- Suggested fix (if applicable)
Suggesting Improvements
Have ideas for new content or improvements?
- Open a discussion
- Describe your suggestion
- Explain the value it would add
Contributing Content
Want to contribute code, examples, or content?
Process:
- Fork the repository
- Create a feature branch: git checkout -b feature/your-feature
- Make your changes
- Test locally: mdbook serve
- Commit: git commit -m "Add: your feature"
- Push: git push origin feature/your-feature
- Open a Pull Request
Guidelines:
- Follow existing code style
- Include working code examples
- Add comments and explanations
- Test all code before submitting
- Keep examples concise but complete
Content Guidelines
Code Examples
- Use Python 3.9+ syntax
- Include type hints
- Add docstrings
- Handle errors gracefully
- Keep examples under 100 lines when possible
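For instance, a contribution that follows these guidelines might look like this (the helper itself is illustrative):

```python
from pathlib import Path

def read_config(path: str) -> dict:
    """Read a simple KEY=VALUE config file, skipping blanks and comments."""
    config: dict[str, str] = {}
    try:
        text = Path(path).read_text()
    except FileNotFoundError:
        # Fail gracefully rather than crashing the caller
        return config
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config
```

Note the type hints, the docstring, and the graceful handling of a missing file — exactly the points the checklist above asks for.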
Writing Style
- Clear and concise
- Explain “why” not just “how”
- Use active voice
- Include practical examples
- Link to related sections
Structure
- Start with learning objectives
- Provide context before code
- Explain code after showing it
- End with key takeaways
- Link to next steps
Types of Contributions
High Priority
- Fixing errors or bugs in code
- Improving unclear explanations
- Adding missing error handling
- Updating deprecated APIs
Medium Priority
- Adding practice exercises
- Creating additional examples
- Improving diagrams
- Expanding explanations
Nice to Have
- Translations
- Video tutorials
- Interactive demos
- Community showcases
Code of Conduct
- Be respectful and constructive
- Welcome newcomers
- Focus on improving the content
- Give credit where due
- Assume good intentions
Recognition
Contributors will be acknowledged in:
- GitHub contributors list
- Course acknowledgments section
- Release notes
Questions?
Open a discussion or reach out via GitHub issues.
Thank you for helping make this course better! 🙏