My development workflow now involves three distinct AI agents, each handling a different aspect of my projects. This didn't happen overnight. It evolved over months of figuring out what works and what doesn't. Here's the system.

Agent 1: The Coder (Claude Code)

This is the primary agent. Claude Code handles all direct code work: writing new features, refactoring existing code, debugging issues, writing tests, and performing code reviews. It runs in my terminal with full access to the project directory.

What it does daily:

  • Implements features from specs I write
  • Reads error logs and proposes fixes
  • Generates tests for new and existing code
  • Reviews diffs before I commit
  • Refactors modules when tech debt accumulates

Configuration that matters: a detailed CLAUDE.md with project conventions, MCP servers connected to the dev database and docs, and custom slash commands for my most common tasks. The configuration alone took a full day to tune, but it saves hours every week.
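For a sense of shape, here's a trimmed-down sketch of that kind of CLAUDE.md. The specific conventions below are illustrative placeholders, not my actual file:

```markdown
# Project conventions

- TypeScript strict mode; no `any` without a comment explaining why
- Tests live next to the module they cover (`foo.ts` → `foo.test.ts`)
- Run `npm run lint && npm test` before presenting a diff
- Never touch files under `migrations/` without asking first

# Common commands

- Dev server: `npm run dev`
- Full test suite: `npm test`
```

The point is density: short, checkable rules the agent can follow on every task, not a prose style guide.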

Agent 2: The Researcher (Claude in the Browser)

For tasks that need exploration and conversation rather than code execution, I use Claude's web interface. This agent handles architecture planning, technology evaluation, and complex problem decomposition.

What it does weekly:

  • Evaluates technology choices for new features
  • Discusses architectural tradeoffs in long conversations
  • Analyzes competitor implementations I share via screenshots
  • Helps draft technical specs and design documents
  • Processes research papers and documentation into actionable summaries

The key difference: the researcher doesn't write code. It thinks about code. Keeping these roles separate prevents the common failure mode where the AI starts implementing before the design is solid.

Agent 3: The Operator (Custom Automation)

This is a set of automated scripts that use the Claude API for operational tasks. No human in the loop for routine work.

What it does on schedule:

  • Scans for dependency updates weekly, creates PRs for non-breaking updates
  • Reviews incoming error reports and enriches them with code context
  • Generates a weekly summary of code changes and their impact
  • Monitors API response times and flags degradation with probable causes

The operator uses Sonnet 4 for cost efficiency. These tasks don't need deep reasoning; they need reliable pattern matching on structured inputs. Sonnet handles them well at a fraction of Opus pricing.
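The error-enrichment task is representative of what these scripts do. A minimal sketch of the pure part, the location parsing and context extraction, with the actual API call left as a comment (function names here are my illustration, not a library API):

```python
import re
from pathlib import Path


def extract_location(traceback_text):
    """Pull (file, line) from the last `File "...", line N` frame in a Python traceback."""
    frames = re.findall(r'File "([^"]+)", line (\d+)', traceback_text)
    if not frames:
        return None
    path, line = frames[-1]
    return path, int(line)


def code_context(path, line, radius=5):
    """Return the numbered source lines surrounding the failing line."""
    lines = Path(path).read_text().splitlines()
    start = max(0, line - 1 - radius)   # 0-based index of first line to show
    end = min(len(lines), line + radius)
    return "\n".join(f"{n:4d}  {lines[n - 1]}" for n in range(start + 1, end + 1))


def enrich_report(traceback_text):
    """Attach code context to a raw error report before sending it to the model."""
    loc = extract_location(traceback_text)
    context = code_context(*loc) if loc else "(no source location found)"
    prompt = (
        f"Error report:\n{traceback_text}\n\n"
        f"Code around the failure:\n{context}\n\n"
        "Summarize the probable cause."
    )
    # The real script then calls the Messages API, roughly:
    #   client = anthropic.Anthropic()
    #   msg = client.messages.create(model=..., max_tokens=500,
    #                                messages=[{"role": "user", "content": prompt}])
    return prompt
```

Everything before the API call is deterministic and cheap; the model only sees a report that already carries the relevant source lines.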

How They Coordinate

The agents don't talk to each other directly. I'm the coordinator. The flow usually looks like this:

  1. The operator flags something (dependency update, error spike, performance issue)
  2. I discuss the issue with the researcher if it needs design thinking
  3. The researcher helps me create a plan or spec
  4. I hand the spec to the coder for implementation
  5. The coder builds, tests, and presents the diff
  6. I review, approve, and deploy

I tried automating this coordination, having one agent hand off to another automatically. It didn't work well. The handoff points are where judgment matters most: deciding whether an alert is worth acting on, deciding if a design is good enough, deciding if a code change is safe to ship. Those decisions need a human.

Cost Management

Running three agents adds up. Here's my typical monthly breakdown:

  • Coder (Claude Code Max): Fixed subscription cost. The most predictable expense.
  • Researcher (Claude Pro): Fixed subscription, included in the plan I already pay for.
  • Operator (API): About $40-60/month, mostly Sonnet calls. Budget-capped at $100.

The operator's API costs were originally much higher. I reduced them by switching from Opus to Sonnet for routine tasks, adding caching for repeated analysis patterns, and implementing strict token budgets per task. A task that loops more than three times gets killed and queued for human review.
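The loop guard itself is a few lines. A sketch of the shape, assuming each attempt reports its token usage (the names and the specific caps are invented for illustration):

```python
MAX_ATTEMPTS = 3        # a task that loops more than this gets killed
TOKEN_BUDGET = 20_000   # per-task cap; real budgets vary by task type


def run_with_budget(task, attempt_fn, queue_for_review):
    """Run attempt_fn until it succeeds or a loop/token cap trips, then escalate.

    attempt_fn(task) returns (result_or_None, tokens_spent).
    """
    tokens_used = 0
    for attempt in range(1, MAX_ATTEMPTS + 1):
        result, tokens = attempt_fn(task)
        tokens_used += tokens
        if result is not None:
            return result
        if tokens_used > TOKEN_BUDGET:
            break
    # Killed: too many loops or over budget -> human review queue
    queue_for_review(task, attempts=attempt, tokens=tokens_used)
    return None
```

The escalation path is the important part: a capped task doesn't fail silently, it lands in a queue where a human decides what to do next.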

What I Tried That Didn't Work

Multi-agent coding. Having one agent plan and another implement sounds elegant. In practice, the implementing agent constantly misunderstands the planner's intent. A single agent with full context produces better results than a pipeline of specialized agents.

Fully autonomous operation. I tried letting the operator apply fixes automatically for known error patterns. It worked 90% of the time and made things worse 10% of the time. That 10% cost more to recover from than the 90% saved. Now the operator proposes fixes and waits for approval on anything beyond restarting a service.
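That approval gate reduces to an allowlist check. A minimal sketch, where the action names and function signatures are hypothetical:

```python
# Actions the operator may take without approval. After the 10% lesson,
# this set contains exactly one entry.
AUTO_APPROVED = {"restart_service"}


def dispatch(action, apply_fn, propose_fn):
    """Apply allowlisted actions immediately; turn everything else into a proposal."""
    if action["type"] in AUTO_APPROVED:
        apply_fn(action)
        return "applied"
    propose_fn(action)
    return "proposed"
```

The design choice worth noting: the default path is "propose", and autonomy is the narrow exception, so forgetting to classify a new action type fails safe.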

Too many agents. I briefly had five agents with narrow specializations. The coordination overhead exceeded the benefit. Three, with clear boundaries, is the sweet spot for a solo developer. Larger teams might benefit from more, but only if each agent serves a distinct group of users.

The Key Insight

The value of multiple agents isn't in having more AI. It's in having the right AI for the right context. A coding agent needs file access and tool use. A research agent needs conversational depth. An operational agent needs reliability and low cost. Forcing one tool to do all three produces mediocre results across the board.

Find the three or four categories of AI work you do most, match each to the right tool and configuration, and build habits around each one. The system matters more than the individual agent.