AI System 2026

Multi-Agent Orchestration

A Python MCP server that delegates tasks to multiple AI agents — Claude, Codex, and Gemini — using collaboration patterns like debate, consensus, parallel, and routing. Built for daily production use.

Python · MCP · Claude · Codex · Gemini · AsyncIO

A system that makes AI agents collaborate instead of working in isolation. Multiple models — Claude, Codex, and Gemini — run tasks through structured collaboration patterns, producing better output than any single prompt.

The Problem

Single-agent AI workflows hit a ceiling. One model misses things another would catch. Reviews are shallow. Complex tasks get simplified to fit one model’s context. I wanted a system where agents could debate, build on each other’s work, and reach consensus.

How It Works

The system is a Python MCP server with six orchestration patterns:

  • Routing — Auto-selects the best agent for a task based on keyword classification
  • Parallel — Runs multiple agents simultaneously, compares results
  • Sequential — Pipeline: Agent A’s output feeds into Agent B
  • Debate — Agent A proposes, Agent B critiques, Agent C synthesizes
  • Consensus — Multiple agents solve independently, a judge picks the best
  • Hierarchical — Decomposes complex tasks into subtasks, fans them out

Each pattern is designed for a different kind of problem. Code reviews use debate. Architecture decisions use consensus. Bug fixes use routing.
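To make the shape of a pattern concrete, here is a minimal sketch of the debate flow in Python. The `Agent` protocol and its `run` method are stand-ins for illustration, not the project's actual interface, and the prompts are simplified:

```python
from typing import Protocol

class Agent(Protocol):
    """Stand-in agent interface; the real adapters are described under Architecture."""
    name: str

    async def run(self, prompt: str) -> str: ...

async def debate(task: str, proposer: Agent, critic: Agent, judge: Agent) -> str:
    """Debate pattern: A proposes, B critiques, C synthesizes the final answer."""
    proposal = await proposer.run(f"Solve this task:\n{task}")
    critique = await critic.run(
        f"Critique this proposed solution.\nTask:\n{task}\n\nProposal:\n{proposal}"
    )
    return await judge.run(
        "Synthesize a final answer from the proposal and critique.\n"
        f"Task:\n{task}\n\nProposal:\n{proposal}\n\nCritique:\n{critique}"
    )
```

The other patterns compose the same `run` calls differently: parallel gathers them with `asyncio.gather`, sequential chains them, consensus adds a judging step over independent runs.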

Architecture

The system runs as an MCP server, exposing tools that Claude Code (or any MCP client) can call directly. Agent adapters handle the specifics of each AI provider — subprocess management for Codex and Claude Code, API calls for Gemini and the Claude API.
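A rough sketch of that adapter split, assuming a shared `run(prompt)` interface. The class names and the `complete` call are placeholders, not the project's real adapter API:

```python
import asyncio

class SubprocessAgent:
    """Wraps a CLI-based agent (e.g. Codex or Claude Code) as an async subprocess."""

    def __init__(self, name: str, command: list[str]):
        self.name = name
        self.command = command

    async def run(self, prompt: str) -> str:
        proc = await asyncio.create_subprocess_exec(
            *self.command,
            stdin=asyncio.subprocess.PIPE,
            stdout=asyncio.subprocess.PIPE,
        )
        stdout, _ = await proc.communicate(prompt.encode())
        return stdout.decode()

class ApiAgent:
    """Wraps an HTTP-API agent (e.g. Gemini or the Claude API) behind an SDK client."""

    def __init__(self, name: str, client):
        self.name = name
        self.client = client  # provider SDK client; details vary by provider

    async def run(self, prompt: str) -> str:
        # `complete` is a placeholder; each provider's SDK has its own call.
        return await self.client.complete(prompt)
```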

Key design decisions:

  • Async subprocess management with proper cleanup on timeout, so no zombie processes survive (first sketch below)
  • A ThrottleManager that detects 429 errors and enforces backoff windows per agent (second sketch below)
  • Input validation with size limits to prevent prompt injection amplification (third sketch below)
  • Inter-agent output truncation to keep sequential handoffs bounded (also in the third sketch)
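The subprocess cleanup follows the standard asyncio pattern of killing and then reaping a timed-out process; the timeout value and error handling here are illustrative:

```python
import asyncio

async def run_with_timeout(command: list[str], prompt: str, timeout: float = 120.0) -> str:
    """Run an agent subprocess, killing and reaping it if it exceeds the timeout."""
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(prompt.encode()), timeout)
        return stdout.decode()
    except asyncio.TimeoutError:
        proc.kill()        # terminate the runaway process
        await proc.wait()  # reap it so it never becomes a zombie
        raise
```

Without the `kill()` plus `wait()` pair, a timed-out process keeps running and is never reaped — exactly the bug described below.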
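A minimal sketch of the ThrottleManager idea, assuming exponential per-agent backoff; the constants and method names are illustrative, not the actual implementation:

```python
import time

class ThrottleManager:
    """Tracks per-agent backoff windows after rate-limit (429) responses."""

    def __init__(self, base_backoff: float = 30.0, max_backoff: float = 600.0):
        self.base = base_backoff
        self.max = max_backoff
        self._blocked_until: dict[str, float] = {}
        self._strikes: dict[str, int] = {}

    def record_429(self, agent: str) -> None:
        """Each consecutive 429 doubles the wait window, up to a ceiling."""
        strikes = self._strikes.get(agent, 0) + 1
        self._strikes[agent] = strikes
        backoff = min(self.base * (2 ** (strikes - 1)), self.max)
        self._blocked_until[agent] = time.monotonic() + backoff

    def record_success(self, agent: str) -> None:
        self._strikes.pop(agent, None)
        self._blocked_until.pop(agent, None)

    def is_available(self, agent: str) -> bool:
        return time.monotonic() >= self._blocked_until.get(agent, 0.0)
```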
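And the input and handoff bounds, sketched together; the limits shown are illustrative, not the system's actual values:

```python
MAX_INPUT_CHARS = 50_000    # illustrative input ceiling
MAX_HANDOFF_CHARS = 20_000  # illustrative handoff ceiling

def validate_input(prompt: str) -> str:
    """Reject oversized inputs so a hostile prompt can't be amplified downstream."""
    if len(prompt) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    return prompt

def truncate_handoff(text: str) -> str:
    """Bound inter-agent handoffs so sequential pipelines stay a fixed size."""
    if len(text) <= MAX_HANDOFF_CHARS:
        return text
    return text[:MAX_HANDOFF_CHARS] + "\n[output truncated for handoff]"
```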

What I Learned

Building a system that orchestrates AI agents is an exercise in trust boundaries. Each agent trusts the previous agent’s output implicitly. The attack surface isn’t any single component — it’s the trust graph between all of them.

The zombie subprocess bug was the best example of “works until it doesn’t.” Timed-out processes kept running forever, silently accumulating. The agents found it in seconds because they don’t have the bias of “well, it seems to work fine.”

I use this system daily — for code reviews, architecture decisions, and second opinions on complex problems. The debate pattern alone has caught dozens of issues I would have shipped.

Read More

I wrote a three-part series about building and using this system:

  1. I Let 3 AI Agents Debate My Code — pointing the system at my personal website
  2. I Turned My AI Review System on Itself — the system reviewing its own source code
  3. The Model Isn’t the Agent: The Harness Is — how the system works and what I learned building it