
I Turned My AI Review System on Itself — It Found 16 Issues

I pointed my multi-agent orchestration system at its own source code. Three AI agents debated the code that runs them. The results were uncomfortably accurate.

Last week I pointed my multi-agent orchestration system at my website and the results were genuinely useful. So naturally, the next question was: what happens when you point the review system at itself?

The orchestration system is a Python MCP server that delegates tasks to multiple AI agents — Codex, Gemini, Claude — using collaboration patterns like debate, consensus, and parallel execution. It’s about 1,500 lines of code across 15 files. Not huge, but dense with async subprocess management, inter-agent communication, and API integrations.

I gave it the same treatment as last time: three agents, debate pattern, specialized personas.

The Setup

Same structure as the website review:

  • Codex — code quality lens. Race conditions, resource leaks, DRY violations, missing error handling.
  • Gemini — architecture lens. Design patterns, extensibility, configuration management, documentation.
  • Claude — security lens. Prompt injection, input validation, subprocess safety, privilege escalation.

Each agent got the full codebase — all 15 Python files, the config YAML, the MCP server entry point. Instructions: find everything wrong with this system. Be specific. Don’t hold back.

There’s something meta about asking AI agents to critique the code that orchestrates AI agents. They’re reviewing their own execution environment. If they find a bug in the debate pattern, they found it while running inside the debate pattern.

The Convergence

Three issues were flagged independently by all three agents. When that happens, you know it’s real.

1. Zombie subprocesses on timeout.

Both the Codex and Claude Code agent adapters spawn subprocesses to run CLI tools. When a subprocess times out, the code catches asyncio.TimeoutError and returns an error result — but it never actually kills the process. The timed-out subprocess keeps running forever. Every timeout leaks a PID and its file handles.

I’d been running this system for weeks. I probably had a graveyard of zombie npx codex processes sitting in my process table.
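
Here's roughly what the fix (listed under Round 1 below) looks like. A sketch, not the adapter's actual code: run_cli and TIMEOUT_SECONDS are illustrative names, but the kill-then-wait sequence is the point.

    import asyncio

    TIMEOUT_SECONDS = 300  # illustrative; each adapter has its own timeout

    async def run_cli(cmd: list[str]) -> tuple[int, str]:
        proc = await asyncio.create_subprocess_exec(
            *cmd,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=TIMEOUT_SECONDS)
            return proc.returncode, stdout.decode()
        except asyncio.TimeoutError:
            # The original code returned an error result here and left the child running.
            # Kill it, then await it, so the PID and its file handles are actually reclaimed.
            proc.kill()
            await proc.wait()
            return -1, "timed out"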

2. The hierarchical pattern bypasses rate limiting.

I built a ThrottleManager specifically to prevent API quota exhaustion. It detects 429 errors, records backoff windows, and blocks calls to throttled agents. Every other pattern uses it. The hierarchical pattern — which decomposes tasks into subtasks and fans them out — calls agent.execute() directly, skipping the throttle entirely. A complex task that decomposes into 5 subtasks could fire 5 simultaneous API calls to a rate-limited agent.

The irony: I built the throttle because of the hierarchical pattern. It was the one pattern most likely to trigger rate limits, and it was the one pattern that didn’t use the protection.
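
The fix is to push every subtask through the same gate. A sketch, assuming the ThrottleManager exposes block-and-record hooks; wait_if_throttled, record_429, and RateLimitError are stand-in names, not the real interface.

    class RateLimitError(Exception):
        """Hypothetical wrapper an adapter might raise on a 429 response."""

    async def run_subtask(throttle, agent, subtask):
        # Wait out any recorded backoff window instead of calling
        # agent.execute() directly, as the hierarchical pattern used to.
        await throttle.wait_if_throttled(agent.name)
        try:
            return await agent.execute(subtask)
        except RateLimitError:
            throttle.record_429(agent.name)  # start a backoff window for this agent
            raise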

3. No exception handling in hierarchical subtasks.

Same pattern, different problem. If any single subtask throws an exception, the entire orchestration crashes. No partial results, no graceful degradation. The other 4 subtasks that completed successfully? Gone.
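
The graceful-degradation fix is small. A sketch using asyncio.gather with return_exceptions, so a failed subtask comes back as an exception object instead of taking the whole run down (the function name is illustrative):

    import asyncio

    async def run_subtasks(agent, subtasks):
        results = await asyncio.gather(
            *(agent.execute(st) for st in subtasks),
            return_exceptions=True,
        )
        # Keep what succeeded, report what failed, never lose the other subtasks.
        completed = [r for r in results if not isinstance(r, Exception)]
        failed = [r for r in results if isinstance(r, Exception)]
        return completed, failed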

The Security Findings

This is where it got uncomfortable. Claude found attack vectors I hadn’t considered — and they’re all in code that runs other AI agents.

The --dangerously-skip-permissions flag. My Claude Code agent adapter runs with --dangerously-skip-permissions, which gives the subprocess full filesystem and network access. The task description is passed as a CLI argument. If the task string contains prompt injection — either from a malicious user or from another agent’s output in a multi-step pattern — it gets executed with full privileges. The flag name literally has “dangerously” in it and I still used it without thinking twice.

Unsanitized prompts to --full-auto mode. Same issue with the Codex adapter. Task strings go directly into npx codex exec --full-auto without any sanitization. The subprocess exec approach prevents shell injection, but it doesn’t prevent prompt injection. The AI model on the other end still interprets the full string.
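
To make the distinction concrete, this is the shape of the call (a sketch; the real adapter differs): the arguments go to the OS as a list, so nothing in the task string is shell-interpreted, but the model still reads the whole string.

    import asyncio

    async def run_codex(task: str) -> str:
        # No shell involved: "; rm -rf ~" inside the task is just text to the OS.
        # The model on the other end, however, reads it as part of its prompt.
        proc = await asyncio.create_subprocess_exec(
            "npx", "codex", "exec", "--full-auto", task,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        stdout, _ = await proc.communicate()
        return stdout.decode()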

Cross-agent prompt injection. In sequential and debate patterns, Agent A’s output becomes Agent B’s input. There’s no sanitization boundary between them. If Agent A hallucinates something that looks like instructions — or if a task description is crafted to make Agent A produce injection payloads — Agent B executes whatever it receives. It’s prompt injection with an extra step.

Zero input validation on the MCP server. The orchestrate() tool accepts any string of any length. No character limit, no null byte filtering, no sanitization at all. Someone could pass a 10MB task description and the system would dutifully forward it to every agent.

None of these are theoretical. They’re the natural consequence of building a system that runs arbitrary code through AI agents and not thinking about the trust boundaries between those agents.

The Architecture Findings

Gemini found the structural issues:

  • Config loaded once at import time. The YAML configuration is parsed when the module loads. No way to reload it without restarting the server. Change an agent’s timeout? Restart. Adjust a model name? Restart.
  • No structured logging. The entire system uses print() statements. No log levels, no request tracing, no way to correlate a failed subtask with the orchestration run that spawned it. Good luck debugging a consensus pattern with 5 agents when all you have is interleaved print output.
  • Routing keywords are hardcoded. The routing pattern decides which agent to use based on keyword matching: “architect” → Claude, “performance” → Codex, etc. The keywords are hardcoded in a Python dict. Adding a new routing rule means editing source code.
  • Gemini agent swallows errors silently. The Gemini adapter’s catch-all exception handler returns an empty string with success=False but doesn’t log anything. API errors, auth failures, rate limits — all silently eaten. I’d been debugging “empty Gemini responses” without realizing the errors were being caught and discarded.
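
The Gemini one is worth seeing. The fix (under Round 3 below) is just a logged version of the same catch-all; the return shape here is illustrative, not the adapter's real result type.

    import logging

    logger = logging.getLogger("orchestrator.gemini")

    def handle_gemini_error(exc: Exception) -> dict:
        # The old handler returned success=False with an empty output and said nothing.
        # Logging the exception class name is what separates "empty Gemini response"
        # from auth failures, 429s, and genuine API errors.
        logger.error("Gemini call failed: %s: %s", type(exc).__name__, exc)
        return {"output": "", "success": False}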

What I Fixed

I ran three rounds of fixes using the system’s own “sonnet prompts Codex” workflow — I orchestrate a task to Codex, it makes the code changes, I review and commit.

Round 1 — Reliability:

  • Kill subprocesses on timeout (proc.kill() + await proc.wait())
  • Route hierarchical subtasks through the ThrottleManager
  • Wrap each subtask in try/except for graceful degradation
  • Preserve task context (project_dir, constraints) in debate critique rounds

Round 2 — Security hardening:

  • Input validation: 50k char limit on tasks, 200k on context, null byte stripping
  • Output truncation: cap inter-agent handoffs at 100k chars in sequential, debate, and consensus patterns
  • Health check timeout: 30s limit prevents hung agents from blocking the endpoint
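
Condensed, the validation and truncation boundary looks roughly like this. The limits are the ones above; the function names are illustrative.

    MAX_TASK_CHARS = 50_000
    MAX_CONTEXT_CHARS = 200_000
    MAX_HANDOFF_CHARS = 100_000

    def validate(text: str, limit: int) -> str:
        # Strip null bytes and refuse oversized input before it fans out to every agent.
        text = text.replace("\x00", "")
        if len(text) > limit:
            raise ValueError(f"input exceeds {limit} characters")
        return text

    def truncate_handoff(output: str) -> str:
        # Cap what one agent hands to the next in sequential, debate, and consensus runs.
        if len(output) > MAX_HANDOFF_CHARS:
            return output[:MAX_HANDOFF_CHARS] + "\n[truncated]"
        return output

    # Usage at the MCP boundary: validate(task, MAX_TASK_CHARS), validate(context, MAX_CONTEXT_CHARS).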

Round 3 — Quality:

  • Gemini error logging with exception class names
  • Consensus output truncation before building judge comparisons
  • UTC-aware timestamps for result filenames
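
The timestamp change is tiny, but it is the difference between naive and unambiguous filenames (a sketch; the path is illustrative):

    from datetime import datetime, timezone

    # datetime.utcnow() is naive (and deprecated); an aware timestamp can't be
    # misread as local time when correlating result files across runs.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    result_path = f"results/orchestration_{stamp}.json"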

What I Learned

The meta-review is the hardest review. When agents review a website, the findings are straightforward — bad contrast ratios, missing security headers, DRY violations. When they review their own orchestration system, the findings are about trust boundaries, privilege escalation, and cascading failures. The system’s flaws are more abstract and more consequential.

Security debt compounds in orchestration. A single web app with a missing rate limit is one vulnerability. An orchestration system that passes unsanitized prompts between agents with --full-auto and --dangerously-skip-permissions is a chain of vulnerabilities. Each agent trusts the previous agent’s output implicitly. The attack surface isn’t any single component — it’s the trust graph between all of them.

The zombie process bug is a perfect example of “works until it doesn’t.” I’d been running this system for weeks without noticing the zombie processes because they don’t cause immediate failures. They just quietly accumulate. The agents found it in seconds because they don’t have the bias of “well, it seems to work fine.”

Self-referential testing is underrated. There’s something clarifying about a system reviewing itself. The agents aren’t just finding bugs — they’re finding the assumptions I baked in when I wrote the code. I assumed subprocesses would clean up after themselves. I assumed agent outputs would be reasonable sizes. I assumed the hierarchical pattern would never hit rate limits. Every “assumption” the agents surfaced was a bug I hadn’t written yet.

Sixteen issues found. Eleven fixed in a single session. The remaining ones — structured logging, config reloading, the deeper prompt injection mitigations — are on the list. Not bad for a system reviewing its own source code.


Next in this series: How the orchestration system actually works · See the project page

Interested in multi-agent code review for your project? Get in touch.