A structured methodology for AI coding agents

Paivot adapts the balanced-team methodology and XP engineering practices that have shipped software for decades to a world where the builders are AI agents. Discovery & Framing, self-contained stories, adversarial review, evidence-based acceptance, durable knowledge, and hard workflow enforcement turn raw model capability into disciplined delivery.

Available now for Claude Code, Codex, OpenCode, and Pi. Pi supports fully local model routing.

AI agents are powerful but undisciplined

Left unconstrained, AI coding agents exhibit predictable failure modes that compound over time.

🧪 Superficial Testing

Skip testing entirely or write tests that verify nothing meaningful. Mocks everywhere, no real integration.

🧩 Isolated Components

Build components that work alone but never integrate. Vertical slices replaced by horizontal layers.

🧠 Lost Context

Lose context across sessions and compaction. Make contradictory decisions after forgetting earlier constraints.

🪄 Technical Novelty Bias

Ignore business requirements in favor of technically interesting work. Build what's fun, not what's needed.

🏁 Premature Completion

Mark work as "done" without proof it actually works. Claim success without running real tests against real services.

🎭 Role Collapse

Let one agent discover, design, implement, and approve its own work. No productive tension, no adversarial review, no trustworthy acceptance.

Paivot solves this by applying proven balanced-team and XP engineering practices through specialized agents with strict role boundaries, self-contained story contracts, adversarial review, durable vault-backed memory, and hard enforcement. On Claude Code, Codex, and OpenCode the shared control plane is pvg; on Pi the orchestrator is native and can run entirely on local models.

A software organization, not a prompt chain

The dispatcher coordinates the full Paivot choreography: Discovery & Framing, optional specialist challenge loops, backlog creation, adversarial review, execution, milestone validation, and retrospective learning. On Claude Code, Codex, and OpenCode the queue selection, story transitions, merge gates, and recovery path are delegated to pvg so the workflow does not depend on prompt memory alone.

You (Human): Business Owner

Dispatcher + Enforcement: Routes work, enforces sequence, never writes code

Discovery & Framing
  • BA: Business outcomes
  • Designer: User experience
  • Architect: Technical approach

Optional specialist challenge loop
  • BA Challenger: Stress-tests BUSINESS.md
  • Designer Challenger: Stress-tests DESIGN.md
  • Architect Challenger: Stress-tests ARCHITECTURE.md

Backlog creation
  • Sr. PM: Creates self-contained stories

Backlog adversarial review
  • Anchor: Finds gaps before execution starts

Execution loop (repeat until each story is accepted)
  • Developer: Implements one story + records proof
  • PM-Acceptor: Accepts or rejects with evidence

Milestone validation
  • Anchor: Validates milestone or finds gaps

Retrospective and memory
  • Retro: Harvests learnings
  • nd + Vault: Carry contracts, decisions, and knowledge forward

| Traditional Balanced Teams | Paivot |
| --- | --- |
| Persistent human teams | Ephemeral agents, spawned per task |
| Pair programming | Orchestrated dispatch with PM review |
| Implicit shared context | Self-contained stories with ALL context embedded |
| Trust-based review | Evidence-based delivery with recorded proof |
| Centralized project tracker | nd story contracts plus vault-backed knowledge |
| Organic learning through pairing | Structured evidence, proof, and retro learnings captured deliberately |
| Flexible role boundaries | Strict enforcement (agents lack judgment to flex) |
| Prompt-based reminders | Guarded or native orchestration, depending on platform |

Twelve personas, explicit handoffs

Paivot is larger than a developer and a reviewer. The full system includes the dispatcher, discovery roles, specialist challengers, backlog shaping, adversarial review, execution, and retrospective learning.

Dispatcher (Orchestration)

Coordinates the entire workflow, routes tasks to the right persona, enforces choreography, and never writes code or backlog content itself.

Business Analyst (BLT)

Captures business outcomes through iterative questioning. Owns BUSINESS.md. Asks "what does success look like?"

Designer (BLT)

Captures user needs and DX for all product types: UI, API, CLI, database. Owns DESIGN.md.

Architect (BLT)

Designs system architecture and defines technical constraints. Owns ARCHITECTURE.md.

BA Challenger (Challenge)

Adversarially reviews BUSINESS.md for omissions, drift, and ambiguity before the backlog is allowed to form.

Designer Challenger (Challenge)

Adversarially reviews DESIGN.md so weak UX, API, CLI, or DX assumptions are surfaced before execution.

Architect Challenger (Challenge)

Adversarially reviews ARCHITECTURE.md to catch feasibility gaps, drift, and hallucinated constraints.

Sr. PM (Backlog)

Creates the backlog from D&F documents. Embeds ALL context into self-contained stories so agents need nothing else.

Anchor (Adversarial)

Adversarial reviewer. Challenges the backlog for gaps, missing walking skeletons, and non-demoable milestones. Not here to be helpful, but to be thorough.

Developer (Execution)

Ephemeral. Implements one story, runs tests, records proof of passing in delivery notes. Does NOT close stories.

PM-Acceptor (Review)

Reviews one delivered story using an evidence-based approach. Accepts (closes) or rejects with structured EXPECTED/DELIVERED/GAP/FIX notes.

Retro (Learning)

Harvests learnings from completed epics. Writes actionable knowledge to the vault so future sessions start smarter instead of repeating mistakes.

Specialist challengers are optional but first-class. In Pi, the highest-leverage roles can use stronger hosted or local reasoning models, while narrow coding stories can be delegated to smaller local models without weakening the role contract. The intake entry workflow is operator-facing, not a long-lived peer persona, so it is intentionally shown through commands rather than as an agent card.

"The orchestrator cannot be trusted to improvise the process."

LLMs can be persuaded, distracted, or compacted into forgetting discipline. Paivot treats orchestration as a system concern, not just a prompt. Claude Code, Codex, and OpenCode share a deterministic pvg control plane for queue selection, story transitions, merge gating, and recovery; Pi implements the dispatcher natively inside the runtime.

Enforcement in action

1. Dispatcher requests: merge story/ABC-123
2. Guard checks contract: Story is delivered, not accepted
3. Mismatch detected: BLOCKED: merge requires accepted + closed
4. Dispatcher corrects: send to PM-Acceptor, then merge
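
The guard check in the walkthrough above can be sketched as a small pure function. This is a minimal Python illustration under stated assumptions, not pvg's implementation: the `Story` record, its field names, and the `check_merge` function are hypothetical, and only the rule itself (merge requires accepted + closed) comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    """Hypothetical story record; the field names are assumptions."""
    story_id: str
    status: str   # e.g. "ready", "delivered", "accepted", "rejected"
    closed: bool


def check_merge(story: Story) -> tuple[bool, str]:
    """Allow a merge only when the story is both accepted and closed."""
    if story.status == "accepted" and story.closed:
        return True, "OK: merge allowed"
    return False, "BLOCKED: merge requires accepted + closed"


# Step 2 of the walkthrough: the story is delivered, not accepted.
allowed, reason = check_merge(Story("ABC-123", status="delivered", closed=False))
print(allowed, reason)
```

Because the check reads only recorded story state, its answer does not change with whatever the dispatcher happens to remember or has been persuaded of.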

Deterministic Queue Control

DELIVERED -> PM | REJECTED -> DEV | READY -> DEV | WAIT | COMPLETE | BLOCKED | OTHER

pvg loop next --json decides what happens next. pvg story deliver|accept|reject owns the structural transitions. Merges stay blocked until a story is both accepted and closed, and pvg loop recover is the break-glass recovery path after an interruption.
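
One way to picture deterministic queue selection is a fixed priority scan over story statuses. The sketch below is an assumption-laden Python illustration, not the behavior or output format of pvg loop next: the priority order is one reading of the routing summary above (delivered to PM, then rejected to developer, then ready to developer), and the `next_action` function, the dictionary keys, and the simplified wait/complete rules are all hypothetical.

```python
import json

# Assumed priority, read from the routing summary above:
# delivered work is reviewed first, then rejected work is reworked,
# then ready work is started.
ROUTING = [
    ("delivered", "pm-acceptor"),
    ("rejected", "developer"),
    ("ready", "developer"),
]


def next_action(stories: list[dict]) -> dict:
    """Pick the next step from story statuses alone, with no prompt memory."""
    for status, role in ROUTING:
        for story in stories:
            if story["status"] == status:
                return {"action": "dispatch", "role": role, "story": story["id"]}
    if all(story["status"] == "accepted" for story in stories):
        return {"action": "complete"}
    # Nothing dispatchable and not everything accepted (for example,
    # work still in progress): the loop waits.
    return {"action": "wait"}


backlog = [
    {"id": "ABC-124", "status": "ready"},
    {"id": "ABC-123", "status": "delivered"},
]
print(json.dumps(next_action(backlog)))
```

The same backlog always yields the same decision, which is the property the text asks of the control plane.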

Hard Enforcement

Invalid merges, unsafe vault writes, broken branch choreography, and out-of-sequence workflow actions are blocked before they do damage.

Persistent State

Workflow state survives compaction, session restarts, tool failures, and provider changes because it lives in the workflow system, not the model's context window.

Audit Trail

Story contracts record evidence, proof, status transitions, and rejection history so every acceptance or rollback has an explicit reason.

Per-Role Model Routing

Implementations can assign stronger models to backlog and adversarial roles and smaller models to narrow coding tasks without weakening the delivery contract.
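
Per-role routing can be as simple as a lookup table. The following Python sketch is illustrative only: the role keys, the `model_for` helper, and both model names are placeholders invented for this example, and defaulting unknown roles to the stronger model is an assumption rather than documented behavior.

```python
# Both model names are placeholders, not real model identifiers.
STRONG_MODEL = "strong-reasoning-model"
SMALL_MODEL = "small-local-coder"

# Hypothetical role keys: stronger models for backlog and adversarial
# roles, a smaller model for narrow coding tasks.
ROLE_MODELS = {
    "sr-pm": STRONG_MODEL,     # backlog creation
    "anchor": STRONG_MODEL,    # adversarial review
    "developer": SMALL_MODEL,  # narrow coding stories
}


def model_for(role: str) -> str:
    """Return the model for a role; unknown roles get the stronger model."""
    return ROLE_MODELS.get(role, STRONG_MODEL)


print(model_for("developer"), model_for("anchor"))
```

The delivery contract is untouched by this table: a story implemented by the smaller model still has to pass the same proof and acceptance steps.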

Every story follows a strict delivery pipeline

No shortcuts. Verification before review. Evidence before acceptance. Learnings before forgetting.

1. Implement: Developer builds + tests
2. Record Proof: CI results, coverage, output
3. Deliver: Mark delivered, NOT closed
4. Verify: Integration tests must pass
5. 🔍 PM Review: Evidence-based acceptance
6. Accept: PM closes or rejects
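
The "no shortcuts" rule amounts to an ordered state machine in which a stage can only be followed by the next one. Here is a minimal Python sketch of that idea; the `Stage` enum and `advance` function are hypothetical names, and the rejection path back to the developer is left out for brevity.

```python
from enum import IntEnum


class Stage(IntEnum):
    IMPLEMENT = 1
    RECORD_PROOF = 2
    DELIVER = 3
    VERIFY = 4
    PM_REVIEW = 5
    ACCEPT = 6


def advance(current: Stage, requested: Stage) -> Stage:
    """Move to the requested stage only if it is the immediate next one."""
    if requested != current + 1:
        raise ValueError(
            f"cannot go from {current.name} to {requested.name}: "
            "stages cannot be skipped"
        )
    return requested


# Walk a story through the full pipeline in order.
stage = Stage.IMPLEMENT
for nxt in (Stage.RECORD_PROOF, Stage.DELIVER, Stage.VERIFY,
            Stage.PM_REVIEW, Stage.ACCEPT):
    stage = advance(stage, nxt)
print(stage.name)
```

Asking to jump from DELIVER straight to ACCEPT raises an error instead of silently succeeding, which mirrors "verification before review, evidence before acceptance".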

Testing Philosophy

Mocks in integration tests are an automatic rejection. Only real calls prove functionality, and milestones must be demoable end to end.

| Type | Mocks? | Required |
| --- | --- | --- |
| Unit | OK | 80% coverage |
| Integration | Never | Every story |
| E2E | Never | Milestones |
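
The story-level rows of the table can be expressed as a checklist that returns the reasons a delivery fails. This is a hedged Python sketch: `policy_violations` and its parameters are hypothetical, and treating 80% as a minimum threshold is an assumption about how the coverage requirement is meant.

```python
MIN_UNIT_COVERAGE = 80.0  # percent; treated here as a minimum (assumption)


def policy_violations(unit_coverage: float,
                      has_integration_tests: bool,
                      integration_uses_mocks: bool) -> list[str]:
    """Return the reasons a story's tests fail the policy; empty means pass."""
    violations = []
    if unit_coverage < MIN_UNIT_COVERAGE:
        violations.append(
            f"unit coverage {unit_coverage:.0f}% is below {MIN_UNIT_COVERAGE:.0f}%"
        )
    if not has_integration_tests:
        violations.append("story has no integration tests")
    if integration_uses_mocks:
        violations.append("integration tests use mocks: automatic rejection")
    return violations


# High coverage does not rescue a story whose integration tests are mocked.
print(policy_violations(unit_coverage=91.0,
                        has_integration_tests=True,
                        integration_uses_mocks=True))
```

The end-to-end requirement applies to milestones rather than single stories, so it is not part of this per-story check.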

Evidence-Based Review

PM-Acceptor reviews what was proved, not what was promised. Every rejection must include four parts:

| Part | Purpose |
| --- | --- |
| Expected | Quote the AC |
| Delivered | What code does |
| Gap | Where it falls short |
| Fix | Actionable guidance |
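
A four-part rejection is easy to enforce structurally. The sketch below is a hypothetical Python shape for such a note, not the format nd or pvg actually stores: the `Rejection` class is invented for illustration, and the example text describes an imagined failure of the "export produces valid JSON" criterion.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Rejection:
    """Hypothetical rejection note with the four required parts."""
    expected: str   # quote of the acceptance criterion
    delivered: str  # what the code actually does
    gap: str        # where it falls short
    fix: str        # actionable guidance

    def __post_init__(self) -> None:
        # Refuse to construct a rejection with any part left blank.
        for f in fields(self):
            if not getattr(self, f.name).strip():
                raise ValueError(f"rejection is missing its {f.name.upper()} part")

    def render(self) -> str:
        """Format the note as EXPECTED/DELIVERED/GAP/FIX lines."""
        return "\n".join(
            f"{f.name.upper()}: {getattr(self, f.name)}" for f in fields(self)
        )


note = Rejection(
    expected="AC #1: export produces valid JSON",
    delivered="export writes a trailing comma after the last record",
    gap="the output fails to parse with a standard JSON parser",
    fix="serialize with a JSON library and add an integration test that parses the file",
)
print(note.render())
```

Making an incomplete rejection impossible to construct keeps "rejected" from degrading into an unexplained status change.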

Learnings Lifecycle

AI agents do not learn by osmosis. Knowledge has to be captured, stored, and deliberately reintroduced into later work.

StageActor
RecordDeveloper notes
FlagPM labels stories
HarvestRetro agent
IncorporateSr. PM (hard-gated)

nd contracts and vault memory keep the system honest

A rigorous methodology needs durable memory. Paivot uses nd for backlog and delivery contracts, plus vault-backed knowledge for decisions, patterns, and retrospective learnings that survive session loss and model swaps.

Why this foundation?

nd provides a git-native, CLI-first story tracker that agents can actually use. Each story carries status, evidence, proof, dependencies, rejection history, and merge readiness in a durable contract.

vlt provides the persistent knowledge layer: system vault, project vault, and session capture. Decisions, debugging insights, and retro learnings stop being tribal knowledge and become reusable context for the next agent.

  • Git-backed and local-first: survives sessions, compaction, and restarts
  • CLI-first with JSON output for agent parsing
  • Dependencies, labels, evidence, proof, and parent-child relationships
  • Project knowledge survives across tools, providers, and long-running work
  • Works with hosted models, hybrid stacks, and fully local Pi deployments

Delivery Contract

Every story carries proof, not just status

Developers implement exactly one story, append evidence and proof, and mark it delivered. PM-Acceptor accepts or rejects. Branch merges stay blocked until the story is both accepted and closed.

## nd_contract
status: delivered

### evidence
- npm test
- git rev-parse HEAD

### proof
- [x] AC #1: export produces valid JSON

That contract is what lets Paivot keep rigor even when the models, providers, or runtimes change underneath it.
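
Because the contract is plain text, a reviewer agent or a guard can read it mechanically. The Python sketch below parses the example contract shown above. It is an illustration under assumptions: the real nd format may differ from this snippet, and `parse_contract` and `ready_for_review` are hypothetical helpers rather than part of nd.

```python
import re

CONTRACT = """\
## nd_contract
status: delivered

### evidence
- npm test
- git rev-parse HEAD

### proof
- [x] AC #1: export produces valid JSON
"""


def parse_contract(text: str) -> dict:
    """Extract status, evidence commands, and proof checkboxes."""
    contract = {"status": None, "evidence": [], "proof": []}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("status:"):
            contract["status"] = line.split(":", 1)[1].strip()
        elif line.startswith("### "):
            section = line[4:].strip()
        elif section == "evidence" and line.startswith("- "):
            contract["evidence"].append(line[2:])
        elif section == "proof":
            match = re.match(r"- \[([ xX])\] (.+)", line)
            if match:
                done = match.group(1).lower() == "x"
                contract["proof"].append((done, match.group(2)))
    return contract


def ready_for_review(contract: dict) -> bool:
    """Assumed rule: delivered, with evidence, and every proof item ticked."""
    return (
        contract["status"] == "delivered"
        and bool(contract["evidence"])
        and bool(contract["proof"])
        and all(done for done, _ in contract["proof"])
    )


parsed = parse_contract(CONTRACT)
print(parsed["status"], ready_for_review(parsed))
```

An unticked proof item makes `ready_for_review` return False, so a story cannot reach review on status alone.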

Choose the runtime that fits your stack

Same methodology, different integration surfaces. Claude Code, Codex, and OpenCode all share pvg as the deterministic control plane; Pi is the native implementation and can run entirely on local models through LM Studio or other OpenAI-compatible endpoints.

1. Claude Code

Plugin surface with hooks and strong guardrails, backed by the same pvg control plane used by the other hosted runtimes.

# Prereqs: pvg, vlt, Claude Code
git clone https://github.com/paivot-ai/paivot-graph.git
cd paivot-graph && make install
make seed

2. Codex

Codex-native skills and orchestration prompts, with shared queue control and story transitions delegated to pvg.

# Install globally
git clone https://github.com/paivot-ai/paivot-codex.git
cd paivot-codex && make install-global
make check-prereqs

3. OpenCode

OpenCode commands and agent files adapted to its architecture, still backed by nd, vlt, and the same shared pvg control plane. This is the most portable hosted surface and works well with strong OSS coding models too.

# Bootstrap OpenCode
git clone https://github.com/RamXX/paivot-opencode.git
cd paivot-opencode && make install
make install-project TARGET=/path/to/your-project

4. Pi

Native Paivot runtime with per-role model routing, built-in guardrails, and a benchmark harness for quality, latency, retries, speed, and cost. Can run fully local.

# Native Pi workflow
git clone https://github.com/paivot-ai/paivot-pi.git
cd paivot-pi && cp .env.example .env
pi
/paivot

One methodology, four current implementations

Paivot is platform-aware, not platform-fragile. The role system, story contracts, review rigor, and knowledge model stay consistent while each runtime gets the integration surface it can actually support. Claude Code, Codex, and OpenCode now converge on the same deterministic pvg workflow core.

Claude Code
Available Now

Mature plugin workflow with commands, hook integration, vault seeding, and strong unattended execution support, all backed by pvg.

Codex
Available Now

Codex-native skills with the same rigorous backlog, delivery, and acceptance choreography, using pvg for shared queue selection, transitions, and recovery.

OpenCode
Available Now

OpenCode-adapted dispatcher workflow with nd, vlt, and the same pvg control plane, making it a strong hosted option for top OSS coding models.

Pi
Available Now

Native orchestrator with per-role model routing, benchmark tooling, and the option to run the full methodology entirely on local models.