Skip to content

Capstone: Ship a Reusable Agent Workbench Pack

The mini-track ends with a pack you drop into any repo. Eleven lessons of surfaces compressed into a directory you can cp -r and have an agent working reliably the next morning. The capstone is the artifact this curriculum trades on.

Type: Build Languages: Python (stdlib) Prerequisites: Phases 14 · 31 to 14 · 41 Time: ~75 minutes

Learning Objectives

  • Package the seven workbench surfaces into one drop-in directory.
  • Pin the schemas, scripts, and templates so a new repo gets a known-good baseline.
  • Add a single installer script that lays down the pack idempotently.
  • Decide what stays in the pack and what stays out, defending the cut for each.

The Problem

A workbench that lives in a Google Doc, a chat history, and three half-remembered scripts is a workbench that gets rebuilt every quarter. The cure is a versioned pack: a repo or directory with the surfaces, the schemas, the scripts, and a one-command installer.

You will end this lesson with outputs/agent-workbench-pack/ shipped on disk and a bin/install.sh that drops it into any target repo.

The Concept

mermaid
flowchart TD
  Pack[agent-workbench-pack/] --> Docs[AGENTS.md + docs/]
  Pack --> Schemas[schemas/]
  Pack --> Scripts[scripts/]
  Pack --> Bin[bin/install.sh]
  Bin --> Repo[target repo]
  Repo --> Surfaces[all seven workbench surfaces wired]

The pack layout

outputs/agent-workbench-pack/
├── AGENTS.md
├── docs/
│   ├── agent-rules.md
│   ├── reliability-policy.md
│   ├── handoff-protocol.md
│   └── reviewer-rubric.md
├── schemas/
│   ├── agent_state.schema.json
│   ├── task_board.schema.json
│   └── scope_contract.schema.json
├── scripts/
│   ├── init_agent.py
│   ├── run_with_feedback.py
│   ├── verify_agent.py
│   └── generate_handoff.py
├── bin/
│   └── install.sh
└── README.md

What stays in, what stays out

In:

  • Surface schemas. They are the contract.
  • The four scripts above. They are the runtime.
  • The four docs. They are the rules and the rubric.

Out:

  • Project-specific tasks. Tasks belong on the target repo's board, not in the pack.
  • Vendor SDK calls. The pack is framework-agnostic.
  • Onboarding prose. The pack lives next to the team's existing onboarding, not inside it.

The installer

A short bin/install.sh (or bin/install.py):

  1. Refuses to install over an existing pack without --force.
  2. Copies the pack into the target repo.
  3. Wires up CI if a .github/workflows/ exists.
  4. Prints next steps: fill in the board, set acceptance commands, run the init script.

Versioning

The pack carries a VERSION file. Schema bumps and script changes that require migrations bump the major. Doc-only changes bump the patch. The target repo's agent_state.json records which pack version it was initialized against.

Build It

code/main.py assembles the pack into outputs/agent-workbench-pack/ next to the lesson, seeded with the schemas and scripts from the previous lessons in this mini-track and the docs you already wrote.

Run it:

python3 code/main.py

The script copies and pins the surfaces, writes the README, prints the pack tree, and exits zero. Re-running is idempotent.

Production patterns in the wild

A pack is only valuable if it survives forks, updates, and an unfriendly upstream. Four patterns make that work.

VERSION is the contract, not the marketing. Major bumps require a state migration. Minor bumps require a checker re-run. Patch bumps are doc-only. The installer writes .workbench-version into the target repo on every install; lint_pack.py refuses to ship if the target's lock disagrees with the pack's VERSION. This is how npm, Cargo, and pyproject.toml survive 10 years of churn; nothing about agents changes the rules.

Single source for cross-tool distribution. Nx ships one nx ai-setup that lays down AGENTS.md, CLAUDE.md, .cursor/rules/, .github/copilot-instructions.md, and an MCP server from a single config. The pack should do the same; the installer emits the symlinks (ln -s AGENTS.md CLAUDE.md) so a single source of truth fans out to every coding agent. Forking the pack to support one tool over another is a failure mode.

uninstall.sh that refuses on non-trivial state. Uninstalling the pack must not delete the user's agent_state.json, task_board.json, or outputs/. The uninstaller removes the schemas, scripts, docs, and AGENTS.md (with --keep-agents-md opt-out) and refuses to proceed if state files have any uncommitted changes. State belongs to the user; the pack does not own it.

Skill-as-publishable. SkillKit-style distribution. The pack ships as a SkillKit skill: skillkit install agent-workbench-pack lays it down across 32 AI agents from a single source. The pack repo is the source of truth; SkillKit is the distribution channel. Vendor lock-in collapses; the seven surfaces stay the same.

Use It

Three places the pack ships:

  • As a directory you drop into a repo. cp -r outputs/agent-workbench-pack /path/to/repo.
  • As a public template repo. Fork-and-customize, with VERSION controlling drift.
  • As a SkillKit skill. Wired into your agent product so a single command lays it down.

The pack is the recipe. Each install is a serving.

Ship It

outputs/skill-workbench-pack.md generates a project-tuned pack: rules sharpened to the team's history, scope globs matched to the repo, rubric dimensions extended with one domain-specific entry.

Exercises

  1. Decide which optional fifth doc deserves promotion into the canonical pack. Defend the cut.
  2. Rewrite the installer as Python with a --dry-run flag. Compare ergonomics against bash.
  3. Add a bin/uninstall.sh that safely removes the pack and refuses if state files have non-trivial history. What counts as non-trivial?
  4. Add a lint_pack.py that fails when the pack drifts from VERSION. Wire it into CI for the pack's own repo.
  5. Author the migration runbook from a hand-rolled workbench to this pack. What is the order of operations that minimizes downtime?

Key Terms

TermWhat people sayWhat it actually means
Workbench pack"The starter kit"A versioned directory carrying all seven surfaces
Installer"Setup script"bin/install.sh that lays the pack down idempotently
Pack version"VERSION"Major bumps for schema/script changes, patch for doc-only
Drop-in pack"cp -r and go"Pack works without per-repo customization on day one
Forkable template"GitHub template"Public repo that GitHub's "Use this template" can clone from

Further Reading