Chapter 13

Technical engraving of modular trays, tool heads, pins, and an expansion rail on a workbench — Fig. 13A new module on the bench.

13 — Extending the kit

When to read this: When you've used the kit on at least one project and you want features it doesn't ship: a custom command, a project-specific reviewer, a CI integration, a team-wide memory pattern.

What's extensible#

The kit is intentionally simple. Almost everything is markdown files plus a few bash scripts and one TypeScript hook. That makes it easy to add to.

You can extend in five places:

Custom slash commands — drop a markdown file in .claude/commands/.
Custom subagents — drop a markdown file in .claude/agents/.
Custom skills — folder under .claude/skills/<name>/SKILL.md.
Additional hooks — wire more entries into .claude/settings.json hooks.<event> arrays.
CI/CD integration — run the kit's check scripts and tests in your pipeline.

This chapter covers each.

Adding a custom slash command#

A slash command is a single markdown file with frontmatter. Drop it in .claude/commands/<name>.md and it's available as /<name> in the next session.

The frontmatter:

---
description: One-line description shown in the command picker
argument-hint: <example argument format>
allowed-tools: Read, Write, Edit, Glob, Grep, Bash
---
 
# Command title
 
The body of the markdown is the prompt that runs when the command is invoked.
The user's argument (everything after the command name) is available as `$ARGUMENTS`.

Example: a /changelog command that appends an entry to CHANGELOG.md:

---
description: Append a CHANGELOG entry for the current change.
argument-hint: <one-line summary of the change>
allowed-tools: Read, Edit, Bash
---
 
# Changelog: $ARGUMENTS
 
Append a new entry to `CHANGELOG.md` under the current date heading.
 
## Steps
 
1. Read `CHANGELOG.md`. If it doesn't exist, create it with a `# Changelog` H1.
2. Find the most recent date heading. If today's date doesn't have a heading, add one (`## YYYY-MM-DD`).
3. Append the entry as a bullet under the date heading: `- $ARGUMENTS`
4. Show the diff. Confirm with the user before writing.
5. After writing, suggest:
   ```bash
   git add CHANGELOG.md && git commit -m "chore: changelog"

 
Drop that file at `.claude/commands/changelog.md`. Open Claude Code, type `/changelog Added streak feature for P2`, and the command runs.
 
The kit's three shipped commands (`craft-ui`, `scaffold-component`, `forbid`) are good models. Verb-led, task-specific, deliberately narrow.
 
### Tips
 
- **Use `$ARGUMENTS` for the user's input.** Everything after the command name on the prompt line.
- **Keep `allowed-tools` tight.** A command that only reads and edits should not list `Bash`. Tighter tool lists reduce permission prompts and make the command safer.
- **Reject vague inputs explicitly.** The `/forbid` command shows the pattern. If the input is vague ("no ugly stuff"), the command refuses and asks for specifics. Don't silently accept low-quality input.
- **End with a suggested follow-up.** Most commands end with "Now run `git add X && git commit -m '...'`" or similar. The follow-up keeps the workflow continuous.
 
## Adding a custom subagent
 
A subagent is a markdown file in `.claude/agents/<name>.md`. The frontmatter declares its name, description, tools, and model.
 
The kit's reviewers are good templates. Copy `api-reviewer.md` (the most thorough) and adapt the rubric:
 
```markdown
---
name: my-domain-reviewer
description: Read-only reviewer for [your domain] concerns. Use after [specific change] is added or modified to verify [specific concerns]. Invoke with phrases like "have the my-domain-reviewer check this" or "review for [concerns]". Does not edit code.
tools: Read, Grep, Glob, Bash
model: sonnet
---
 
You are a senior [domain] engineer doing read-only review. You do not edit code. You read the diff, anchor against [the relevant doc], and produce a structured report with severities and suggested rewrites.
 
[Continue with: lane statement, workflow, rubric, severity, output format, what-you-never-do.]

What makes a good subagent#

The five reviewers shipped by the kit have the same structural shape, copied here so you can mirror it:

Frontmatter.
- name: kebab-case identifier.
- description: explicit "use after X" / "invoke with phrases like Y" guidance, so the parent agent knows when to invoke.
- tools: read-only ideally — Read, Grep, Glob, Bash.
- model: sonnet for most reviewers; inherit for the agents-memory-updater so it matches the parent's model.
Role + lane statement. "You are X. Your sole concern is Y. Z is someone else's job. Stay in lane."
Workflow. Numbered steps: read this first, identify what changed, apply rubric, output report.
Rubric. Specific checks. Anchor each against a written-down rule. Severity codes consistent with the rest of the kit (🔴 / 🟡 / 🟢).
Output format. Reproducible structure so reports are scannable across reviewers.
What you never do. Guardrails: never edit, never approve 🔴 findings, never use vague feedback, stay in lane.

Examples of useful custom reviewers#

accessibility-reviewer — beyond axe-core, checks for ARIA correctness, color contrast at 3:1 for focus indicators, label/input pairing, screen-reader announcements for dynamic content.
copy-reviewer — checks UX copy against DESIGN.md's voice section. Catches forbidden words, sentence length drift, generic AI-tells like "effortlessly," "seamlessly," "leverage."
auth-policy-reviewer — for projects with complex permission models. Reviews each route against the policy doc. Flags inconsistencies.
billing-reviewer — for projects with subscriptions or payments. Checks every Stripe interaction for idempotency, webhook signature verification, customer-portal vs custom-flow consistency.

Adding a custom skill#

A skill is a folder under .claude/skills/<name>/ with a SKILL.md file inside. Optionally, references in references/ (templates, examples).

The kit's six skills (prd-grill, prd-revise, architecture-md-builder, architecture-review, design-md-builder, continual-learning) are good models.

A skill is invoked by name in the chat — the agent recognizes the skill name and runs the SKILL.md as instructions.

---
name: my-custom-skill
description: One paragraph explaining what the skill does, when to use it, and what triggers it. Triggers on phrases like "trigger phrase 1", "trigger phrase 2".
---
 
# My Custom Skill
 
Your job is to [thing].
 
## Operating principle
 
[Core rules, in priority order.]
 
## Phase 0 — Diagnose
 
[Pre-flight checks.]
 
## Phase 1 through N
 
[Walk-through, one phase at a time.]
 
## Phase N+1 — Write the file (if applicable)
 
[File output instructions.]
 
## What this skill does NOT do
 
[Guardrails.]
 
## Reference files
 
- `references/template.md` — canonical structure to fill in.
- `references/examples.md` — worked examples for inspiration.

When to write a skill instead of a command#

Skills are good for multi-phase interrogations or builders. Anything that needs back-and-forth with the user before producing output. The doc-builder skills are all interrogations.
Commands are good for single-step actions where the user passes an argument and the agent acts on it. The kit's /forbid and /scaffold-component are commands because the user provides the input upfront.

The test: does the workflow fit in one parameter? If yes, write a command. If no (it needs phases of clarification before output), write a skill.

Wiring more hooks#

The kit ships a Stop hook. Claude Code supports several other hook events:

PreToolUse — fires before a tool call.
PostToolUse — fires after a tool call.
UserPromptSubmit — fires when the user submits a prompt.
Stop — fires when the agent finishes a turn.
Notification — fires when the agent sends a notification.

Add hooks via .claude/settings.json:

{
  "hooks": {
    "Stop": [
      {
        "matcher": "",
        "hooks": [
          { "type": "command", "command": "bash scripts/check-tokens.sh" },
          { "type": "command", "command": "bun run .claude/hooks/continual-learning-stop.ts" },
          { "type": "command", "command": "bash scripts/run-prettier.sh" }
        ]
      }
    ],
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "bash scripts/protect-locked-files.sh" }
        ]
      }
    ]
  }
}

The matcher field is a regex applied to the tool name (or empty string for "match all"). Hooks are typically bash commands but can also be bun run, node, python, etc.

Examples of useful additional hooks#

PreToolUse matcher Edit|Write — block writes to files in a .locked list. Useful for protecting generated files, design tokens, or migration files from accidental edits.
PostToolUse matcher Bash — log every bash command run by the agent to a file. Useful for audit and replay.
Stop hook running tests — run npm test after every turn. Heavyweight but catches regressions instantly.
UserPromptSubmit hook — inject project-specific context based on prompt content (e.g. if the prompt mentions "migration," inject a reminder to invoke migration-reviewer).

The continual-learning Stop hook (templates/claude-hooks/continual-learning-stop.ts) is a good model for any cadence-based hook. It reads stdin JSON, tracks state in a file, computes whether to fire, and emits {"decision":"block","reason":"..."} to inject follow-up instructions or {} to no-op.

CI/CD integration#

The kit's two checks (scripts/check-tokens.sh and tests/a11y.spec.ts) are designed to run in CI without kit-specific tooling.

Token check in GitHub Actions#

name: token-check
on: [pull_request]
 
jobs:
  tokens:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: bash scripts/check-tokens.sh

name: a11y
on: [pull_request]
 
jobs:
  a11y:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm ci
      - run: npx playwright install --with-deps
      - run: npm run build
      - run: npm run start &
      - run: npx wait-on http://localhost:3000
      - run: npx playwright test tests/a11y.spec.ts

Mark both as required checks in branch protection rules. Now token violations and a11y regressions can't merge to main, regardless of who or what wrote the change.

Reviewers in CI?#

The four review subagents (design-reviewer, architecture-reviewer, migration-reviewer, api-reviewer) are designed for interactive use during development, not for CI. They produce qualitative reports and propose diffs. No clear pass/fail for CI to gate on.

You can wrap them in a CI step that runs the reviewer and posts the report as a PR comment. The kit doesn't ship this integration, but a 50-line GitHub Action calling the Anthropic API with the reviewer's prompt + the PR diff would do it.

For most teams, the more useful pattern: developers invoke the reviewers locally during development. CI enforces the mechanical checks (tokens, a11y, tests). Don't try to make CI replace the human review loop.

Team adoption#

If you're using the kit solo, much of this doesn't apply. If you're rolling it out across a team, a few patterns help.

Make the kit a dependency, not a copy#

Don't have each developer maintain their own kit. Set up one canonical kit location: an internal git repo, an S3-mirrored copy, or ~/.local/share/agent-workflow-kit/ synced via dotfiles. When the kit improves, update the canonical location and have developers re-run install-global.sh.

Branch the kit for org-specific opinions#

If your team has org-specific opinions that differ from the kit's defaults — different fonts forbidden, different reviewer rubrics, different stack — fork the kit. Keep the directory structure identical so commands and skills are still discoverable. Change the templates to match your org.

The kit is small (~50 files, mostly markdown). Forking is cheap. Maintaining a fork is cheap if you don't need to merge upstream changes often.

Onboarding new team members#

The kit's GETTING-STARTED.md is the on-project version of new-member onboarding. The team-level version:

Have them install the kit (install-global.sh).
Have them run agent-workflow in an existing project to see the structure.
Have them read this manual's chapters 01, 02, and 09 — that's about 30 minutes.
Have them shadow a senior team member for one full pick → kickoff → ship loop.
Have them ship one spec themselves with the senior in observer mode.

Total onboarding to "able to ship a UI spec independently" is typically half a day. Backend specs add another half-day for the architecture / migration / api reviewers.

Multi-IDE: Cursor + Claude Code together#

Some teams have Cursor users and Claude Code users in the same project. The kit supports both:

The Cursor side gets .cursor/rules/project-memory.mdc (always-apply rule) and the Cursor continual-learning plugin's index.
The Claude Code side gets .claude/commands/, .claude/agents/, .claude/skills/, .claude/hooks/, and its own continual-learning index.

Both sides update the same CLAUDE.md, AGENTS.md, DESIGN.md, docs/PRD.md. Documents are shared. Tooling is parallel.

The biggest friction is the continual-learning loop. Cursor and Claude Code transcripts live in different places, so each client mines its own and both update the same AGENTS.md. The shipped subagent (agents-memory-updater) handles deduplication, but the result is some redundant work. If your team is heavily one-or-the-other, disable the unused side by removing its index file.

Common stumbles#

Symptom	Fix
New custom command isn't recognized	Restart Claude Code. Slash commands are loaded at startup
Custom subagent invokes the parent agent's tools instead of its own	Check the subagent's `tools` frontmatter. If empty, it inherits all parent tools. Tighten as needed
Hook is firing but doing nothing	Check that the command is on PATH and exits cleanly. Add `set -e` to bash hooks so they fail loud on errors
Hook is firing on every prompt and slowing things down	Reduce the matcher scope, or move heavy work to a Stop hook so it only fires at turn boundaries
Reviewer subagent produces vague feedback	Tighten the rubric. The kit's reviewers all cite specific file:line locations and specific rule sections. Copy that pattern
Custom skill is invoked but the agent doesn't follow it	Check the SKILL.md frontmatter description. The agent picks the skill based on description matching the user's intent. If the description is vague, the agent doesn't recognize when to invoke
Forked kit drifts from upstream	Add a CHANGELOG to your fork. Pin upstream as a remote and selectively cherry-pick changes you want
CI fails on the a11y test for "expected" violations from a third-party component	Use axe's `disableRules` option for that specific test. Don't disable globally

Continue#

The remaining chapters are reference material. Chapter 14: Skills reference catalogs every skill, command, and agent the kit ships. Chapter 15: Troubleshooting is for when something breaks.